Several schools of copy research practice have emerged, each with its own strengths and weaknesses, and each with its own advocates and detractors. We’ve outlined the most common types of measures below. They include quantitative measures of recognition, recall, persuasion, liking, neuro-physiological response, and behavior, as well as qualitative techniques.
Most commercial copy testing systems include at least one of these measures, but they can vary significantly in which measures they emphasize and in the protocols through which the measures are obtained.
It is difficult to pinpoint with certainty who first pioneered the recognition method in advertising research. One of the earliest reports of its use is believed to be by Dr. Walter Dill Scott of Northwestern University in 1908. Daniel Starch claims to have devised the “recognition method for measuring readership of advertisements” in 1919, though the fact may not have been widely known until the publication of his article in the Harvard Business Review in 1923.
The recognition method received a significant boost in the latter years of the 1920s when George Gallup used it to study newspaper readership and was able to measure the strength and importance of pictures and captions, leading to increased use of comic sections and front page photos by major papers. And in the early 1930s, Gallup’s work received national attention when he applied the recognition method of reading and noting to magazines.
Recognition techniques also served as the foundation for the Starch Rating System that involved noting and readership of print advertising.
Critics have pointed out that the recognition method offered little control over the interval between a respondent’s exposure to the ad and the interview, or even over whether the respondent had read the test issue or had seen the ad elsewhere. The measure is also considered difficult to verify, as all the respondent needs to do is claim to have seen the ad.
Recognition continues to be used today, particularly as a measure of exposure and particularly by the magazine publishing industry.
Recall measurements emerged in the 1940s, when the methodological issues with recognition methods became apparent. The G&R recall measurements required proof that advertising impressions had been made and that those impressions were linked to the brand name. In addition, recall offered substantial face validity; it is logical to assume that once a message has been retained in some form, it will likely contribute to purchase behavior in the future. Recent work in neuro-physiology has shown that memory and saliency are important components in predicting consumer purchase behavior.
As recall grew to be the dominant measure of commercial effectiveness, questions also arose about its reliability and validity. Since then, much has been done to improve how the measure is administered, and a number of proprietary and third-party studies have demonstrated its validity.
Recall continues to be a workhorse measure for many advertising testing systems. It is currently viewed as a necessary but not solely sufficient measure of advertising effectiveness.
The major advance in persuasion finds its origins in the early 1940s with the work of Horace Schwerin, Paul Lazarsfeld of the Austrian school and Frank Stanton, former president of CBS. The aim was to measure people’s reactions to radio programs while they were actually listening to the program. Lazarsfeld had brought a device (subsequently known as the Lazarsfeld-Stanton Program Analyzer) from Vienna which traced with a pen on graph paper an individual’s response to the program material. It was Schwerin’s exposure to this work that directed his interest toward measuring persuasion. Later, Research Systems Corporation (ARS) bought Schwerin’s company as well as the rights to the pre-post shift approach to measuring persuasion, which he had developed and which Procter & Gamble adopted as the new standard for measuring advertising effectiveness.
Advocates of persuasion argued that it was a better predictor of actual purchase behavior than recall. They also argued that persuasion studies, which were carried out in a theater setting, obtained measures from a larger sample base than did recall, thereby facilitating statistically sound sub-sample breakouts, and were conducted under more controlled exposure conditions, thus producing better reliability.
Critics of the persuasion approach point out that it is subject to a multiplicity of influences, such as the type of product being advertised and whether the respondent is familiar with the brand or even with the product category. Some also argue that persuasion is a substantially more complex dynamic than just a change in purchase intent. Additionally, there is the risk that latent learning during pre-testing might taint post-test responses, thereby distorting the measured persuasiveness of the test stimuli. Finally, there are questions about its value for non-packaged goods categories.
Current practice has turned from relying on a single measure, viz., recall or persuasion, to examining how both measures work together to improve overall understanding of commercial effect. There has also been some movement within the industry away from measuring persuasion via a pre-post, same-person design, toward measuring it in a post-only or test-control matched-sample design.
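The difference between the two designs mentioned above can be illustrated with a small sketch. It assumes that persuasion is summarized as the share of respondents choosing the advertised brand; the `Cell` structure, the function names, and all counts are hypothetical and are not drawn from any commercial testing system.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Counts for one group of respondents (hypothetical structure)."""
    respondents: int   # people interviewed in this cell
    chose_brand: int   # of those, how many picked the advertised brand

    @property
    def share(self) -> float:
        return self.chose_brand / self.respondents


def pre_post_shift(pre: Cell, post: Cell) -> float:
    """Same-person design: brand-choice share after exposure minus before."""
    return post.share - pre.share


def matched_sample_shift(exposed: Cell, control: Cell) -> float:
    """Test-control design: exposed-cell share minus unexposed-cell share."""
    return exposed.share - control.share


# Illustrative numbers only, not taken from any study.
pre = Cell(respondents=200, chose_brand=30)
post = Cell(respondents=200, chose_brand=44)
print(f"pre-post shift:     {pre_post_shift(pre, post):+.1%}")

control = Cell(respondents=200, chose_brand=32)
exposed = Cell(respondents=200, chose_brand=44)
print(f"test-control shift: {matched_sample_shift(exposed, control):+.1%}")
```

In the pre-post design the same respondents answer twice, so the pre-exposure question can itself influence the post-exposure answer. In the matched-sample design each respondent answers only once, at the cost of needing two comparable groups.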
There has been controversy, too, around the value of commercial likability. While there is no doubt that likable ads entertain viewers, there are questions about their ability to persuade. Some, like Rosser Reeves, openly disclaimed any connection between likability and sales effectiveness. As head of the Ted Bates agency, Reeves developed the “Pain, Pain, Pain” commercials for Anacin, which are considered to be among the most annoying but ultimately effective ads.
Liking received support in 1985, when the Ogilvy Center for Research and Development conducted a large-scale study to investigate whether liking a commercial had anything to do with persuading consumers to buy the advertised brand. The study found that people who liked a commercial a lot were twice as likely to be persuaded by it as people who simply felt neutral towards the advertising. Biel & Bridgwater (1990) suggested that people like commercials that they feel are relevant and worth remembering, and that these two elements contribute to increased persuasion.
The ARF Copy Validation Study (1990) provided additional support for liking, which it found to have the strongest correlation to sales of any measure the ARF tested.
Most, however, like Rossiter and Percy (1987), have argued that whether or not liking is important depends on the position of the product “in a complex metrics of consumer needs, motives for purchase and financial or psychosocial risks.”
Today, liking is considered a relevant contributor to understanding advertising effectiveness, particularly for some product categories and market situations, although generally less important than recall and persuasion. Researchers often use diagnostic copy testing tools to understand which aspects of the ad are liked (or disliked) and to improve the advertisement accordingly.
As early as the 1930s, work was being done to apply electrodermal or Galvanic Skin Response (GSR) methods to the study of advertising (Lucas and Britt, 1950). Other physiological measures of arousal, such as blood pressure and heart rate, were also included in the portfolio of commercial testing. Eye-tracking, too, has had periods of interest, but it was not until the 1990s that the promise of neurophysiology started to become more broadly accepted. Up to that point, when physiological measures detected emotional arousal, they were followed by questions probing whether such arousal was triggered by positive or negative feelings. Newer technologies are enabling more precise measurements and an increasing emphasis on measuring implicit emotions to gauge audience response to advertisements. These include a range of physiological techniques such as brain scanning, neurological analysis, and electromyography. This is a developing area of ad research measurement, with better systems for measuring arousal, activation, and valence being introduced and a better understanding emerging of the value of those measures in the context of advertising response.
The holy grail in advertising measurement is behavior, particularly purchase behavior. Advertising would be a less risky business if companies could know with reasonable confidence the payoff from an advertising investment. The convergence of digital technologies in media, retailing and computation is making such calculations more accessible and practical. It is increasingly routine to know whether a person who has been served an ad has acted on it by visiting a website or even purchasing a product. Analytic frameworks are becoming more sophisticated, with Randomized Control Trials (RCTs) being adopted from medical research. RCTs involve randomly assigning participants to either a test or a control cell to ensure probabilistic equivalence between the two groups, so that any differences detected are likely to be the result of the test condition rather than sampling inconsistencies.
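As a rough illustration of the test-and-control logic described above, the sketch below randomly splits a list of users into an exposed cell and a control cell and then compares conversion rates with a standard two-proportion z statistic. The function names, the user IDs, and the conversion counts are all hypothetical, and a real analysis would involve more than this single comparison.

```python
import math
import random


def assign_cells(user_ids, seed=0):
    """Randomly split users into an exposed (test) cell and a control cell."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def lift_and_z(conv_test, n_test, conv_ctrl, n_ctrl):
    """Return (absolute lift, z statistic) for a two-proportion comparison."""
    p_test = conv_test / n_test
    p_ctrl = conv_ctrl / n_ctrl
    # Pooled conversion rate under the null hypothesis of no ad effect.
    pooled = (conv_test + conv_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    return p_test - p_ctrl, (p_test - p_ctrl) / se


exposed_ids, control_ids = assign_cells(range(10_000), seed=42)

# Illustrative conversion counts only, not real campaign data.
lift, z = lift_and_z(conv_test=150, n_test=len(exposed_ids),
                     conv_ctrl=110, n_ctrl=len(control_ids))
print(f"lift = {lift:.2%}, z = {z:.2f}")
```

Because assignment is random, a z statistic beyond roughly 1.96 in absolute value (the conventional 5% two-sided threshold) suggests that the difference in conversion is unlikely to be due to chance alone.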
However, some psychologists and medical researchers question the reliance of such institutions as the National Institutes of Health and the Food and Drug Administration on RCTs as the gold standard for clinical research. RCTs can be problematic in situations where human opinion and emotion play a role and unknown confounds can create invisible bias. RCTs also present practical limitations in the number of variables that can be tested, and they say little about the consequences of the advertising for the vast majority of an audience who neither purchased nor clicked.
In addition to the previously discussed quantitative schools of techniques, qualitative techniques like Focus Groups and In-Depth Interviews (IDIs) are used extensively in advertising research to explore the rationale behind the views and opinions of potential audiences. The basic assumption behind qualitative research is that people tend to give more insightful responses in a relaxed, non-threatening environment.
The qualitative research methodology enables one to more fully understand why people respond as they do.
In spite of its apparent attractiveness, qualitative research suffers from several limitations when it comes to assessing the potential effectiveness of concepts and creative. As Cooper (2001) pointed out, “qualitative research cannot tell you conclusively whether or not a piece of advertising will break through the advertising clutter; whether it communicates as much when seen in the context of a night’s viewing as it does on its own in a research environment; whether it will ‘perform’ as well as other executions or other campaigns at other times.”
Others have pointed to the contaminating effects of artifacts, e.g., person-related factors that are irrelevant to the ad and so do not really measure its effectiveness. The subjective and difficult-to-interpret nature of the data has led to qualitative techniques being used mainly at the copy development stage. The current view is that qualitative research can provide new insights and interpretive guidance, particularly for copy development, but is not reliable enough for evaluative copy testing.
Perhaps the best way to think of these various systems is that each can be instructive for understanding advertising quality depending on one’s objectives. As stated by A. E. Kazdin (2010) in the context of causal measurement in clinical research, “The method that you use to study something can influence the results you get. Because of that, you always want to use as many different methods as you can.”
- Biel, A.L. & Bridgwater, C.A. (1990). Attributes of Likable Television Commercials. Journal of Advertising Research, 30(3), 38-44.
- Cooper, Alan, ed. (2001). How to Plan Advertising, 2nd edition. London: Thomson Learning.
- Haley, R.I. & Baldinger, A.L. (1991). The ARF Copy Research Validity Project. Journal of Advertising Research, 31(2), 11-32.
- Kazdin, A.E. (2010). Single-Case Research Designs: Methods for Clinical and Applied Settings, 2nd edition. New York: Oxford University Press.
- Rossiter, J.R. & Percy, L. (1987). Advertising and Promotion Management, Boston, MA: McGraw-Hill.
- Starch, D. (1923). Testing the Effectiveness of Advertising. Harvard Business Review, 1(4), 464-474.
- Williams, B.A. (2010). Perils of Evidence-Based Medicine. Perspectives on Biology and Medicine, 53(1), 106-120.