There’s a trend in the content world of using tools to help improve the value and quality of content (see Hemingway editor, Readability Test Tool, and Hubspot’s Blog Topic Generator). One category of tools is headline analyzers. They’re simple: put in your title, get a score, and tweak to improve the score. I use them sometimes when I’m feeling blah about a headline and want to kick around some new ideas on my own. I’ve been wondering if the recommendations that come out of these tools are worth following, though. So I decided to do a small study using Portent blog post titles as my subjects.
What Are Headline Analyzers?
Headline analyzers claim to help you create headlines that have more emotional appeal, which leads to more “traffic, shares, and search results” (according to CoSchedule). The two most established are AM Institute’s Emotional Marketing Value Headline Analyzer and CoSchedule’s Headline Analyzer. There’s a newer one from Sharethrough that I did not include in the study.
As the name of AM Institute’s tool suggests, the algorithms that drive these are based on lists of emotional words. AM Institute’s analyzer uses their own software to compare the headline you input with an Emotional Marketing Value (EMV) word list. How do they decide which words go on this list? From their FAQ:
Our research started back in the late 1960s and early 1970s. Dr. Hakim Chishti was a U.S. government research scholar, and was living in the Near East, studying the roots of several languages: Persian, Aramaic, Hebrew, Arabic, Urdu and several others.
His research found that there are these basic underlying harmonics, a tonality that flows through language, which are in many ways more profound and powerful than the dictionary meaning itself. Whereas sometimes meaning can be mistaken, the sound tones are always interpreted the same way by the emotions. These are better said as “emotional” reactions, although the effect is subtle.
CoSchedule’s Headline Analyzer is based on the EMV list, but adds other elements. As they said, “After we saw what EMV can do, we thought it would be helpful to build a new headline analyzer. This free tool combines EMV with several other elements we’ve found drive shares, traffic, and SEO results.” CoSchedule’s analyzer highlights word balance (of common, uncommon, emotional, and power words), headline type, length in characters and words, first and last three words (which stand out when skimming), and sentiment (positive, neutral, or negative). These five characteristics combine to provide an overall headline score.
CoSchedule gives a long list of power and emotional words. Here’s a snapshot:
As an example, if I used CoSchedule’s analyzer to inform the title for this post, I might end up with:
Are Headline Analyzers Worth Your Valuable Time?
Headline Analyzers, Tested: Are They Worth Your Time?
Those are both fine headlines. But are they improved enough to affect how many people click through to this post? And is it worth contributing to a glut of strong-sentiment words in titles, which could leave readers overstimulated and fatigued?
Time to find out if relying on these words to jazz up your headlines will get you the traffic you want.
For this study, I used Portent blog posts that were all live for the last 6 months of 2016. The writers and editors of these posts did not use headline analyzers. I segmented organic traffic and referral traffic. I excluded email and social traffic because these audiences have already opted in to receiving information from us, which makes them more likely to click (I speculate) regardless of headline quality.
For organic and referral, I took the top 20 and bottom 20 of the top 100 posts. I cut it off at 100 because traffic drops off sharply after that. This gave me 80 posts total. I ran each title through the two headline analyzers and recorded the scores, then created scatterplots to look for any relationship between total sessions (my proxy for click-through rate, for this study’s purposes) and analyzer score.
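The sampling step above can be sketched in a few lines. (This is an illustration only, with made-up post titles and session counts; the real data came from our analytics.)

```python
# Rough sketch of the sampling: rank posts by sessions, keep the top
# `pool` posts, then take the top k and bottom k of that set.
# All data here is hypothetical.

def top_and_bottom(posts, pool=100, k=20):
    """posts: list of (title, sessions) tuples. Returns (top_k, bottom_k)
    slices of the top `pool` posts, ranked by sessions descending."""
    ranked = sorted(posts, key=lambda p: p[1], reverse=True)[:pool]
    return ranked[:k], ranked[-k:]

# Tiny hypothetical example, with a pool of 6 and k of 2.
posts = [("A", 900), ("B", 120), ("C", 640), ("D", 75), ("E", 310), ("F", 58)]
top2, bottom2 = top_and_bottom(posts, pool=6, k=2)
print([t for t, _ in top2])     # highest-traffic titles: ['A', 'C']
print([t for t, _ in bottom2])  # lowest-traffic titles: ['D', 'F']
```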
This gave me eight charts, one for each group of 20, plotted against scores from each headline analyzer. I calculated the Pearson’s r for each group. For those not familiar with statistics or research methods: Pearson’s r is a correlation coefficient that indicates how strongly two variables are associated, and whether changes in one (X) can predict changes in the other (Y). Here X is the independent variable—headline analyzer scores—and Y is the dependent variable—number of sessions. Pearson’s r ranges from -1 (a perfect negative correlation, meaning that when X increases, Y always decreases) to +1 (a perfect positive correlation, meaning that when X increases, Y always increases). The sign tells you the direction of the relationship; the magnitude tells you its strength. The closer Pearson’s r is to 0, the weaker the correlation (the less a change in X tells you about Y).
For our purposes, I treated a Pearson’s r beyond ±0.4 as a meaningful correlation, meaning some amount of variance in the number of sessions can be predicted by variance in analyzer scores. (Predicted, not explained. Correlation is not causation.) Strictly speaking, that’s an effect-size cutoff rather than a test of statistical significance, but a moderate correlation of |r| ≥ 0.4 is a threshold widely used in the social sciences.
Headlines can make or break an article, blog post, or piece of evergreen content. If the headline doesn’t promise something interesting or somehow get readers’ attention, they won’t click through. Estimates of online readers’ attention spans range from 2 to 8 seconds—if you don’t get that attention right away, you don’t get it at all. So a tool that could reliably improve the likelihood that readers will click headlines would be invaluable to content creators. Do we have such a tool yet?
Probably not. There was no meaningful correlation between headline scores and total sessions for most groups. Here’s the data.1
Top 20 Organic Posts
Here’s the graph of top 20 organic blog posts and their CoSchedule scores.
The Pearson’s r for this set is 0.396 (excluding the outlier). That’s just shy of the 0.4 cutoff, so maybe there’s some effect here.
Here’s the graph for the same 20 posts and their AM Institute scores.
Again excluding the outlier, the correlation coefficient is -0.128. Not much of a correlation, and notably different from the CoSchedule coefficient.
Now let’s look at the bottom 20 posts to see if poor traffic might be related to poor headline scores.
Bottom 20 Organic Posts
Here are the bottom 20 organic posts and their CoSchedule scores.
With a Pearson’s r of -0.156, there’s not much of a correlation here.
And here are the bottom 20 organic posts and their AM Institute scores.
These also had a weak negative correlation, with a Pearson’s r of -0.179.
Top 20 Referral Posts
On to the referral blog posts. Referral traffic comes from links on sites we’re not affiliated with, so readers likely have no particular association with our brand or reason to expect links to our content on those sites.
Here are the top 20 referral blog posts and their CoSchedule scores.
There’s an outlier in this set, too. Excluding it, the Pearson’s r is 0.448, which clears the 0.4 cutoff: a meaningful correlation.
And the top 20 referral blog posts with their AM Institute scores.
Again without the outlier, the Pearson’s r is negligible at 0.135.
I find the difference between CoSchedule and AM Institute data interesting. While CoSchedule built their tool on AM Institute’s research, they added a bunch of other factors. Maybe those factors are more important than emotional marketing value words.
Bottom 20 Referral Posts
Here are the bottom 20 referral posts with their CoSchedule scores.
These have a Pearson’s r of -0.106.
Here’s the final graph—bottom 20 referral posts with AM Institute scores.
These have another negative correlation coefficient, of -0.246.
Summary: To Analyze or Not?
There was a meaningful positive correlation between sessions and CoSchedule scores for the top 20 referral posts, and very nearly one for the top 20 organic posts, but none between sessions and their AM Institute scores. In both cases, though, the Pearson’s r hovered right at the 0.4 cutoff, so this isn’t a strong argument in favor of CoSchedule’s headline analyzer. It does suggest, however, that CoSchedule’s analyzer may be more robust than AM Institute’s.
For the poorer-performing blog posts in both traffic-source datasets, there’s a slight negative correlation between a post title’s score and sessions, which is the opposite of what I’d expect if these analyzers are to be believed. But those coefficients fall well below the cutoff, so we can’t draw conclusions.
Overall, this study didn’t change my opinion of headline analyzers; my skepticism remains. I’ll probably keep using them when I want help rethinking a headline, ignore them when I don’t, and treat the scores they give me as a vanity metric.
1. [It’s important to note that AM Institute and CoSchedule use different rating systems. AM Institute gives a percentage with this context: “For comparison, most professional copywriters’ headlines will have 30%-40% EMV Words in their headlines, while the most gifted copywriters will have 50%-75% EMV words in headlines.” A 35% from AM Institute is a good score. CoSchedule gives a number and a color indication: 54 and below is red, 55 to 69 is yellow, and 70 and above is green. A 35 from CoSchedule is abysmal. So there’s no straight comparison to be made between the two scores. We can, however, compare correlation coefficients to each other. ]↩