Developers don’t do SEO. They make sure sites are SEO-ready.
That means developers hold the key to SEO. It’s true. If you’re a developer and you’re reading this, laugh maniacally. You’re in control.
You control three things: viability, visibility, and site flexibility.
This post provides guidelines for all three.
- Get Canonicalization Right
- Pay Attention To Performance
- Engineer Away ‘Thin’ Content
- Use Standard Page Structure
- Put Videos On Their Own Pages
- Generate Readable URLs
- Use Subfolders, Not Subdomains
- Don’t Use Nofollow
- Make Navigation Clickable
- Link All Content
- Don’t Hide Content (If You Want To Rank For It)
What’s A Developer?
This isn’t a navel-gazing philosophical question.
For this article’s purposes, a developer connects site to database (or whatever passes for a database, don’t get all anal-retentive on me), builds pages using the design provided, and does all the work those two jobs require.
A developer does not design. They do not write content. If you do all three jobs, tell the designer/content parts of your brain to take a break. This post isn’t for them.
Viability: Stuff you do on the server and in early software configuration that readies a site for ongoing SEO.
Mostly I chose this word because the other two ended with “ility,” and it just works.
Generate And Store HTTP Server Logs
Server logs are an SEO source of truth. Log file analysis can reveal all manner of crawler hijinks.
Every web server on the planet has some kind of HTTP log file.
And now someone’s going to tweet me their platform that, in defiance of all logic, doesn’t generate log files. OK, fine.
99% of web servers on the planet have some kind of log file.
Happy? Great. Now go make sure your server generates and saves HTTP logs.
Most servers are set up correctly out of the box, but just in case, make sure log files include:
- The referring domain and URL, date, time, response code, user agent, requested resource, file size, and request type
- IP address helps, too
- Relevant errors
Also make sure that:
- The server doesn’t delete log files. At some point, someone’s going to need to do a year-over-year analysis. If your server wipes log files every 72 hours or similar silliness, they can’t do that. Archive logs instead. If they’re gigantic, make the SEO team pay for an Amazon Glacier account
- The logs are easily retrieved. If you don’t want your SEOs mucking around the server, I understand. But make it easy for you and the rest of the development team to retrieve HTTP logs. It’ll save you time later, and ensure your replacement can find them after you win the lottery
Log files, folks. Love ’em. Keep ’em. Share ’em.
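To check that your logs actually capture the fields listed above, here's a minimal Python sketch that parses one line in the Apache/Nginx "combined" log format. It assumes that format. If your server writes a custom format, the regex needs adjusting. The sample line is made up.

```python
import re
from datetime import datetime

# Apache/Nginx "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of log fields, or None if the line doesn't match the format."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 10/Oct/2019:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # "-" means no response body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = (
    '66.249.66.1 - - [10/Oct/2019:13:55:36 -0700] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_log_line(sample)
print(entry["status"], entry["resource"], "Googlebot" in entry["user_agent"])
```

If a line from your own server comes back as None, your log format is missing something (or just differs from "combined").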
Don’t “Turn On” Analytics. Configure It.
Why does everyone treat analytics like a light switch? Paste the script, walk away, boom, you’ve got data.
- Track onsite search. People use that little magnifying glass buried in your site navigation. Your SEO (and UX) teams can learn a lot by reviewing onsite query data. Store it now, avoid apologizing later
- Track across domains and subdomains. If your company operates multiple domains or splits content across subdomains, creepily stalk users across all of those properties. Your SEO team can then see how organic traffic flows from site to site
- Filter by IP. Exclude users from your company, from competitors, or from your pesky neighbor who keeps asking you for a job. One IP filter your SEO will appreciate: users in your office. Set it up, and they’ll buy you the beverage of your choice, except Southern Comfort, which gave me the worst hangover of my life and is banned from our entire industry, forever
- Track on-page events. If your Analytics team is ready for you, put the “hooks” in place now, saving everyone precious time later
Is this all SEO stuff? Not exactly. But it all helps the SEO team. Is this your job? Maybe not. But you’re on the Dev team. You know you’re the top of the escalation tree for everything from analytics data to printer malfunctions. When they can’t find the data they need, the SEO team will end up at your door.
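The office IP filter belongs in your analytics tool's own filter settings. If you also analyze raw logs, though, the same idea is easy to apply there. A Python sketch, assuming hits are dicts with an "ip" key; the ranges are reserved documentation addresses standing in for your office and your neighbor:

```python
import ipaddress

# Hypothetical ranges: swap in your office, competitor, and neighbor IPs.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # e.g. the office
    ipaddress.ip_network("198.51.100.7/32"),  # e.g. the pesky neighbor
]


def is_excluded(ip):
    """True if the IP falls inside any excluded network."""
    address = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed lists are safe.
    return any(address in network for network in EXCLUDED_NETWORKS)


def filter_hits(hits):
    """Keep only hits from outside the excluded ranges."""
    return [hit for hit in hits if not is_excluded(hit["ip"])]


hits = [{"ip": "203.0.113.42"}, {"ip": "192.0.2.10"}, {"ip": "198.51.100.7"}]
print(filter_hits(hits))
```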
If you do block crawlers from a staging or pre-launch site with robots.txt or meta robots, keep in mind:
- Robots.txt tells bots not to crawl a URL or page. The page might remain in the search index if it was previously crawled (at least, in my experience)
- Don't count on noindex in robots.txt. Google stopped supporting that directive as of September 1, 2019
- The meta robots tag tells bots not to index a page, and/or not follow links from that page. The bot has to crawl the page to find the tag
- When you launch the site, remember to remove the robots.txt disallow rules and noindex meta tags. Please, gods, please, I beg you
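A cheap way to make sure that actually happens is a pre-launch check in your deploy script. Here's a Python sketch that uses the standard library's urllib.robotparser to see whether robots.txt blocks Googlebot from the home page, and html.parser to spot a meta robots noindex. The function name and the deploy-script idea are my assumptions, not a standard tool.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _MetaRobotsFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append((attributes.get("content") or "").lower())


def launch_blockers(robots_txt, home_url, home_html):
    """Return a list of problems that would keep the live site out of search."""
    problems = []
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", home_url):
        problems.append("robots.txt disallows the home page")
    finder = _MetaRobotsFinder()
    finder.feed(home_html)
    if any("noindex" in directive for directive in finder.directives):
        problems.append("meta robots noindex on the home page")
    return problems


staging_robots = "User-agent: *\nDisallow: /\n"
staging_html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(launch_blockers(staging_robots, "https://www.foo.com/", staging_html))
```

Fail the production deploy if the list isn't empty.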
Set The Correct Response Codes
Use the right response codes:
- 200: Everything's OK, and the resource exists
- 301: The resource you requested moved permanently. Poof. Look at this other one instead
- 302: The resource you requested moved temporarily. It might be back. Look at this other one for now
- 404 (or 410 if it's gone for good): The resource you requested can't be found. Oops
- 5xx: Gaaaahhhhh help gremlins are tearing my insides out in a very not-cute way. Nothing's working. Everything's hosed. We're doomed. Check back later just in case
Some servers use 200 or 30x responses for missing resources. This makes Sir Tim Berners-Lee cry. It also makes me cry, but I don’t matter. Change it.
Even worse, some CMSes and carts come configured to deliver a 200 response for broken links and missing resources. The visiting web browser tries to load a missing page. Instead of a 404 response, the server delivers a 200 ‘OK’ response and keeps you on that page.
That page then displays a ‘page not found’ message. Crawlers then index every instance of that message, creating massive duplication. Which becomes a canonicalization issue (see below) but starts as a response code problem.
Yes, Google says they’ll eventually figure out whether you meant to use a 302 or a 301. Keyword: eventually. Never wait for Google. Do it right in the first place.
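Here's what doing it right looks like in a tiny WSGI app, as a sketch. The PAGES and REDIRECTS tables are hypothetical stand-ins for your CMS. The point: a missing page still shows a friendly "page not found" message, but with a real 404 status, and a moved page returns a 301 with a Location header.

```python
# Hypothetical content and redirect tables standing in for a CMS.
PAGES = {"/": "<h1>Home</h1>", "/shoes/running": "<h1>Running shoes</h1>"}
REDIRECTS = {"/running-shoes.html": "/shoes/running"}  # permanent moves


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, headers, body = "200 OK", [], PAGES[path]
    elif path in REDIRECTS:
        # Permanent move: tell browsers and crawlers where the page went.
        status = "301 Moved Permanently"
        headers, body = [("Location", REDIRECTS[path])], ""
    else:
        # Friendly message for people, real 404 for crawlers.
        status, headers, body = "404 Not Found", [], "<h1>Page not found</h1>"
    data = body.encode("utf-8")
    headers = headers + [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ]
    start_response(status, headers)
    return [data]
```

To try it locally, serve it with the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() and check each URL with curl -I.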
I make no judgments regarding the pluses or minuses of these. But plan ahead and configure them before you launch:
- rel canonical
Other Random Things
Check ’em off now, so you don’t have to deal with them later:
- Put your site on a server with solid-state drives (SSDs). Read/write is a lot faster. Argue if you want, but a faster server means a faster site, which makes ranking easier. More about this when I get to Performance
- Virtual servers. Call me old-fashioned, but putting my site on a server with 900 others gives me hives. I’m not worried about shared IPs or search reputation. I’m worried about what happens when some bozo creates an endless loop and crashes my site
Viability: It’s Like Good Cholesterol
I just found out that I have high cholesterol, which is irritating because I eat carefully and bike 50–100 miles/week. But whatever.
MY POINT HERE is that server viability fights potential blockages by making sure your SEO team can get straight to…
This is a horrible analogy. Moving on.
Visibility: This is what everyone thinks about. How you build a site impacts search engines' ability to find, crawl, and index content. Visibility is all about the software.
Get Canonicalization Right
Every resource on your site should have a single valid address. One. Address. Every page, every image.
Canonicalization problems can cause duplicate content that, in turn, wastes crawl budget, reduces authority, and hurts relevance. Don’t take my word for it. Read Google’s recommendation. If you follow these recommendations, you’ll avoid 90% of canonicalization problems:
Home Page Has a Single URL
If your domain is www.foo.com, then your home page should “live” at www.foo.com.
It shouldn't be www.foo.com/index.html, foo.com, www.foo.com/default.aspx, or anything else. Those are all canonically different from www.foo.com. Make sure all links back to the home page are canonically correct.
Don’t depend on rel=canonical or 301 redirects for this. Make sure all internal site links point to the same canonical home page address. No site should ever require a 301 redirect from internal links to its own home page.
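The real fix is templates that emit the canonical address directly. If your templates build internal links from data you don't fully control, a normalizer makes a decent safety net. A minimal Python sketch, assuming the canonical home page is https://www.foo.com/ and that index.html, index.php, and default.aspx are the variants your platform produces (adjust both for your site):

```python
from urllib.parse import urlsplit

CANONICAL_HOME = "https://www.foo.com/"  # assumption: your preferred home page URL
HOME_HOSTS = {"foo.com", "www.foo.com"}
INDEX_PATHS = {"/", "/index.html", "/index.php", "/default.aspx"}


def home_link(url):
    """Map any variant of the home page URL to the canonical address.

    Every other URL is returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Relative links (no host) count as internal too.
    internal = host in HOME_HOSTS or not parts.netloc
    # "http://foo.com" has an empty path; treat it as "/".
    path = parts.path.lower() or ("/" if parts.netloc else "")
    if internal and path in INDEX_PATHS and not parts.query:
        return CANONICAL_HOME
    return url
```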
Pagination Has One Start Page
Make sure that the link to page one of a pagination tunnel always links to the untagged URL. For example: If you have paginated content that starts at /tag/foo.html, make sure that clicking ‘1’ in the pagination links takes me back to /tag/foo.html, not /tag/foo.html?page=1.
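In the pagination template, that rule is a single branch. A Python sketch, assuming the ?page=N parameter from the example above:

```python
from urllib.parse import urlencode


def page_url(base_path, page):
    """Link for page N of a paginated list; page 1 is always the untagged URL."""
    if page < 1:
        raise ValueError("pages start at 1")
    if page == 1:
        return base_path
    return f"{base_path}?{urlencode({'page': page})}"


print([page_url("/tag/foo.html", n) for n in range(1, 4)])
```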
No Hard-Coded Relative Links
Friends don’t let friends create links like this:
Those can create infinitely-expanding URLs:
/en-us/
/en-US/US-Distribution
/en-US/~/link.aspx?_id=6F0F84644AC94212ACA891D5AE1868C9&_z=z
/en-US/~/~/link.aspx?_id=B682300BEAD24C0ABC268DB377B1D5A0&_z=z
/en-US/~/~/~/link.aspx?_id=6F0F84644AC94212ACA891D5AE1868C9&_z=z
/en-US/~/~/~/~/link.aspx?_id=B682300BEAD24C0ABC268DB377B1D5A0&_z=z
/en-US/~/~/~/~/~/link.aspx?_id=6F0F84644AC94212ACA891D5AE1868C9&_z=z
/en-US/~/~/~/~/~/~/link.aspx?_id=B682300BEAD24C0ABC268DB377B1D5A0&_z=z
/en-US/~/~/~/~/~/~/~/link.aspx?_id=6F0F84644AC94212ACA891D5AE1868C9&_z=z
/en-US/~/~/~/~/~/~/~/~/link.aspx?_id=B682300BEAD24C0ABC268DB377B1D5A0&_z=z
Never hard-code relative links, unless you want to be the comic relief in an SEO presentation.
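Python's urllib.parse.urljoin resolves links the same way a browser or crawler does, so it's a handy way to watch the problem happen. This sketch assumes a template that puts the relative link ~/link.aspx on every page (like the URLs above) and follows it three times. The root-relative version lands on the same URL every time.

```python
from urllib.parse import urljoin


def follow(start_url, href, hops):
    """Follow the same href from each resulting page, the way a crawler would."""
    urls = []
    url = start_url
    for _ in range(hops):
        url = urljoin(url, href)
        urls.append(url)
    return urls


# Relative link: every hop adds another "~/" segment.
print(follow("https://www.foo.com/en-US/", "~/link.aspx", 3))
# Root-relative link: every hop resolves to the same URL.
print(follow("https://www.foo.com/en-US/", "/en-US/link.aspx", 3))
```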
No Query Attributes For Analytics
Don't use query attributes to tag and track navigation. Say you have three different links to /foo.html. You want to track which links get clicked. It's tempting to add ?loc=value to each link. Then you can look for that attribute in your analytics reports and figure out which links get clicked most.
You don’t need to do that. Instead, use a tool like Hotjar. It records where people click, then generates scroll, click and heat maps of your page.
If you absolutely must use tags, then use # instead of ? and change your analytics software to interpret that, so that /foo.html?loc=value becomes /foo.html#loc=value. Web crawlers ignore everything after the hash sign.
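If your link-building code already adds the query attribute, moving it into the fragment is mechanical. A Python sketch, assuming the tracking parameter is named loc and the link has no fragment of its own (this version would overwrite one). Your analytics setup still has to be configured to read the fragment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def move_tracking_to_fragment(url, param="loc"):
    """Move a tracking parameter from the query string into the fragment.

    Crawlers ignore the fragment, so every tracked link shares one crawlable URL.
    Assumes the URL has no existing fragment.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    tracking = [(key, value) for key, value in query if key == param]
    others = [(key, value) for key, value in query if key != param]
    if not tracking:
        return url
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(others), urlencode(tracking))
    )


print(move_tracking_to_fragment("/foo.html?loc=header"))
```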
Things to Do
Whether you have canonicalization issues or not, make sure you:
- Set the preferred domain in Google Search Console and Bing Webmaster Tools (last time I checked, you could do this in both)
- Set rel=canonical for all pages. Might as well handle it ahead of time
- Set the canonical HTTP header link
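Both declarations carry the same information: the link element goes in the HTML head, and the Link header also works for non-HTML files like PDFs. A Python sketch of a helper that produces both; the function name is made up, and it assumes the URL is already properly encoded.

```python
from html import escape


def canonical_markup(canonical_url):
    """Return (HTML element, HTTP header) declaring the same canonical URL."""
    element = f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'
    # Assumes canonical_url is already percent-encoded; it goes into the header as-is.
    header = ("Link", f'<{canonical_url}>; rel="canonical"')
    return element, header


element, header = canonical_markup("https://www.foo.com/shoes/running")
print(element)
print(header)
```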
It’s best to fix canonicalization issues by doing it right: build your site to have a single address for every page.
If you can’t do that, though, use these:
- rel=canonical points search engines at the preferred page. It doesn’t fix crawl budget issues, but it’s something. Make sure you use it right! Incorrect rel=canonical setups can hurt more than help
- Use the URL Parameters Tool in Google Search Console to filter out parameters that cause duplication. Be careful. This tool is fraught with peril
Get Canonicalization Right From The Start
Please don’t do these things:
- Use robots.txt or meta robots to hide duplicate content. This completely screws up the site’s link structure, doesn’t hide the content, and costs you authority
- Point rel=canonical for one set of duplicates at different target pages
- Use either Google Search Console or Bing Webmaster Tools to remove the URLs of duplicate pages
In other words, no funny business. Do it right from the start.
Pay Attention To Performance
Performance is done to death, so I’m going to keep it short. First, a brief sermon: page speed is an easy upgrade that gets you multiple wins. Faster load time means higher rankings, sure. It also means higher conversion rates and better UX.
Lighthouse isn't perfect, but it's a helpful optimization checklist. It also tests accessibility for a nice 2-in-1.
Do all the stuff.
Regardless of the test results:
- Use HTTP/2 if you’re allowed. It has all sorts of performance benefits
- Use hosted libraries. You don't have to use Google's Hosted Libraries, but they're a good place to start
- The exception: if you've looked at code coverage, I suggest you trim the heck out of your included files and work from there
- Compress images. Teach your team to use Squoosh. They'll remember to use it for about a day. After that, either flog them regularly or use something like Gulp to automatically compress before upload
You can also consider installing Google's PageSpeed modules. I'd never do this. I don't want Google software running directly on my server. But they do a lot of work for you. You decide.
A few other quick tips:
Chances are, someone else will add a bunch of third-party scripts and clobber site performance. You can get off to a good start:
- Defer loading of third-party scripts, where you can
- Ask the service provider for the compressed version of the script. They often have one
- Use CDN versions wherever possible. For example, you can use the Google CDN version of jQuery
Use DNS Prefetch
If you're loading assets from a separate site, consider using DNS prefetch. That handles the DNS lookup ahead of time:
<link rel="dns-prefetch" href="//foo.com" />
That reduces DNS lookup time when the browser finally requests those assets.
Find the most popular resources on your site and use prefetch (not to be confused with DNS prefetch, above). That loads the asset when the browser is idle, reducing load time later:
<link rel="prefetch" href="fonts.woff" />
Be careful with prefetch. Too much will slow down the client. Pick the most-accessed pages and other resources and prefetch those.
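Picking the most-accessed resources is easy to automate if you have request counts (from the log files you're keeping, say). A Python sketch, assuming a dict of path to request count; it emits prefetch tags for only the top few, and the limit of 3 is an arbitrary example.

```python
from html import escape


def prefetch_tags(request_counts, limit=3):
    """Emit <link rel="prefetch"> tags for only the most-requested resources."""
    # Highest count first; ties break alphabetically so the output is stable.
    ranked = sorted(request_counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        f'<link rel="prefetch" href="{escape(path, quote=True)}" />'
        for path, _ in ranked[:limit]
    ]


counts = {"/fonts.woff": 950, "/shoes/running": 400, "/about": 12, "/app.js": 800}
for tag in prefetch_tags(counts):
    print(tag)
```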
Engineer Away ‘Thin’ Content
Build your site to avoid ‘thin’ content: pages with very little content and little unique information.
Avoid these things. Don’t laugh. I still find this kind of stuff in audits all the time:
- Send-to-a-friend links with unique query attributes
- Member pages with blank bios and/or no other useful content
- Blank or low-value “more reviews” pages. Some sites have links to separate review pages for each product. That’s helpful, unless there are no reviews, or the text for most reviews is terribly helpful like “great product”
- Empty, paginated photo galleries. I honestly don’t know how sites manage this, but they do
- Tag pages for tags with a single piece of content
Don’t wait for an SEO to make you go back and fix it. Build to prevent this kind of stuff:
- If you must have send-to-a-friend links, use fragments plus window.location or something similar. Crawlers will ignore everything after the hash
- Require a minimum length bio, or hide member profiles with short or nonexistent bios
- Don’t display separate review pages unless you have a minimum number of reviews
- Don’t generate or link to tag pages unless the tags have more than N pieces of content. You can choose “N.” Just please make sure it’s not “1”
- Use rel=canonical for multiple SKUs, request forms or anything else that might end up generating thin content. This is not a fix. It’s a lousy workaround. But it’s better than nothing, and it’ll catch stuff you miss
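Most of those rules boil down to "don't generate the page until there's enough on it." A Python sketch of the checks a template could call before rendering or linking a page; the thresholds are hypothetical, so pick your own (just not 1).

```python
# Hypothetical thresholds: choose your own "N".
MIN_TAG_ITEMS = 3
MIN_REVIEWS = 5
MIN_BIO_CHARS = 200


def should_link_tag_page(item_count):
    """Only generate and link tag pages with enough content behind them."""
    return item_count >= MIN_TAG_ITEMS


def should_show_reviews_page(review_count):
    """Only give reviews their own page once there are enough of them."""
    return review_count >= MIN_REVIEWS


def should_show_profile(bio):
    """Hide member profiles with short or missing bios."""
    return len((bio or "").strip()) >= MIN_BIO_CHARS
```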
Use Standard Page Structure
Title elements and such get their own section later, so this list is a lot shorter. Every page should:
Have a Single H1
While heading tags don't necessarily affect rankings, page structure as evidenced by rendering does. H1 is the easiest way to represent the top level in the page hierarchy.
Have a single H1 that automatically uses the page headline, whether that’s a product description, an article title, or some other unique page heading. Do not put the logo, images or content that repeats from page to page in an H1 element.
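A template test can catch a second H1 (usually the logo) before it ships. A Python sketch using the standard library's html.parser; wiring it into your test suite is up to you.

```python
from html.parser import HTMLParser


class _H1Counter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.h1_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1


def h1_count(html):
    """Count <h1> elements in rendered page HTML."""
    counter = _H1Counter()
    counter.feed(html)
    counter.close()
    return counter.h1_count


# The logo wrapped in an H1 makes two. That's the bug.
print(h1_count('<header><h1><img alt="Logo"></h1></header><h1>Running shoes</h1>'))
```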
Make H2, H3, H4 Available to Content Creators
Allow multiple H2, H3, and H4 elements on the page. Let content creators use H2, H3, and H4. You can let them drill down even further, but I’ve found that leads to some, er, creative page structures.
Use <p> Elements for Paragraph Content, Not Hard Breaks or DIVs
Any developer knows this. Content creators sometimes don’t. I still see many writers insert double line breaks. It’s not easy, but if you can somehow enforce the use of <p> elements for paragraphs, it will make later tweaks to styles a lot easier.
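If writers paste plain text into the CMS, converting it when they save is one way to enforce <p> elements. A minimal Python sketch, assuming paragraphs are separated by blank lines and single line breaks inside a paragraph can become spaces:

```python
import re
from html import escape


def to_paragraphs(text):
    """Turn blank-line-separated text into <p> elements instead of <br><br>."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        # Collapse internal whitespace (including single newlines) to spaces.
        f"<p>{escape(' '.join(block.split()))}</p>"
        for block in blocks
        if block.strip()
    )


print(to_paragraphs("First paragraph.\n\nSecond\nparagraph."))
```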
Use Relevant Structured Data
At a minimum, generate structured markup for:
See schema.org for more information. Right now, JSON-LD is the most popular way to add structured data. It’s easiest, and if you (properly) use a tag manager, you can add structured data to the page without changing code.
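Generating JSON-LD in the template is mostly a matter of building a dictionary and serializing it. A Python sketch using schema.org's Product type purely as an example, with a minimal subset of fields; the function name and inputs are hypothetical.

```python
import json


def product_json_ld(name, description, sku, url):
    """Build a schema.org Product object wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "sku": sku,
        "url": url,
    }
    # Escape "<" so product text can't close the script element early.
    payload = json.dumps(data).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


print(product_json_ld("Trail Runner", "A light trail shoe", "TR-1",
                      "https://www.foo.com/shoes/running"))
```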
Oh, Come On Ian
I can hear you. No need to mutter. You’re saying, “None of this impacts rankings.”
It may. It may not. But using standard page structure improves consistency across the site for every content manager and designer who will work on it. That leads to good habits that make for a better site. It leads to less hacky HTML code pasted into the WordPress editor. That means a more consistent user experience. Which is good for rankings.
Put Videos On Their Own Pages
Video libraries are great, but having all of your videos on a single page makes search engines cry. Put each video on its own page. Include a description and, if you can, a transcript. Link to each video from the library. That gives search engines something to rank.
Generate Readable URLs
Where possible, create URLs that make sense. /products/shoes/running is better than /products?blah-1231323
Readable URLs may not directly impact rankings. But they improve clickthrough because people are more likely to click on readable URLs.
Also, Google bolds keywords in URLs.
Finally, what are you more likely to link to?
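Readable URLs usually start with a slug function. A Python sketch; note that it strips accents and drops anything outside ASCII letters and digits, which is fine for English but not for every site.

```python
import re
import unicodedata


def slugify(text):
    """Lowercase, ASCII-only, hyphen-separated URL segment."""
    # Decompose accented characters, then drop the accents: "Café" -> "Cafe"
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Any run of non-alphanumerics becomes a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def product_path(category, subcategory):
    return f"/products/{slugify(category)}/{slugify(subcategory)}"


print(product_path("Shoes", "Running"))
```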
Use Subfolders, Not Subdomains
Yeah, yeah, go ahead and hurl insults. I’ve heard it all before. If you want to argue about it, go read this post first.
All quality content should ‘live’ on the same domain. Use subfolders. The blog should live at /blog. The store should live at /store or similar. I always get pushback on this one. Google has said in the past that subdomains are OK. Yes, they’re OK. They’re not the best. Google says subdomains are sometimes just as good. Not always.
When Googlebot comes across a subdomain, it decides whether to treat it as a subfolder or not. Like many things Google does and says, they're unclear about it and results differ. I have no test data. I can say this: in most cases, moving content to a subfolder helps, if by 'most' we mean 'every site I've ever worked on.'
So why leave it to chance? Use a subfolder now, and you won’t have to deal with subdomains and unhappy marketers later.
There are two exceptions to the rule:
- If you’re doing reputation management, you need to control as many listings on the first page of a search result as possible. Google often separately ranks subdomain content. A subdomain can help you eat up an additional spot
- If you’re having trouble with a large amount of low-quality content or thin content, move that to a subdomain, and you may see rankings improvements
The most common reason folks use subdomains is the blog: The CMS, or server, or something else doesn’t support a blog. So you set up a WordPress.com site.
That ends up being blog.something.com. If you have to do that, consider using a reverse proxy to put it all under one domain. Of course, if you have no choice, use a subdomain. It’s better than nothing.
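A reverse proxy is usually a few lines of web server configuration, and the syntax depends on your server. The routing rule at its core is simple enough to sketch in Python, though: requests under /blog/ go to the blog's host, everything else goes to the main application. Both upstream addresses are hypothetical.

```python
BLOG_UPSTREAM = "http://blog.internal"   # hypothetical blog host
APP_UPSTREAM = "http://127.0.0.1:8080"   # hypothetical main application


def upstream_url(path, query=""):
    """Decide where the proxy forwards a request for www.foo.com{path}."""
    if path == "/blog" or path.startswith("/blog/"):
        # Strip the /blog prefix: the blog host serves from its own root.
        target = BLOG_UPSTREAM + (path[len("/blog"):] or "/")
    else:
        target = APP_UPSTREAM + path
    return f"{target}?{query}" if query else target
```

Visitors and crawlers only ever see www.foo.com/blog/. A real setup also has to rewrite the absolute links and redirects the blog emits, which this sketch ignores.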
Don’t Use Nofollow
Just don’t. Nofollow is meant to prevent penalties for links from comments and advertising. It doesn’t help channel PageRank around a site. It does burn PageRank. It’s a bad idea.
The only time to use nofollow is to avoid a penalty because you’re linking to another site via ads or other paid space on your site. A good rule of thumb: If you’re doing something ‘just’ for SEO, think carefully. Nofollow is a good example.
Make Navigation Clickable
Clicking the top-level navigation should take me somewhere other than '/#'.
Top-level nav that expands subnav but isn’t clickable creates three problems:
- The site’s primary navigation is a hidden rollover. Google and Bing will attribute less importance to it
- You lose what could be a top-level link to a single page on your site from every other page on your site. That’s scads of internal authority gone to waste
- Users will click on ‘Dropdown’ and get frustrated
Make sure clicking any visible navigation takes me somewhere.
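You can check for dead navigation links automatically. A Python sketch using html.parser, assuming your main navigation lives inside <nav> elements; it flags links with no href, '#', '/#', or a javascript: URL.

```python
from html.parser import HTMLParser


class _NavLinkCollector(HTMLParser):
    """Collects the href of every <a> inside <nav> elements."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth:
            self.hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1


def dead_nav_links(html):
    """Return nav hrefs that don't take the visitor anywhere."""
    collector = _NavLinkCollector()
    collector.feed(html)
    dead = []
    for href in collector.hrefs:
        value = (href or "").strip()
        if value in ("", "#", "/#") or value.lower().startswith("javascript:"):
            dead.append(href)
    return dead
```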
Link All Content
Don’t Hide Content (If You Want To Rank for It)
Until, oh, last week (seriously, Google just changed this last week), Google said they wouldn’t consider content that only appeared after user interaction. Content behind tabs, loaded via AJAX when the user clicks, etc. got zero attention.
Last week, the big G said they do examine this content, and they do consider it when determining relevance. I believe them, but as always, they’ve left out some details:
- Do they assign the same weight to content that requires user interaction?
- Do they differentiate between hidden content (like tabs) and content that doesn't load without user interaction (like asynchronous content)?
Oh, also: The old tiny-content-at-the-bottom-of-the-page trick still doesn’t work. That’s not what they meant.
Ask Yourself Why
Only Hide Content When Essential
This is the easy part: if you’ve got content on the page for which you want to rank, don’t hide it behind a tab, an accordion, or whatever else. On a well-designed page, people who want to see everything will scroll down. If they don’t want to see it, they weren’t going to click the tab anyway.
Don’t Deliver Content Based on User Events
If you want content indexed, don’t deliver it based on a user event. Yes, Google says they now index content that reveals after user interaction. Play it safe, though, if you can.
Show Content Before the Load Event
Look at your site's HAR (HTTP Archive). Anything that appears after the 'load' event is probably not going to get indexed.
Make sure whatever you want indexed appears before then.
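The network panel in most browsers' developer tools can export a HAR file, which is plain JSON. Here's a Python sketch that lists the requests for the first page that started after its load event. It's a proxy, not a verdict: content injected by scripts doesn't always have its own request, and some tools record onLoad as -1 when they didn't capture it.

```python
import json
from datetime import datetime, timedelta


def _parse_time(value):
    # HAR timestamps are ISO 8601; older Pythons don't accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def requests_after_load(har_text):
    """Return URLs of requests that started after the first page's load event."""
    log = json.loads(har_text)["log"]
    page = log["pages"][0]
    # onLoad is milliseconds after the page started loading.
    load_time = _parse_time(page["startedDateTime"]) + timedelta(
        milliseconds=page["pageTimings"]["onLoad"]
    )
    return [
        entry["request"]["url"]
        for entry in log["entries"]
        if entry.get("pageref") == page["id"]
        and _parse_time(entry["startedDateTime"]) > load_time
    ]
```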
Use Indexable URLs
See Make Navigation Clickable. URLs with /#! and similar won't get crawled. Google deprecated that as an indexing method.
Flexibility: No one thinks about this. No. One. SEO requires non-stop tweaks and changes by content managers, analysts, designers, and lots of other non-developers. If they can't do the work, they bury the resource-strapped development team in requests.
SEO grinds to a halt, and organic performance falls.
I mean, if you have infinite dev resources, no worries. Skip the rest of this article. Go back to feeding your pet rainbow-crapping unicorn.
Otherwise, keep reading this relatively brief section.
Have One, Editable Title Tag on Each Page
The title element is a strong on-page organic ranking signal.
- There must be one <title></title> element on each page
- It must be a separate, editable field. Have the title element default to the page headline, but make it separately editable
- As I write this, the ideal title tag is 60-ish characters in length, but don’t set a limit. It changes all the time. Your users should be using the Portent SERP Preview Tool because it’s the best thing since Nestle KitKats. Right? Right???
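In the template, "default to the headline, but make it separately editable" is one fallback. A Python sketch, assuming a hypothetical page dict with a required headline and an optional title field. There's deliberately no length limit.

```python
from html import escape


def title_element(page):
    """Render the page's single <title>, defaulting to the headline."""
    # Use the editor's title if they set one; otherwise fall back to the headline.
    title = (page.get("title") or "").strip() or page["headline"]
    # No truncation on purpose: the "ideal" length keeps changing.
    return f"<title>{escape(title)}</title>"


print(title_element({"headline": "Running Shoes"}))
```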
Make Meta Tags Editable in the CMS
First: the meta keywords tag is utterly useless and has been since, oh, 2004. Remove it. If your SEO protests, find a new SEO. With that out of the way, make sure each page has the following editable META tags:
Every page should have an editable description meta tag. The description tag doesn't affect rankings. It does, however, affect clickthrough rate, which can mean organic traffic growth even if rankings don't improve. Like the title tag, make the description tag a separate, editable field.
If the page is a product page, have the description tag default to the short product description. If the page is a longer descriptive page, have the description tag default to the first 150 characters of the page content. Never have a blank meta description! If you do, Google and Bing will choose what they think is best. Don’t rely on them.
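That fallback order translates directly into code. A Python sketch, assuming hypothetical page fields named description, short_product_description, and body_text; the excerpt is cut at the last space before 150 characters so it doesn't end mid-word.

```python
from html import escape


def default_description(page, limit=150):
    """Editor's description, else the short product description, else a page excerpt."""
    if page.get("description"):
        return page["description"]
    if page.get("short_product_description"):
        return page["short_product_description"]
    text = " ".join(page["body_text"].split())
    if len(text) <= limit:
        return text
    # Cut at the last space before the limit so we don't end mid-word.
    return text[:limit].rsplit(" ", 1)[0]


def description_meta(page):
    content = escape(default_description(page), quote=True)
    return f'<meta name="description" content="{content}" />'
```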
Open Graph Protocol (OGP)
Facebook uses OGP tags to build the text, image, and title of shared content. Without it, Facebook may use the title and meta description tag and pick an image. It may pick something else. OGP tags let the content creator control what will appear on Facebook and, like the meta description tag, they can boost clickthrough.
Have the OGP tags default to the page’s title, meta description and featured image. Then let the author edit them. At a minimum, include og:title, og:type, og:image and og:url. You can read more about OGP tags at http://ogp.me/.
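A Python sketch of those defaults, assuming hypothetical page fields (title, description, featured_image, canonical_url) plus optional og_* overrides the author can edit. Defaulting og:type to "website" is my choice; posts might use "article".

```python
from html import escape


def og_tags(page):
    """Render the minimum OGP tags, defaulting to the page's own fields."""
    values = {
        "og:title": page.get("og_title") or page["title"],
        "og:description": page.get("og_description") or page["description"],
        "og:type": page.get("og_type") or "website",
        "og:image": page.get("og_image") or page["featured_image"],
        "og:url": page["canonical_url"],
    }
    return "\n".join(
        f'<meta property="{prop}" content="{escape(value, quote=True)}" />'
        for prop, value in values.items()
    )


print(og_tags({
    "title": "Running Shoes",
    "description": "Light & fast",
    "featured_image": "https://www.foo.com/img/shoe.jpg",
    "canonical_url": "https://www.foo.com/shoes/running",
}))
```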
Twitter Card Markup
Twitter cards are more niche. Twitter will use OGP tags as a fallback, so these aren’t required. If you can add them, though, it gives content creators even more control over what Twitter shows for shared content.
Twitter cards can double clickthrough and other engagement. They're worth the effort. See https://dev.twitter.com/cards/overview for more information.
Make Image ALT Attributes Editable in the CMS
The ALT attribute is another strong ranking signal. Every image uploaded as part of page content must have an ALT attribute the user can edit when they upload it. If they do not enter one, default to:
- “Image:” + product name, if this is a product page
- “Image:” + image caption, if entered
- “Image:” + file name
I recommend including “Image:” so that screen readers and other assistive devices identify the snippet of code as an ALT attribute.
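Here's that fallback order as a Python sketch. The image and page dicts are hypothetical; the presence of product_name stands in for "this is a product page". Turning the file name's hyphens and underscores into spaces and dropping the extension is my addition, so screen readers don't read out "dot jpg".

```python
import os


def default_alt(image, page):
    """ALT text: the user's own, else product name, else caption, else file name."""
    if image.get("alt"):
        return image["alt"]
    if page.get("product_name"):  # assumption: only product pages set this
        return "Image: " + page["product_name"]
    if image.get("caption"):
        return "Image: " + image["caption"]
    # Tidy the file name: drop the extension, turn separators into spaces.
    stem = os.path.splitext(image["file_name"])[0]
    return "Image: " + stem.replace("-", " ").replace("_", " ")
```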
Keep Your CSS Clean
Overuse of classes can create headaches. Use semantic CSS wherever possible: instead of using “.h2”, for example, use “h2”. (Lousy punctuation, but it makes sure the CSS is clear.)
This tip stolen shamelessly from Martijn Oud.
Last updated 2019. Things change. Check back for new stuff.