SearchFest 2016 (aka the last “SearchFest”) had a load of great talks, but only a few were as immediately relevant to my work as the deep dive discussion of structured and unstructured data from AJ Kohn and Mike Arnesen. They lent great insight to my approach to these two different types of information, and deepened my appreciation for structured data at large. This was also my first exposure to JSON-LD, the hot new thing in the space.
Fast forward a month or two and I’ve got a client with a problem that is best solved by better structuring their data, and for whom switching to JSON-LD made practical sense.
While in-depth discussion about JSON-LD was easy to come by, no single resource gave me enough footing to properly start to run with it.
This is my attempt to create the resource I wish I’d found when I went to apply the technology.
What is JSON-LD anyway?
JSON-LD (JavaScript Object Notation for Linked Data) is a method of implementing structured data markup on a website. It’s been in use for a few years, and is used and supported by Google, Bing, Yandex, and many smaller search engines.
JSON-LD uses standard JSON syntax, which is considerably simpler and more widely known than either microdata or RDFa. That also makes it easier to implement and less prone to human error.
Like microdata and RDFa, JSON-LD can live in the body of the page, but it can also be placed in the head. It also allows multiple script blocks on one page, which can be useful for breaking the markup into more manageable chunks.
In this post we’ll break down the basics of the code and its implementation, touch on some general tips for utilizing and validating the code, and wrap with a pair of examples for further study.
Implementing JSON-LD:
The following code is an example of JSON-LD as it would appear on the page. We’ll walk through the various elements of the code and cover the essentials of each:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebPage",
  "breadcrumb": "Homepage > Category > Great Hypothetical Widget",
  "mainEntity": {
    "@type": "Product",
    "image": "http://www.example.com/image-of-the-great-widget.jpg",
    "name": "My Great Hypothetical Widget",
    "description": "The description of the great hypothetical widget and an overview of its greatness."
  }
}
</script>
The Breakdown:
Calling the script:
<script type="application/ld+json"> { … mumble mumble code stuff … } </script>
This script tag is what contains the JSON-LD within the HTML of the rendered page. If your JSON-LD isn’t inside those curly brackets, and the brackets aren’t inside this script tag, it isn’t being parsed by the search engines or applied to the page.
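As a rough illustration of that rule, here is a minimal Python sketch of how a parser might collect JSON-LD from a page: it reads only the contents of script tags whose type is application/ld+json and ignores everything else. The class and function names (JsonLdExtractor, extract_json_ld) and the sample page are made up for this example; this is not how any search engine actually does it.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Only scripts with the JSON-LD type count; plain scripts are skipped.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD script block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical page: one JSON-LD block and one ordinary script.
page = """
<html><head>
<script type="application/ld+json">
{ "@context": "http://schema.org", "@type": "WebPage" }
</script>
<script>var notStructuredData = { "@type": "Ignored" };</script>
</head><body></body></html>
"""

blocks = extract_json_ld(page)
print(blocks)
```

Only the first script ends up in the result; the second one looks similar but is never treated as structured data because it lacks the type attribute.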
Context:
"@context": "http://schema.org",
@context defines the vocabulary that the data is being linked to.
In this example our @context references all of schema.org. This allows us to use any of the Types or Properties it defines and is more than enough to cover most structured data use cases.
A more advanced @context can map individual terms to specific URLs. The context on its own only makes these terms available; to use one as a Type you still have to reference it with @type later in the script. This allows a greater degree of specificity, or lets us call alternative vocabularies:
"@context": {
  "Store": "http://example.com/store",
  "Product": "http://example.com/product"
}
This kind of granularity isn’t generally needed. If you aren’t sure how specifically it would be useful to you, you’re likely better off sticking with schema.org.
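To make the idea of "a context maps terms to URLs" more concrete, here is a deliberately simplified Python sketch. It assumes a string context behaves like a vocabulary prefix (which is the practical effect of pointing at schema.org) and a dictionary context defines only the terms it lists. The function name expand_term is invented for this example, and this is not the real JSON-LD expansion algorithm, which handles many more cases.

```python
def expand_term(term, context):
    """Resolve a term to a full URL using a simplified reading of @context.

    ASSUMPTION: a string context is treated as a vocabulary prefix and a
    dict context as an explicit term-to-URL map. Real JSON-LD processing
    is considerably more involved.
    """
    if isinstance(context, str):
        # Single vocabulary: the term is appended to the vocabulary URL.
        return context.rstrip("/") + "/" + term
    # Explicit map: terms that are not listed are simply undefined.
    return context.get(term)


schema_org = "http://schema.org"
custom = {
    "Store": "http://example.com/store",
    "Product": "http://example.com/product",
}

print(expand_term("Product", schema_org))  # http://schema.org/Product
print(expand_term("PRODUCT", schema_org))  # http://schema.org/PRODUCT
print(expand_term("Store", custom))        # http://example.com/store
print(expand_term("Widget", custom))       # None
```

Note that "Product" and "PRODUCT" resolve to two different URLs, which is why capitalization matters so much when you reference a Type.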
Type:
"@type": "WebPage",
We use @type to call the Type of entity we are describing. For those with prior experience using microdata, this is our “itemtype.” In the example, we first use @type to describe the webpage itself, and then call it again to define the “mainEntity” of the page.
Defining Properties:
"breadcrumb": "Homepage > Category > Great Hypothetical Widget",
There is no equivalent to itemprop in JSON-LD. Once we reference a Type, any Property that can apply to that Type can be called and defined. In this case, we first reference the WebPage Type so that we can define a breadcrumb for search engines to use in their results.
Pro Tip: After every property but the last, you must include a comma. This tells the engines parsing your code that there is more to come. If the last property is followed by a comma, however, the parser will return an error.
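If you want to catch comma mistakes before pasting your markup anywhere, any strict JSON parser will do. The sketch below uses Python's built-in json module; the helper name is_valid_json and the three sample strings are just for illustration.

```python
import json

# Commas between properties, none after the last one.
valid = '{"@type": "WebPage", "name": "My Great Hypothetical Widget"}'
# A comma after the last property.
trailing_comma = '{"@type": "WebPage", "name": "My Great Hypothetical Widget",}'
# No comma between two properties.
missing_comma = '{"@type": "WebPage" "name": "My Great Hypothetical Widget"}'


def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for label, text in [("valid", valid),
                    ("trailing comma", trailing_comma),
                    ("missing comma", missing_comma)]:
    print(label, "->", is_valid_json(text))
```

Both comma mistakes fail to parse, while the first string goes through cleanly.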
Courtesy of Aaron Bradley: “breadcrumb” is the only data type that Google uses for rich snippets that doesn’t work with JSON-LD. Microdata or RDFa is needed, as well as BreadcrumbList and some Properties for that Type.
Nested Entities:
"mainEntity": {
  "@type": "Product",
  "image": "http://www.example.com/image-of-the-great-widget.jpg",
  "name": "My Great Hypothetical Widget",
  "description": "The description of the great hypothetical widget and an overview of its greatness."
}
JSON-LD handles nested entities simply. To define a Property with another Type we open a new set of curly brackets and define the properties of our new entity. Upon closing the curly bracket, we are back to defining the properties of the parent entity.
If you call other Properties after your nested entity then be sure to include the comma after closing the bracket to avoid parsing errors.
This structure can go as deep as needed, but should be kept legible to the people who will need to maintain it going forward. Even if that’s future-you.
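One way to keep nested markup legible is to build each entity separately and let a JSON serializer handle the brackets and commas. The following is a sketch of that approach in Python, reusing the example from this post; the variable names and the to_script_tag helper are assumptions made up for illustration.

```python
import json

# The nested entity is defined on its own...
product = {
    "@type": "Product",
    "image": "http://www.example.com/image-of-the-great-widget.jpg",
    "name": "My Great Hypothetical Widget",
    "description": "The description of the great hypothetical widget "
                   "and an overview of its greatness.",
}

# ...and then dropped into the parent entity as the value of a Property.
page = {
    "@context": "http://schema.org",
    "@type": "WebPage",
    "breadcrumb": "Homepage > Category > Great Hypothetical Widget",
    "mainEntity": product,
}

OPEN_TAG = '<script type="application/ld+json">\n'
CLOSE_TAG = "\n</script>"


def to_script_tag(data):
    """Serialize a dictionary into a JSON-LD script tag."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a value containing "</script>" cannot end the tag early;
    # "<\/" is a valid JSON escape that decodes back to "</".
    body = body.replace("</", "<\\/")
    return OPEN_TAG + body + CLOSE_TAG


tag = to_script_tag(page)
print(tag)
```

The serializer places every comma and closing bracket for you, so the nesting can go as deep as needed without hand-counting brackets.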
Structured Data Markup Tips
There are a few things you should be careful of when using structured data, particularly if using the JSON-LD method.
- Pay attention to required Properties: Make sure the essential Properties are in place for the Type you reference (e.g. events require a date and time).
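- Be careful with special characters and quotes: Especially if using JSON-LD, stray quotes and many special characters will prevent Google from parsing the data. A double quote or backslash inside a value has to be escaped (\" or \\) for the JSON to stay valid.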
- Develop familiarity with your vocabulary: Schema.org is the most common vocabulary used, and it’s robust. Knowing the Types available, their defining Properties, and their interactions, is essential to getting the most out of structured data on your site.
- Mind your cases: All schema.org Types and Properties are case sensitive, so make sure to double check your capitalization. This is particularly true when using JSON-LD, as all terms are explicitly case sensitive.
- Do not misrepresent the information on the page: Google delayed supporting JSON-LD initially due to the fact that users never see the code or its results – opening the door to cloaking issues. While they do support it now, they are vigilant about making sure it accurately represents the information on the page.
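On the tip about quotes and special characters: the usual culprit is a double quote sitting unescaped inside a value. The Python sketch below shows the failure and one way around it, which is to let a JSON serializer do the escaping. The sample description is invented for this example.

```python
import json

# Hypothetical product copy that contains double quotes.
description = 'The "great" hypothetical widget, now 6" wide'

# Pasting the text straight between quotes produces broken JSON.
hand_written = '{"description": "' + description + '"}'
try:
    json.loads(hand_written)
    hand_written_ok = True
except json.JSONDecodeError:
    hand_written_ok = False

# Letting the serializer build the string escapes the quotes for us.
serialized = json.dumps({"description": description})
round_trip = json.loads(serialized)["description"]

print(hand_written_ok)            # False
print(serialized)
print(round_trip == description)  # True
```

The serialized version contains \" wherever the original text had a quote, and it parses back to exactly the same description.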
Testing and Validation
While you’re setting up your markup, you can use Google’s Structured Data Testing Tool to see how it will be parsed or to identify any issues with the code. You can use this to fetch a live page, or just copy and paste the HTML or JSON-LD right into the tool, and Google will allow you to explore the structure of your data. You can see every entity on the page as well as the Properties defined for them. This is a huge help for debugging any errors.
Running the example code from this post returns:
A quick scan of the results shows that Google is picking up what I’m putting down.
Pro Tip: clicking on the results in the tool will highlight the property you selected in the code.
Finding the code in the wild:
The best way to learn about a technique like this is to study how it’s being used by people who know more than you.
The easiest way I’ve found to do that is to run other sites through the markup tester and play with the code until you’re comfortable with the vocabulary they’re using, and why.
Examples of JSON-LD:
http://www.wikia.com/fandom calls three JSON-LD scripts to describe different entities on the page. This completely avoids nested entities and keeps the script broken into manageable, modular chunks.
On the opposite end of the spectrum, http://oscar.go.com/ uses multiple nested entities in their code. This creates more relationships between the entities on the page, but results in a more elaborate block of script to maintain.
Regardless of whether entities are isolated or nested in the same script, the tools will let you inspect the structure of the data separately from the code conveying it. In fact, they will present the underlying structure of the data no matter the code used, allowing you to learn structure from any implementation – including microdata or RDFa.
Both of the sites above are rich with examples of structured data at work, and both use JSON-LD as their method of delivery. I highly recommend looking at multiple types of page from each.
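If you would like to compare the two approaches outside of the testing tool, a small script can print the outline of entities in a piece of JSON-LD. This is only a sketch: the outline function is made up for this post, and the two inputs are the example from earlier written once as a nested entity and once as two separate blocks.

```python
def outline(node, depth=0):
    """Return one indented line per typed entity found in a JSON-LD structure."""
    lines = []
    if isinstance(node, dict):
        if "@type" in node:
            lines.append("  " * depth + str(node["@type"]))
            depth += 1
        for key, value in node.items():
            # Keywords such as @context and @type are not nested entities.
            if not key.startswith("@"):
                lines.extend(outline(value, depth))
    elif isinstance(node, list):
        for item in node:
            lines.extend(outline(item, depth))
    return lines


# One script with a nested entity.
nested = {
    "@context": "http://schema.org",
    "@type": "WebPage",
    "breadcrumb": "Homepage > Category > Great Hypothetical Widget",
    "mainEntity": {"@type": "Product", "name": "My Great Hypothetical Widget"},
}

# The same two entities as separate, unrelated blocks.
separate = [
    {"@context": "http://schema.org", "@type": "WebPage"},
    {"@context": "http://schema.org", "@type": "Product",
     "name": "My Great Hypothetical Widget"},
]

print("\n".join(outline(nested)))
print("\n".join(outline(separate)))
```

In the nested version the Product is indented under the WebPage, showing the relationship between them; in the separate version both entities sit at the top level with nothing linking them.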
That’s a wrap:
Special thanks to AJ and Mike for a great talk, and Manu Sporny and Aaron Bradley for their resources on the topic, without which a lot of this learning would not have happened.
Hopefully this has helped you find your footing or deepened your understanding of JSON-LD and structured data at large. I look forward to fielding any questions in the comments.
Our latest guide on Rich Results is live. Learn how you can use your newly found JSON-LD knowledge.
Jared, thanks for this fine walk-through and your kind words. JSON-LD FTW! 🙂
However, a couple of points that might be helpful for you and your readers.
In regard to the second example in the “Context” section note that, while this is syntactically correct (the invalid comma after the second property/value pair excepted), this code does not in itself declare a type, but only two terms. Specifying these as types would require an @type declaration referencing the terms that have been specified:
"@type": [ "Store", "Product" ]
(Though this example is a rather quixotic, and certainly not very typical, way of referencing other vocabularies, especially as both terms reference the same IRI. For what it’s worth, I think this section runs the risk of being confusing rather than edifying for most marketers, which is a shame because the rest of the article is pretty helpful.)
It’s also worth noting that in your first example under nested entities you declare the type “PRODUCT”. While the Google Structured Data Testing Tool may not complain about this, that’s only because it’s being forgiving, as all schema.org types and properties are case sensitive. Remember that with JSON-LD “a context is used to map terms to IRIs” (and terms themselves are, in JSON-LD, explicitly case sensitive), so when you declare “PRODUCT” you’re actually declaring the IRI “http://schema.org/PRODUCT” which, as you’ll see if you paste that into your address bar, returns a 404.
Finally I’ll note that the choice of “breadcrumb” for a JSON-LD example may be a little unfortunate, as it’s literally the only public data type in Google’s structured data repertoire that, in order to produce a rich result in Google search results, still requires the use of microdata or RDFa explicitly (and, in any case, requires the use of BreadcrumbList and some properties for this type). 🙂
But, to be clear, the markup is sound and you describe the basics of JSON-LD well – thanks! Hope you take my comments in this context, as I’ve only provided them because – as is so often the case with markup – the devil is in the details.
Aaron,
Thank you for the constructive criticism!
I saw the error in the URLs for the @context section earlier (though still post-publishing) and adjusted one of them to correct it. Thanks for spotting the PRODUCT bit as well, not sure how that snuck in there but gonna kill it with fire asap.
Your point about @context not declaring @types but only making terms available for reference is much appreciated, and I’ll find a way to clarify that in the copy.
Case sensitivity was actually something I didn’t find in my initial research, or at least missed noticing enough for it to make the cut here. That is about to be added to the general tips section.
I’ll have to add a note under the properties section to call out the breadcrumb nuance. Choosing breadcrumb was in fact fortunate: I would not have learned that point until I couldn’t understand why my breadcrumbs weren’t working, and since it’s the exception I can use it to contrast the general rule.
Thank you very much for taking the time to go into the details here (and elsewhere).
Thanks Jared, and keep up the good work!
Hi Jared,
You know how microdata only allows you to tag text that is already in the code, thereby limiting you to that data? JSON-LD lets you add any information you like, even if it is not on the page. May I know your thoughts on that?
Allan
While I have an understanding of microdata, I have not worked directly enough with it to have a good sense of its actual limitations. With that grain of salt in hand:
If we look at structured data as building our own mini-knowledge graph, then some advantages come to mind when looking at this aspect. In microdata I can only work with what is present on the page, so I need to find a way to get the details of every itemprop to make sense for a user to consume. With JSON-LD, I can reference an entity on the page, but add much more depth for the search engines about what that entity is without needing to contort that information into fitting on the page.
In the talk that started this whole adventure, Mike Arnesen spoke a bit about the ability to reference named entities from elsewhere in the site, and even mentioned that it was possible between domains. While I haven’t had the chance to get into this advanced use case yet, I don’t think that microdata can get out of the immediate page enough for that.
Separately, I think there is something to be said for having an independent, self contained data layer. Easier to read, easier to debug, easier to create and manage.
Many thanks for breaking down JSON-LD and the tools. I have had a hard time finding the info needed to implement it.
We’ve started to use a plug-in – this might be too rudimentary for your needs but has been helpful for our sites –
https://wordpress.org/plugins/wp-seo-structured-data-schema/
A question for you – small business owners who work from home often refuse to publish their street address. We wondered if using the JSON-LD mark up to include a street address that’s not actually on the website would be acceptable. The intent is not to deceive, but to provide legitimate location info that the client doesn’t want published out of privacy concerns. Is this too risky?
If an address is sensitive, it should not be included in structured data markup, regardless of whether it’s visible on the site.
One of the uses of structured data is to pre-digest information for Google’s knowledge graph. Even if there is never any issue with cloaking, once Google knows your address, it’s public. A search for a business will often bring up a knowledge card about that business, and one of the fields is an address.
I would advise that companies in this situation take out a PO box and put it on the site and in the markup as a mailing address.