Let me just start out with the ending. The Google Doc below takes a list of pages containing embedded YouTube or Vimeo videos and uses magic to grab all the data you need for a video XML sitemap. Then it makes the sitemap for you.
You can have it by going to the link below and selecting “Make a copy” under “File.” The instructions are right in the document.
If you want to know more about how it works, keep reading. But if, not – thanks for stopping by and hopefully this makes your life a bit easier!
Before I go into the details, I do recommend taking the time to learn more about video sitemaps. Rather than repeat what others have already said, I’ll just tell you to start with the horse’s mouth then read other articles like this great post by Phil Nottingham or this one by Justin Hammack.
Second, I should say that, even though this tool does generate a fully compliant video sitemap you can give straight to Google, I don’t actually recommend doing that. This isn’t the only video sitemap generator out there, but I wanted something that was a bit more straightforward and would allow me to easily customize the various fields.
So I made this to specifically help with some of the more time-consuming tasks but still let me optimize the data for each video. I’m still working on the tool that does my entire job for me.
Using the Google Doc
You’ll notice several tabs on the spreadsheet, but basically the doc does two things. First it sucks up all the meta data from the actual video pages on YouTube or Vimeo. Second, it organizes all that data so you can customize it, then marks that in XML sitemap format.
Here’s the play by play:
1. Get your list of landing pages. Start with a list of URLs you know have either YouTube videos or Vimeo videos embedded on them. If you have a large site, you can use Screaming Frog or another crawler to scan your sites for pages that have the embed code.
YouTube has had a few types of embed codes over the years, so I recommend searching for code that contains either “youtube.com/v/” or “youtube.com/embed/”. For Vimeo, the links are going to contain either “player.vimeo.com/video” or “video.com/moogaloop.swf?clip_id” so that’s the text you want to look for.
2. Go to the YouTube or Vimeo extractor sheet. Pages with YouTube videos go on the “YouTube Embed Extractor” sheet. Pages with Vimeos go to the “Vimeo Embed Extractor.”
You’ll notice that there’s also another two tabs in case you already have a list of video pages and would rather just pull the data directly from there. That’s fine, but remember that you need the URL from your own site to make a video sitemap.
3. Paste the URLs into the sheet. I know I say it’ll handle 15 URLs in the doc, but I really wouldn’t recommend doing more than 10 URLs at a time. The doc is pulling HTML from both the landing page and video page. This means that when you run 5 URLs, the doc is storing all of the HTML from 10 pages.
4. Drag down the formula rows. There’s a few hidden cells so be sure to include the orange cell on the right when you’re dragging down the formulas.
5. Copy all the cells in the green zone (including the original landing page) and paste values only into the “Generator” sheet. Seriously, remember to do “values only” otherwise your browser might kill itself.
5.1 Rinse and repeat. Keep adding to the “Generator” sheet until you’ve finished all of the pages. Do some Vimeos then some YouTubes if you want. If there are other video types, you can just type those into the “Generator” sheet, too. That’s still easier then marking up a text file manually.
6. SEO your data. If you think you’re going to get rich snippets by copying the titles and descriptions directly from YouTube and Vimeo, good luck. Plus, chances are some of the titles and descriptions suck anyway, so just rewrite them. Also, I have columns for some of the more recommended optional tags like “video:category” and “video:uploader.” Add those where you can or leave them blank if you have nothing to put there.
(If you’re using the “Publication Date” you need to remember to put a single quote (‘) before the date, otherwise the Doc will convert it to MM/DD/YYYY format, which isn’t what the XML Sitemap uses. Unfortunately, you can’t just switch the column to “text” either, since that kills the IF(ISBLANK) condition in the next sheet that prevents empty <video:publication_date> tags from appearing when you leave those cells blank.
7. Go to the “Video Sitemap” tab and drag down that red cell. If your “Generator” sheet goes down to row 56, drag the red cell in “Video Sitemap” down to 56. It’s got a bunch of IF conditions in there, so if you left some of the optional tags blank, they won’t show at all.
8. Close the <urlset>. Just write </urlset> below the last row in Column A.
9. Export as .txt and rename. Boom. Video Sitemap.
Customizing the document/how it works
The document uses the ImportData function to pull in the HTML from the landing pages. Then it uses a combination of RegexExtract and RegexReplace to grab the video source from the embed code, concatenate that as an actual URL, import that HTML, and grab all the Schema data. Vimeo doesn’t use the “name” or “description” itemprops in Schema, so the doc just grabs the Open Graph title and descriptions instead.
To get even nerdier for a minute, Schema uses ISO format for duration (like PT1H23M45S for 1 hour, 23 minutes and 45 seconds), but the XML sitemaps need everything in seconds, so the doc even takes that ugly duration format and converts it into seconds for you.
The doc does a lot, but not everything. At least not out of the box. If you want, though, you can edit it to do more. Let’s say you want to pull the Open Graph data instead of Schema from YouTube. You’ll just need to go to the cells that are grabbing the Schema data.
Then rewrite the function to look for the og:title element instead.
If you want to look for another video type, that’s when you’d go in the hidden cells. Here you’ll see where the imported HTML is being stored. You can also see this messy expression that is looking for YouTube embed codes.
If you wanted to look for Viddler videos, for example, you’d want to edit this to extract something like “viddler.com.embed.([0-9]+)” here. Then you’d reconstruct the URL in J3 so that K3 can import the data. Since Viddler uses Open Graph, instead of Schema on the embed pages, you’d follow the steps described earlier to extract the OG data.
(Note: I have not actually tried the steps described above to extract Viddler video data; I’m just outlining the process I would hypothetically follow if I were to do this. You’ll have to do some testing yourself.)
The doc I created already has fields to manually add “category”, “tags”, “uploader”, and “upload date”. But If you wanted to add <video:expiration_date>, for example, you’d simply create another column in the “Generator” sheet.
Then you’d go to the “Video Sitemap” sheet and take a look at the formula in the red cell. The formula is long, but you can see it’s just a concatenation of all the cells in the generator sheet.
Somewhere before the </video:video> tag you’ll simply add “<video:expiration_date>”,Generator!N3,”</video:expiration_date>” being sure to use commas to separate the different tags.
If only some of the videos in the sitemap have an expiration date, then you’ll need to wrap that in an IF(ISBLANK) condition. The end result would look something like this:
The char(10) isn’t really necessary, but it makes the cell easier to read in the doc by adding a line break. Unfortunately the line break doesn’t transfer when the document is exported as a .txt, so each <url> element will be one huge line when you view it.
Tips and Common Errors
Remember, folks, this is a Google Doc. Which means it totally works almost all of the time. But sometimes it breaks and sometimes it really, really breaks.
Sometimes you’ll see “<!DOCTYPE…” show up in the Schema/OG cells. This is because when RegexReplace can’t find a match, it just imports everything from the cell it’s referencing. So in this doc, that’s all of the source code from the page. Sometimes this might even blow the sheet’s margins all over the place.
The most common culprit is a video that’s been either removed or marked as private. In that case it’s a great opportunity to do a quick audit of videos on your site.
The second most common cause is something along the lines of “WTF IS WRONG WITH YOU I HATE COMPUTERS?!” because, like I said, it’s a Google Doc.
Also, be sure to delete rows that aren’t actively pulling data. Remember to follow the same rules in Step 4, and include the orange cell when you’re deleting the rows, because the hidden columns are where most of the data is stored.
Google Docs only lets you Import 50 URLs at a time, so, again, delete rows after you’ve copy/pasted them and be mindful of your space.
Questions? Suggestions? Tips? Write them in the comments below. I’ll be sure to keep this post updated as I make any changes/improvements to the Doc so check back if you’re wondering.
(I will also likely mention them on Twitter or Google+ but I would never do any shameless self-promotion like that on here.)
Feature requests will be dependent on how easy they are and what I happen to be doing at that moment.