Last month, Google announced that they’re now accepting mixed-media XML sitemaps: You can put images, video and regular page URLs all into the same map.
I saw this and rubbed my hands together while cackling maniacally. I could finally make sure that valuable images that clients paid for, or paid to have taken, would get indexed!
I won’t go into the details of Google’s image, video and page sitemap spec. You can read it all in their post. But my fumbling about in code led to an interesting discovery: Google’s not all that picky.
I wrote my own little Python crawler over the weekend.
Yes, I know how pathetic that sounds.
Anyway, I wrote a Python crawler. It goes out to a site, grabs the URLs of all pages and all images, and puts ’em all into an XML sitemap. Neato! My first real use of Python. Alas, I did it horribly wrong, and generated a sitemap that munges images and page URLs together as if they were all the same. That is, I put both images and page URLs between <url> and </url> tags. Turns out, that’s wrong.
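For the curious, here is roughly what that flat approach looks like. This is a minimal sketch and not my actual crawler code: the function name `build_flat_sitemap` and its two arguments are made up for illustration, and it assumes the crawl has already produced the lists of page and image URLs.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_flat_sitemap(page_urls, image_urls):
    """Hypothetical helper: give every URL, page or image, its own <url><loc>.

    This reproduces the mistake described above. Images are treated
    exactly like pages.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for address in list(page_urls) + list(image_urls):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_flat_sitemap(
        ["http://www.example.com/foo.html"],
        ["http://example.com/image.jpg"],
    ))
```

The output has one `<url>` entry per address, so nothing in the file distinguishes an image from a page.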
According to Google’s post, you have to use all sorts of fancy code to insert images and video into an otherwise normal XML sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.sitemaps.org/schemas/sitemap-image/1.1"
        xmlns:video="http://www.sitemaps.org/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <video:video>
      <video:content_loc>http://www.example.com/videoABC.flv</video:content_loc>
      <video:title>Grilling tofu for summer</video:title>
    </video:video>
  </url>
</urlset>
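If you want to generate that structure from Python, something like the following might do it. Treat it as a sketch under assumptions: `build_image_sitemap` and the shape of its `pages` argument (page URL mapped to a list of image URLs) are hypothetical, the namespace URIs are copied straight from the sample above and should be checked against Google's current documentation, and video entries are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample above; verify against Google's docs.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.sitemaps.org/schemas/sitemap-image/1.1"

# Ask ElementTree to write the sitemap namespace as the default
# and the image namespace with the "image:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Hypothetical helper: pages maps each page URL to its image URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Each image is nested inside the <url> of the page it appears on.
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_image_sitemap({
        "http://www.example.com/foo.html": ["http://example.com/image.jpg"],
    }))
```

The difference from a flat list of URLs is the nesting: each image hangs off the page it appears on instead of getting a `<url>` entry of its own.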
Eesh. It’s actually a good thing I didn’t see that before I started. I would’ve thrown my hands up and never learned Python.
But here’s the thing: I submitted the incorrectly formatted sitemap before I knew I’d done it wrong. Once I saw the error of my ways, I got ready to watch Google wallop my company site, or at least ignore all of the images. But they didn’t. Instead, Google is indexing the images I included in the sitemap, even though the sitemap’s wrong.
Before I generated the sitemap, a site:portentinteractive.com search in Google Images showed about 100 images. The day after I submitted it, the same search showed 180 images. Today, it shows 193.
Clearly, Google’s tolerant of dorky semi-competent programmers like me. But the real question, and I’m honestly curious if anyone out there knows, is: Do we have to use the fancy formatting, or is packing images into URL elements going to work for the long term?
Comment below if you have a theory, or if you happen to be a Google engineer.