Ian Lurie // Jul 20 2010
Last month, Google announced that they’re now accepting mixed-media XML sitemaps: You can put images, video and regular page URLs all into the same map.
I saw this and rubbed my hands together while cackling maniacally. I could finally make sure that valuable images that clients paid for, or paid to have taken, would get indexed!
I won’t go into the details of Google’s image, video and page sitemap spec. You can read it all in their post. But my fumbling about in code led to an interesting discovery: Google’s not all that picky.
I wrote my own little Python crawler over the weekend.
Yes, I know how pathetic that sounds.
Anyway, I wrote a Python crawler. It goes out to a site, grabs the URLs of all pages and all images, and puts ‘em all into an XML sitemap. Neato! My first real use of Python. Alas, I did it horribly wrong, and generated a sitemap that munges images and page URLs together as if they were all the same. That is, I put both images and page URLs between <url> and </url> tags. Turns out, that’s wrong.
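A minimal sketch of that approach, not the actual crawler: it pulls page links and image URLs out of a single HTML document, then writes every URL into its own <url> element, which is exactly the mistake described above. The names LinkAndImageCollector, collect and flat_sitemap are made up for illustration, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from xml.sax.saxutils import escape


class LinkAndImageCollector(HTMLParser):
    """Collects absolute page and image URLs from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pages = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page they were found on.
            self.pages.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def collect(html, base_url):
    """Returns (page_urls, image_urls) found in the given HTML string."""
    parser = LinkAndImageCollector(base_url)
    parser.feed(html)
    return parser.pages, parser.images


def flat_sitemap(urls):
    """Builds the 'wrong' sitemap: pages and images alike get a <url> entry."""
    entries = "".join("<url><loc>%s</loc></url>" % escape(u) for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        + entries
        + "</urlset>"
    )
```

Calling flat_sitemap(pages + images) produces one undifferentiated list, with nothing to tell a crawler which entries are images.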
According to Google’s post, you have to use all sorts of fancy code to insert images and video into an otherwise normal XML sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.sitemaps.org/schemas/sitemap-image/1.1"
        xmlns:video="http://www.sitemaps.org/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <video:video>
      <video:content_loc>http://www.example.com/videoABC.flv</video:content_loc>
      <video:title>Grilling tofu for summer</video:title>
    </video:video>
  </url>
</urlset>
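For comparison, here is a rough sketch of how a script could emit that nested structure with Python's standard xml.etree.ElementTree module. The function name build_sitemap and its input shape (a dict mapping each page URL to the images found on it) are assumptions for illustration. The namespace URIs are copied from the snippet above and should be checked against Google's current documentation before use. Video entries are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the quoted snippet (verify before relying on them).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.sitemaps.org/schemas/sitemap-image/1.1"

# Make the serializer use a default namespace and the "image:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs on that page."""
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = page_url
        # Images nest inside the <url> of the page that displays them.
        for image_url in image_urls:
            image = ET.SubElement(url, "{%s}image" % IMAGE_NS)
            ET.SubElement(image, "{%s}loc" % IMAGE_NS).text = image_url
    # Returns the XML as a string, without the <?xml ...?> declaration.
    return ET.tostring(urlset, encoding="unicode")
```

The point of the dict is that each image has to be tied to a page, so a crawler needs to remember where it found every image rather than tossing all URLs into one list.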
Eesh. It’s actually a good thing I didn’t see that before I started. I would’ve thrown my hands up and never learned Python.
But here’s the thing: I submitted the incorrectly-formatted sitemap before I knew I’d done it wrong. Once I saw the error of my ways, I got ready to watch Google wallop my company site, or at least ignore all of the images. But they didn’t. Instead, Google is indexing the images I included in the sitemap, even though the sitemap’s wrong.
Before I generated the sitemap, a site:portentinteractive.com search in Google Images showed about 100 images. The day after, it showed 180 images. Today, it shows 193 images.
Clearly, Google’s tolerant of dorky semi-competent programmers like me. But the real question, and I’m honestly curious if anyone out there knows, is: Do we have to use the fancy formatting, or is packing images into URL elements going to work for the long term?
Comment below if you have a theory, or if you happen to be a Google engineer.
Ian Lurie is founder and CEO of Portent Inc., an internet marketing agency that has provided internet marketing, including PPC, SEO, social and analytics services, since 1995.