Google Plays Nice: XML sitemaps, images, and a mystery
Ian Lurie Jul 20 2010
Last month, Google announced that they’re now accepting mixed-media XML sitemaps: You can put images, video and regular page URLs all into the same map.
I saw this and rubbed my hands together while cackling maniacally. I could finally make sure that valuable images that clients paid for, or paid to have taken, would get indexed!
I won’t go into the details of Google’s image, video and page sitemap spec. You can read it all in their post. But my fumbling about in code led to an interesting discovery: Google’s not all that picky.
I wrote my own little Python crawler over the weekend.
Yes, I know how pathetic that sounds.
Anyway, I wrote a Python crawler. It goes out to a site, grabs the URLs of all pages and all images, and puts ‘em all into an XML sitemap. Neato! My first real use of Python. Alas, I did it horribly wrong, and generated a sitemap that munges images and page URLs together as if they were all the same. That is, I put both images and page URLs between <url> and </url> tags. Turns out, that’s wrong.
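For anyone curious what the "grab all pages and all images" step looks like, here's a minimal sketch using only Python's standard library. It is not my actual crawler. It handles a single HTML document that has already been fetched, and it skips the fetching (e.g. with urllib.request), the same-domain filtering and the queue of pages to visit. The class and function names (LinkAndImageParser, extract_urls) are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Hypothetical helper: collects page links (<a href>) and image URLs (<img src>)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pages = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values can be None, so use .get() below
        if tag == "a" and attrs.get("href"):
            # Resolve relative links like "foo.html" against the page's URL
            self.pages.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_urls(base_url, html):
    """Return (page_urls, image_urls) found in one HTML document."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    return parser.pages, parser.images
```

Calling extract_urls("http://www.example.com/", '<a href="foo.html">Foo</a><img src="/image.jpg">') returns (["http://www.example.com/foo.html"], ["http://www.example.com/image.jpg"]). Keeping pages and images in separate lists is what makes it possible to nest each image under its page later, instead of lumping everything into <url> elements the way I did.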
According to Google’s post, you have to use all sorts of fancy code to insert images and video into an otherwise normal XML sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.sitemaps.org/schemas/sitemap-image/1.1"
        xmlns:video="http://www.sitemaps.org/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <video:video>
      <video:content_loc>http://www.example.com/videoABC.flv</video:content_loc>
      <video:title>Grilling tofu for summer</video:title>
    </video:video>
  </url>
</urlset>
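If you'd rather generate the "fancy" version correctly, a rough sketch with Python's built-in xml.etree.ElementTree could look like this. It covers only the image part of the example above (no video), and it assumes you've already grouped image URLs by the page they appear on. The function name build_image_sitemap is hypothetical, not anything from Google's post.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.sitemaps.org/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as the default (xmlns="...")
# and the image namespace with the "image:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages: dict mapping each page URL to a list of image URLs on that page.

    Returns the sitemap as UTF-8 bytes, including the XML declaration.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        # One <url> per page, with its <loc> first...
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # ...then one <image:image><image:loc> pair per image, nested inside it
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap = build_image_sitemap(
    {"http://www.example.com/foo.html": ["http://example.com/image.jpg"]}
)
```

The key difference from my broken version: images never get their own <url> element. Each one lives inside the <url> of the page it appears on. Note that register_namespace changes module-wide state, which is fine for a one-off script.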
Eesh. It’s actually a good thing I didn’t see that before I started. I would’ve thrown my hands up and never learned Python.
But here’s the thing: I submitted the incorrectly-formatted sitemap before I knew I’d done it wrong. Once I saw the error of my ways, I got ready to watch Google wallop my company site, or at least ignore all of the images. But they didn’t. Instead, Google is indexing the images I included in the sitemap, even though the sitemap’s wrong.
Before I generated the sitemap, a site:portentinteractive.com search in Google images showed about 100 images. The day after, it showed 180 images. Today, it shows 193 images.
Clearly, Google’s tolerant of dorky semi-competent programmers like me. But the real question, and I’m honestly curious if anyone out there knows, is: Do we have to use the fancy formatting, or is packing images into URL elements going to work for the long term?
Comment below if you have a theory, or if you happen to be a Google engineer.
Chairman & Principal Consultant
Ian Lurie is Chairman and Principal Consultant of Portent Inc., an Internet marketing agency that has provided Internet marketing, including PPC, SEO, social and analytics services, since 1995.