Fixing broken links using Levenshtein Distance: A PHP tool
Ian Lurie Aug 31 2010
One easy way to recover lost link authority is to 301 redirect the dead URLs that external sites link to onto relevant, working pages. But if you’ve got a site with hundreds or thousands of URLs and dozens of broken external links, figuring out where each one should point can be a real headache.
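Once you know which working page each dead URL should point to, writing the redirects is mechanical. Here’s a minimal sketch in Python. It assumes an Apache 2.4 server with mod_alias and a `{broken_url: working_url}` mapping you’ve already built. The `redirect_rules` function is hypothetical and is not part of the downloadable tool. It only prints `Redirect 301` lines that you can paste into your configuration or `.htaccess` file.

```python
from urllib.parse import urlsplit


def redirect_rules(mapping: dict[str, str]) -> list[str]:
    """Turn {broken_url: working_url} pairs into Apache mod_alias 301 rules.

    Hypothetical helper. Assumes every URL is on your own site, so only the
    path is kept; query strings and fragments are dropped.
    """
    rules = []
    for broken, working in sorted(mapping.items()):
        old_path = urlsplit(broken).path or "/"
        new_path = urlsplit(working).path or "/"
        # mod_alias syntax: Redirect [status] URL-path URL
        rules.append(f"Redirect 301 {old_path} {new_path}")
    return rules


pairs = {
    "http://www.example.com/prodcuts/blue-widget": "http://www.example.com/products/blue-widget",
    "http://www.example.com/about-uss": "http://www.example.com/about-us",
}
print("\n".join(redirect_rules(pairs)))
```

Apache’s `Redirect` matches on path prefixes made of whole segments. A rule for `/about-uss` therefore also sends `/about-uss/team` to `/about-us/team`. If you don’t want that, mod_alias’s `RedirectMatch` with an anchored pattern is the stricter option.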
Enter Levenshtein Distance. That’s a fancy name for a simple concept: you can find the closest match for a word or phrase (or URL) by counting the minimum number of single-character edits (insertions, deletions or substitutions) it takes to turn that word or phrase into another one. The candidate with the lowest count is the closest match.
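Levenshtein is used a lot in spell checkers. If I type teh instead of the (who hasn’t?), the Levenshtein distance is 2: delete the ‘e’, then add an ‘e’ after the ‘h’ (or delete the ‘h’ and add it back before the ‘e’). Plain Levenshtein has no single “swap” edit, so two transposed letters always cost 2.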
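Here’s what that calculation looks like as code. This is a minimal Python sketch of the standard dynamic-programming (Wagner-Fischer) approach. It isn’t the tool’s own code. PHP ships a built-in `levenshtein()` function that does the same job, and this version only shows the mechanics. The algorithm fills in a table row by row. Each cell holds the cheapest way to turn a prefix of one string into a prefix of the other.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP)."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))  # turning "" into b[:j] takes j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("teh", "the"))  # 2
```

Only two rows are kept in memory at a time. That’s enough because each cell depends only on the current row and the row above it.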
Link-recover: An example of Levenshtein Distance calculation
I’ve put together a very basic example of this in a new tool. Give it up to 100 broken URLs and 100 good ones, and it’ll match each broken URL with the closest working one. Click here to use the tool.
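The matching step can be sketched the same way: for each broken URL, compute its distance to every good URL and keep the smallest. This is a hedged Python sketch that assumes plain lists of URL strings. It isn’t the tool’s PHP code, and `match_broken_urls` is a hypothetical name. Ties go to whichever good URL comes first in the list. A compact `levenshtein` helper is repeated so the block runs on its own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # delete, insert, or substitute (a bool adds as 0 or 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def match_broken_urls(broken: list[str], good: list[str]) -> dict[str, str]:
    """Pair each broken URL with the good URL at the smallest Levenshtein distance.

    Hypothetical helper: brute force, every broken URL against every good one.
    min() returns the first good URL among equal distances, so ties follow list order.
    """
    return {url: min(good, key=lambda g: levenshtein(url, g)) for url in broken}


broken = ["/prodcuts/blue-widget", "/about-uss"]
good = ["/products/blue-widget", "/products/red-widget", "/about-us", "/contact"]
for old, new in match_broken_urls(broken, good).items():
    print(f"{old} -> {new}")
# /prodcuts/blue-widget -> /products/blue-widget
# /about-uss -> /about-us
```

The closest string isn’t always the most relevant page, so it’s worth eyeballing the matches before you turn them into 301 redirects. With 100 broken and 100 good URLs, this brute-force approach needs only 10,000 distance calculations.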
The online tool is limited to 100 broken and 100 good URLs. But you can download the complete code, remove the limit and do whatever you want with it. Click here to download the entire sample application. It’s written in very simple PHP.
CEO & Founder
Ian Lurie is CEO and founder of Portent and the EVP of Marketing Services at Clearlink. He's been a digital marketer since the days of AOL and Compuserve (25 years, if you're counting). He's recorded training for Lynda.com, writes regularly for the Portent Blog and has been published on AllThingsD, Smashing Magazine, and TechCrunch.
Ian speaks at conferences around the world, including SearchLove, MozCon, Seattle Interactive Conference and ad:Tech. He has published several books about business and marketing: One Trick Ponies Get Shot, available on Kindle; The Web Marketing All-In-One Desk Reference for Dummies; and Conversation Marketing.
Follow him on Twitter at portentint, and on LinkedIn at LinkedIn.com/in/ianlurie.