Fixing broken links using Levenshtein Distance: A PHP tool
Ian Lurie Aug 31 2010
One easy way to recover lost link authority is to 301 redirect broken external links to relevant pages. But if you’ve got a site with hundreds or thousands of URLs and dozens of broken external links, matching each broken URL to the right page by hand can be a real headache.
Enter Levenshtein Distance. That’s a fancy name for a simple concept: You can find the closest match for a word or phrase (or URL) by counting the fewest single-character edits (insertions, deletions or substitutions) it takes to turn that word or phrase into each candidate. The candidate with the lowest count is the closest match.
Levenshtein is used a lot in spell checkers. If I type teh instead of the (who hasn’t?), the Levenshtein distance is 2: Delete the ‘e’, then add it back after the ‘h’. Replacing both letters in place with two substitutions also costs 2.
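Here's a minimal sketch of that calculation using PHP's built-in levenshtein() function, which by default charges 1 for each insertion, deletion or substitution. The variable names are only for illustration, and this is not taken from the downloadable tool.

```php
<?php
// levenshtein() is built into PHP. With its default costs, every
// single-character insertion, deletion or substitution counts as 1.
$typo = 'teh';
$word = 'the';

// Two edits are needed to turn "teh" into "the".
$distance = levenshtein($typo, $word);
echo $distance, "\n"; // 2

// Identical strings need no edits at all.
$same = levenshtein($word, $word);
echo $same, "\n"; // 0
```

The lower the number, the closer the two strings are.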
Link-recover: An example of Levenshtein Distance calculation
I’ve put together a very basic example of this in a new tool. Give it up to 100 broken URLs and 100 good ones, and it’ll match each broken URL with the closest working one. You can try the tool using the link below.
The hosted tool is limited to 100 broken and 100 good URLs. But you can download the complete sample application, remove the limit and do whatever you want with it. It’s in very simple PHP.
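To give a feel for the matching step, here is a rough sketch of how it could work. It is not the code from the download. The closest_url() helper and the sample paths are hypothetical, and the sketch assumes PHP 8.0 or later (earlier versions of levenshtein() return -1 for strings longer than 255 characters, which would need extra handling).

```php
<?php
/**
 * Hypothetical helper, not part of the downloadable tool: returns the
 * good URL with the smallest Levenshtein distance to $brokenUrl, or
 * null when $goodUrls is empty. Ties go to the earliest candidate.
 * Assumes PHP 8.0+, where levenshtein() has no 255-character limit.
 */
function closest_url(string $brokenUrl, array $goodUrls): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;

    foreach ($goodUrls as $candidate) {
        $distance = levenshtein($brokenUrl, $candidate);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }

    return $best;
}

// Made-up sample data for illustration only.
$goodUrls = [
    '/blog/seo-tools',
    '/blog/link-building-tips',
    '/services/ppc',
];
$brokenUrls = ['/blog/seo-tool', '/service/ppc'];

// Print each broken URL next to its closest working match.
foreach ($brokenUrls as $broken) {
    echo $broken, ' => ', closest_url($broken, $goodUrls), "\n";
}
```

With this sample data, /blog/seo-tool pairs with /blog/seo-tools and /service/ppc pairs with /services/ppc, since each is one edit away. A closest match is not always a relevant one, so it's worth reviewing the pairs before turning them into 301 redirects.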
[ Levenshtein link redirection tool ] [ Download the sample code ]
Ian Lurie is CEO and founder of Portent Inc. He's recorded training for Lynda.com, writes regularly for the Portent Blog and has been published on AllThingsD, Forbes.com and TechCrunch. Ian speaks at conferences around the world, including SearchLove, MozCon, SIC and ad:Tech. Follow him on Twitter at portentint. He also just published a book about strategy for services businesses: One Trick Ponies Get Shot, available on Kindle.