Fixing broken links using Levenshtein Distance: A PHP tool

Ian Lurie Aug 31 2010

One easy way to recover lost link authority is to 301 redirect broken external links to relevant pages. But if you’ve got a site with hundreds or thousands of URLs and dozens of broken external links, making those corrections by hand can be a real headache.
Enter Levenshtein Distance. That’s a fancy name for a simple concept: you can find the closest match for a word or phrase (or URL) by counting how many single-character edits (insertions, deletions or substitutions) it takes to get from that word or phrase to another test phrase. The fewer the edits, the closer the match.
Levenshtein is used a lot in spell checkers. If I type teh instead of the (who hasn’t?), the Levenshtein distance is 2: delete the ‘e’, then add it back after the ‘h’. (Swapping two adjacent letters counts as two edits, not one.)
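To make the edit counting concrete, here is a minimal sketch of the standard dynamic-programming calculation. It is written in Python purely for illustration and is not the code from the downloadable tool; the function name `levenshtein` is my own choice. (PHP ships with a built-in `levenshtein()` function, so a PHP script would not need to hand-roll this.)

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # previous[j] = distance between the part of `a` processed so far
    # and the first j characters of `b`. Start with the empty prefix of `a`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if they match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("teh", "the"))         # 2
print(levenshtein("kitten", "sitting"))  # 3
```

The sketch keeps only two rows of the table at a time, so memory use grows with the length of the second string rather than with the product of both lengths.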

Link-recover: An example of Levenshtein Distance calculation

I’ve put together a very basic example of this in a new tool. Give it up to 100 broken URLs and 100 good ones, and it’ll match each broken URL with the closest working one. Click here to use the tool.
The online tool is limited to 100 broken and 100 good URLs. But you can download the complete code, remove the limit and do whatever you want with it: Click here to download the entire sample application. It’s in very simple PHP.
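The matching step itself is simple enough to sketch: for each broken URL, compute the distance to every good URL and keep the smallest. The Python below is a rough illustration of that idea under my own assumptions, not the sample application’s PHP. The names `edit_distance` and `closest_matches` and the example paths are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_matches(broken_urls, good_urls):
    """Map each broken URL to a (closest good URL, distance) pair.

    If several good URLs tie, the first one in good_urls wins.
    """
    matches = {}
    for broken in broken_urls:
        best = min(good_urls, key=lambda good: edit_distance(broken, good))
        matches[broken] = (best, edit_distance(broken, best))
    return matches


# Made-up example paths, for illustration only.
broken = ["/blog/levenshtien-distance", "/servces/seo"]
good = ["/blog/levenshtein-distance", "/services/seo", "/contact"]

for old, (new, distance) in closest_matches(broken, good).items():
    print(f"{old} -> {new} (distance {distance})")
```

This compares every broken URL against every good one, which is fine for a few hundred URLs but slows down quickly on larger lists. The closest match is also not always a relevant one, so it is worth reviewing pairs with a large distance before turning them into redirects.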
[ Levenshtein link redirection tool ] [ Download the sample code ]