/(http:[^ <]*)/i
Which basically looks for http: and then any number of characters which aren't a space or a < (i.e. the start of a line break or some other HTML tag). This isn't by any means foolproof but it does mean that URLs become links.
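To make that concrete, here's a rough sketch of the pattern in use. It's in Python purely for illustration, and the helper name `linkify_http` is made up for this example rather than taken from any real code:

```python
import re

# "http:" followed by any run of characters that are not a space or "<".
HTTP_URL = re.compile(r"(http:[^ <]*)", re.IGNORECASE)

def linkify_http(text: str) -> str:
    """Wrap each http: URL in an anchor tag (hypothetical helper)."""
    return HTTP_URL.sub(r'<a href="\1">\1</a>', text)

print(linkify_http("See http://example.com/page<br>for details"))
```

The match stops at the < of the <br>, so the tag itself is left alone. One thing the sketch makes obvious: because the pattern insists on "http:" exactly, an https:// address is not picked up at all.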
Recently I've been working on a big database of contact information, a lot of which includes website URLs. Most of them don't include the http:// so I was thinking "okay, I'll just look for www. instead". However, then I remembered the no-www.org campaign - in fact this very blog has no www in its address - and realised that it was time to search out a proper regex.
And that's where it went wrong: although there are many such patterns out there, they all try to validate the URL as well as find it. This means they try to list all the protocols (like ftp:// or even gopher://) and all the TLDs (.com, .uk, .museum etc.), which, to my mind, is a losing battle.
In this case I don't need to match the protocol and I'm really not bothered about false positives as long as all the real URLs do become clickable links. I noticed that gchat makes a link out of anything that has no spaces and at least one dot which is pretty much all I need. Sure, it means that a typo of "oh.my.god." becomes a link but, well, I can live with that. Users will understand what happened, right?
So screw those ludicrously long regexes; here's mine:
/([^ <>@\/]+(\.[^ <>@.][^ <>@.]+)+)/i
Works beautifully. I excluded the @ so it wouldn't pick up emails and adjusted it to require at least two characters after each dot so that a bare "e.g." doesn't match.
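For anyone who wants to poke at it, here's a minimal sketch in Python (again just for illustration). It assumes the dot in front of each later segment is meant as a literal, escaped dot. The names `find_candidates` and `linkify_loose` are invented for this example, and prepending http:// to the link target is my assumption about what you'd want for addresses stored without a protocol:

```python
import re

# Assumption: the dots separating segments are literal (escaped) dots.
# First segment: no space, <, >, @ or /. Later segments: a dot followed
# by at least two characters that are not a space, <, >, @ or dot.
LOOSE_URL = re.compile(r"([^ <>@/]+(\.[^ <>@.][^ <>@.]+)+)", re.IGNORECASE)

def find_candidates(text: str) -> list[str]:
    """Return every substring the loose pattern treats as a URL."""
    return [m.group(1) for m in LOOSE_URL.finditer(text)]

def linkify_loose(text: str) -> str:
    """Turn each candidate into a link, prepending http:// (an assumption)."""
    return LOOSE_URL.sub(r'<a href="http://\1">\1</a>', text)

for sample in ["visit example.com today", "oh.my.god.", "e.g.", "no dots here"]:
    print(repr(sample), "->", find_candidates(sample))
```

One caveat worth knowing: in an unanchored search like this, the part of an email address after the @ can still match on its own. If that matters, checking whole tokens with `LOOSE_URL.fullmatch(token)` is one way round it, since an address containing @ can never match in full.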
