Title: Problems with umlauts ("Umlaute")
Name: LEXO
Date: 12/3/2013 3:29:23 AM
Comment:
I see there's a problem when the URL contains umlauts (Umlaute) like ä, ö or ü. In the past this was not an issue, but in a modern CMS like WordPress with UTF-8 charsets, users are allowed to upload filenames containing umlauts. So I slightly modified the regex to:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#äöüÄÖÜ]*[\w\-\@?^=%&\/~\+#])?
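A quick way to see what LEXO's change does is to run both versions side by side. The sketch below uses Python's `re` with the `re.ASCII` flag, an assumption chosen to roughly mimic engines where `\w` covers only `[A-Za-z0-9_]` (for example PCRE without the `/u` modifier); the sample URLs are made up.

```python
import re

# Gk's expression: \w is ASCII-only here because of re.ASCII.
BASE = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?",
    re.ASCII,
)
# LEXO's variant: umlauts added to the middle character class only.
UMLAUT = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:\/~\+#äöüÄÖÜ]*[\w\-\@?^=%&\/~\+#])?",
    re.ASCII,
)

text = "Download: http://example.com/uploads/Bücher.pdf"
print(BASE.search(text).group(0))    # http://example.com/uploads/B
print(UMLAUT.search(text).group(0))  # http://example.com/uploads/Bücher.pdf
```

Two things worth noting: in Python 3 without `re.ASCII`, `\w` already matches ä, ö and ü, so the base expression would capture the whole URL there; and because the umlauts were added only to the middle class, a URL whose last character is an umlaut (e.g. `http://example.com/Menü`) still loses that character.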
Title: matches http://www.textlink
Name: czechmate1976
Date: 8/2/2013 10:55:47 AM
Comment:
The pattern matches a domain name without a top-level domain (e.g. .com): it only requires at least one dot after the first label, so "www.textlink" is accepted.
Title: Why &
Name: BurninLeo
Date: 12/27/2012 6:29:18 AM
Comment:
Thanks for doing improvements on the expression even after years!
I do not fully understand the &amp; in the expression. To the best of my knowledge, regex does not work with HTML entities, and [] only works on single characters; therefore a, m and p should already be included in \w, shouldn't they?!
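BurninLeo's reading holds for any common regex engine: if the HTML-encoded form `&amp;` is copied into a character class, the class gains the five single characters `&`, `a`, `m`, `p` and `;`, not an entity. Since `a`, `m` and `p` are already in `\w`, the only real side effect is that `;` becomes allowed. A small Python check, with the classes simplified for illustration:

```python
import re

# Class as copied from the HTML-encoded page text vs. the intended class.
ENCODED = re.compile(r"[\w&amp;]", re.ASCII)  # members: \w, &, a, m, p, ;
INTENDED = re.compile(r"[\w&]", re.ASCII)     # members: \w, &

printable_ascii = "".join(chr(c) for c in range(32, 127))
# Characters accepted by the copied class but not by the intended one.
extra = "".join(
    ch for ch in printable_ascii
    if ENCODED.fullmatch(ch) and not INTENDED.fullmatch(ch)
)
print(repr(extra))  # ';'
```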
@Abdel Hady: When your pattern delimiter is a slash (/), please put a backslash before each slash in the pattern. And when using PHP, of course, every backslash needs another backslash to escape it for PHP.
Title: not working with php's preg_replace_callback
Name: Abdel Hady
Date: 4/28/2012 7:44:22 PM
Comment:
It gives me:
Fatal error: Uncaught exception 'Mysite_Exception_Warning' with message 'preg_replace_callback(): Unknown modifier '~''
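The error reported here is what you get when a slash-delimited PHP pattern contains an unescaped `/`. The sketch below is a hypothetical, simplified Python model of how PHP finds the closing delimiter (a backslash skips the next character, the first unescaped delimiter ends the pattern, and the rest is read as modifiers); it ignores bracket-style delimiters such as `{...}`. Fed John Brooking's variant, which has an unescaped `/` right before `~`, it leaves `~` where PHP expects modifiers.

```python
def split_php_pattern(pattern: str) -> tuple:
    """Hypothetical helper: split a PHP-style '/body/modifiers' string.

    Simplified model of how preg_* functions locate the closing delimiter:
    a backslash skips the character after it, and the first unescaped
    delimiter ends the pattern. Bracket delimiters like {...} are not handled.
    """
    if not pattern:
        raise ValueError("empty pattern")
    delim = pattern[0]
    i = 1
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            i += 2  # skip the escaped character
        elif pattern[i] == delim:
            return pattern[1:i], pattern[i + 1:]
        else:
            i += 1
    raise ValueError("no ending delimiter found")


# John Brooking's variant: the slashes before "~" are not escaped.
JOHN = r"/((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/"
# Pawel Gruszecki's form: anchor added and every slash escaped.
PAWEL = r"/^((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?/"

body, modifiers = split_php_pattern(JOHN)
print(modifiers[0])                     # ~   (PHP: Unknown modifier '~')
print(repr(split_php_pattern(PAWEL)[1]))  # ''
```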
Title: Say good
Name: Hunglt (VietNam)
Date: 2/5/2012 8:27:21 PM
Comment:
Thank you
Title: Didn't run in Javascript
Name: Gk
Date: 9/21/2010 11:23:37 AM
Comment:
IMHO it should be:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?
Title: Wrong classes
Name: OD
Date: 5/4/2010 10:22:48 AM
Comment:
Also, the allowed characters, as defined by RFC 2141 (the URN syntax specification), are exactly the class [0-9a-zA-Z()+,.:=@;$_!*'%/?#-]
Title: Poor syntax
Name: OD
Date: 5/4/2010 10:15:40 AM
Comment:
Because of unnecessary escaping in the character classes, '\' is being included and it's not clear whether it should be.
[\w\-_] is better written as [\w-]
[\w\-\.,@?^=%&:/~\+#] is better written as [\w.,@?^=%&:/~+#-]
etc.
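OD's rewrites can be checked mechanically. The Python sketch below compares each escaped class from the thread with its simplified form over all ASCII characters (using `re.ASCII`); the third pair applies the same rule to the closing class and is an extrapolation of OD's "etc.". It also shows that in Python's `re` the escaped classes do not match a backslash, because `\-`, `\.`, `\+`, `\/` and `\@` are just literal characters there; this also covers Bob Hurt's question about `\@`. Other regex flavors (for example POSIX bracket expressions) treat backslashes inside classes differently, so OD's concern may still apply elsewhere.

```python
import re

# (escaped form from the thread, simplified form)
PAIRS = [
    (r"[\w\-_]", r"[\w-]"),
    (r"[\w\-\.,@?^=%&:/~\+#]", r"[\w.,@?^=%&:/~+#-]"),
    (r"[\w\-\@?^=%&/~\+#]", r"[\w@?^=%&/~+#-]"),
]
ASCII_CHARS = [chr(c) for c in range(128)]


def same_members(a: str, b: str) -> bool:
    """True if both single-character classes accept exactly the same ASCII characters."""
    ra, rb = re.compile(a, re.ASCII), re.compile(b, re.ASCII)
    return all(bool(ra.fullmatch(ch)) == bool(rb.fullmatch(ch)) for ch in ASCII_CHARS)


for escaped, simple in PAIRS:
    matches_backslash = re.fullmatch(escaped, "\\", re.ASCII) is not None
    print(f"{escaped} vs {simple}: same={same_members(escaped, simple)}, "
          f"matches backslash={matches_backslash}")
```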
Title: Improvements?
Name: Brad
Date: 9/11/2009 10:39:08 AM
Comment:
Great expression; check these out:
http://regexlib.com/REDetails.aspx?regexp_id=2767
http://regexlib.com/REDetails.aspx?regexp_id=2766
Title: this matches http://www.yahoo when it shouldn't
Name: Andy
Date: 3/28/2009 3:35:38 PM
Comment:
I just tried the following PHP, and it passes when it should fail. Any tips?
function testURL() {
    $urltocheck = 'http://www.yahoo';
    // (\.[\w\-_]+)+ accepts ".yahoo" as a domain label, so "www.yahoo" passes.
    if (preg_match('/^((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?/', $urltocheck)) {
        echo "Pass";
        return 1;
    }
    echo "invalid";
    return 0;
}
testURL(); // prints "Pass"
Title: Displayed wrong
Name: Regex Newbie
Date: 8/3/2008 3:24:20 PM
Comment:
It works well for me in Visual Basic 6; however, it is displayed wrongly here: it has been HTML-encoded as &amp; instead of just &.
Now all I need is a regex that can get relative links out of an <a> tag for my spider.
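For pulling relative links out of `<a>` tags, an HTML parser is usually more robust than a regex. Below is a minimal sketch using Python's standard `html.parser`, a swapped-in technique rather than anything from this thread; it treats an `href` as relative when it has neither a scheme nor a network location, and `relative_links` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RelativeLinkParser(HTMLParser):
    """Collects href values of <a> tags that have no scheme and no host."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                parts = urlsplit(value)
                if not parts.scheme and not parts.netloc:
                    self.links.append(value)


def relative_links(html: str) -> list:
    """Hypothetical helper: return relative links found in <a href="..."> tags."""
    parser = RelativeLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<p><A HREF="/about.html">About</A> '
        '<a href="http://www.yahoo.com/">Yahoo</a> '
        '<a href="docs/faq.htm">FAQ</a> '
        '<a href="//cdn.example.com/x.js">CDN</a></p>')
print(relative_links(page))  # ['/about.html', 'docs/faq.htm']
```

A spider would then resolve each link against the page's own URL, e.g. with `urllib.parse.urljoin`.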
Title: little mod
Name: Pawel Gruszecki
Date: 3/11/2008 7:37:23 PM
Comment:
^((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?
I've made a little correction to the RegExp posted by John Brooking on 11/7/2007 2:51:56 AM, because the PHP server returned an error in the expression. This works perfectly.
Matches:
http://regxlib.com/Default.aspx
http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html
www.yahoo.com
Non-Matches:
yahoo.com
http//regxlib.com/Default.aspx
hppt://google.pl
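Pawel's example lists can be checked directly. The Python sketch below runs the expression with `match`, which, like `preg_match` with a leading `^`, anchors only at the start of the string; `re.ASCII` is an assumption to keep `\w` ASCII-only. Note that a bare `yahoo.com` is rejected: the anchored alternation requires `http://`, `https://`, `ftp://` or `www.` at the start.

```python
import re

# Pawel Gruszecki's expression; re.ASCII keeps \w ASCII-only (assumption).
PAWEL = re.compile(
    r"^((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?",
    re.ASCII,
)

SAMPLES = [
    "http://regxlib.com/Default.aspx",
    "http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html",
    "www.yahoo.com",
    "yahoo.com",
    "http//regxlib.com/Default.aspx",
    "hppt://google.pl",
]

for s in SAMPLES:
    # Like preg_match with a leading ^: only the start of the string is anchored.
    print(f"{s}: {'match' if PAWEL.match(s) else 'no match'}")
```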
Title: @ character
Name: Bob Hurt
Date: 1/4/2008 2:14:37 PM
Comment:
The '@' character doesn't have a special regular expression meaning, does it? If it does, what is the meaning? If it does not, why is the second '@' escaped with a backslash?
Title: Pattern Title - M H
Name: Candida
Date: 11/7/2007 2:51:56 AM
Comment:
Hi, this was really helpful....thanks for the post. :)
Title: To match www.yahoo.com
Name: John Brooking
Date: 10/12/2005 5:25:40 PM
Comment:
First, I've got to say that comments like "This expression does not work" are not helpful. I was able to get it to properly pick up the URL from the string "The URL is www.my-domain.com?id=5&b=.&c=5.", and it even excluded the final period but kept the other ones. What string did you get it to fail on? Maybe it can be fixed. Be a little helpful!
Anyway, I'm mainly posting to say that the following variation
((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
seems to pick up the ones without the leading protocol string, as long as they start with "www.". Sure, they don't all start with www., but you gotta recognize it somehow. Maybe you could do more by looking for the final top-level string (com, org, us, uk, ...) and working backwards from there. Anyhow, this works for my purposes, so I thought I'd share it. It *does* match "www.yahoo.com".
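John's idea of checking the final top-level label can be sketched in Python as a two-step filter: the regex finds candidates, then the last label of the host is checked against a list. `KNOWN_TLDS` is a hypothetical, deliberately tiny allow-list; a real one would have to track the IANA root zone, which changes over time. This also rejects the `http://www.yahoo` and `http://www.textlink` cases raised elsewhere in the thread.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny allow-list for illustration only.
KNOWN_TLDS = {"com", "org", "net", "edu", "gov", "us", "uk", "pl"}

# John Brooking's variant (scheme or leading "www."), unchanged.
CANDIDATE = re.compile(
    r"((http|ftp|https)://|www\.)[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?",
    re.ASCII,
)
SCHEME = re.compile(r"(http|ftp|https)://")


def has_known_tld(url: str) -> bool:
    """True if the last label of the host is in KNOWN_TLDS."""
    if not SCHEME.match(url):
        url = "http://" + url  # without a scheme, urlsplit puts the host in .path
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1] in KNOWN_TLDS


def find_urls(text: str) -> list:
    """Regex candidates, filtered by the top-level label of their host."""
    return [m.group(0) for m in CANDIDATE.finditer(text) if has_known_tld(m.group(0))]


print(find_urls("Visit http://www.yahoo or www.yahoo.com today"))  # ['www.yahoo.com']
```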
Title: Mr.
Name: Jon
Date: 9/17/2005 3:28:50 AM
Comment:
This expression does not work
Title: Span Multiple lines but not match whitespace
Name: MDR
Date: 5/9/2005 11:47:34 AM
Comment:
How could I make this span multiple lines without matching whitespace?
For example:
http://www.msn.com/pl
acestogo/default.aspx
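One possible approach, sketched in Python under the assumption that line breaks only occur inside the path or query: allow `\n` or `\r\n` as an extra alternative in the tail of the expression, keep excluding spaces, and strip the breaks from each match. This is a heuristic: if a URL ends at a line break and the next line starts with text, the two are glued together (a line ending in `http://www.msn.com/` followed by a line starting with `Hello` yields `http://www.msn.com/Hello`). `find_wrapped_urls` is a hypothetical helper name.

```python
import re

# Tail of the expression extended so that a line break (\n or \r\n) may occur
# between path/query characters. Spaces are still not accepted anywhere.
WRAPPED_URL = re.compile(
    r"(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+"
    r"((?:[\w\-\.,@?^=%&:/~\+#]|\r?\n)*[\w\-\@?^=%&/~\+#])?",
    re.ASCII,
)


def find_wrapped_urls(text: str) -> list:
    """Hypothetical helper: find URLs and remove any line breaks inside them."""
    return [re.sub(r"\r?\n", "", m.group(0)) for m in WRAPPED_URL.finditer(text)]


sample = "See http://www.msn.com/pl\nacestogo/default.aspx for details"
print(find_wrapped_urls(sample))  # ['http://www.msn.com/placestogo/default.aspx']
```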
Title: bad
Name: guili
Date: 2/1/2005 4:01:08 AM
Comment:
* Allows spaces in URLs
* Allows more than one ? in URL
Title: expression doesn't work
Name: akula
Date: 9/24/2004 6:37:35 AM
Comment:
Hi, the HTTP link below does not match the expression:
http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html
Title: Fix for domain-name.net
Name: Venata
Date: 9/16/2004 5:15:07 AM
Comment:
You need to replace
[\w]+ with [\w\-]+
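Venata's fix as a short Python check. The `OLD` pattern assumes the host labels were written as `[\w]+` at the time, as the fix implies; the tail is taken from later versions in this thread, and `re.ASCII` keeps `\w` ASCII-only.

```python
import re

TAIL = r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"

# Assumed earlier host part: [\w]+ cannot contain a hyphen.
OLD = re.compile(r"(http|ftp|https)://[\w]+(\.[\w]+)+" + TAIL, re.ASCII)
# Venata's fix: [\w\-]+ lets host labels contain hyphens.
FIXED = re.compile(r"(http|ftp|https)://[\w\-]+(\.[\w\-]+)+" + TAIL, re.ASCII)

url = "http://some-domain.net/"
print(OLD.search(url))             # None
print(FIXED.search(url).group(0))  # http://some-domain.net/
```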
Title: Not match
Name: Venata
Date: 9/16/2004 5:11:33 AM
Comment:
It does not match
http://some-domain.net/
Title: good for extracting urls
Name: maximilla
Date: 7/12/2004 4:16:53 PM
Comment:
Gorgeous, thank you! Works perfectly for my screen scraper.
Title: oops
Name: M H
Date: 4/28/2004 4:09:21 PM
Comment:
Actually, you should change this part of the code
(\.[\w]+)+
to
(\.[\w]+)?
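The effect of M H's change can be seen in a quick Python sketch (host labels written as `[\w]+` as in the comment; the tail comes from later versions in this thread, and `re.ASCII` is assumed). Making the dotted part optional admits single-label hosts such as `localhost`, but it also re-admits `http://www`, which is what critic and Stephen complained about in this thread.

```python
import re

TAIL = r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"

# (\.[\w]+)+ : at least one dotted label is required.
AT_LEAST_ONE_DOT = re.compile(r"(http|ftp|https)://[\w]+(\.[\w]+)+" + TAIL, re.ASCII)
# (\.[\w]+)? : the dotted label is optional, so one-part hosts pass.
OPTIONAL_DOT = re.compile(r"(http|ftp|https)://[\w]+(\.[\w]+)?" + TAIL, re.ASCII)

for url in ["http://localhost/blahblahblah.html", "http://www"]:
    strict = AT_LEAST_ONE_DOT.search(url)
    loose = OPTIONAL_DOT.search(url)
    print(url, "| +:", strict and strict.group(0), "| ?:", loose and loose.group(0))
```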
Title: Depends on what u consider is "wrong"
Name: M H
Date: 4/28/2004 4:02:54 PM
Comment:
You could have an internal URL like http://localhost/blahblahblah.html
This is not code to validate and verify the URL (e.g. you could enter http://123.123/ and still get it to work, but whether it's a "right" URL is beyond what this code is supposed to do).
Anyway, I've changed the code to "exclude" the one-part domain for general use, but if you want to include internal URLs as well, just remove the "{1,}" from the code.
Title: Not great
Name: Stephen
Date: 4/28/2004 3:30:56 PM
Comment:
Critic is right: it doesn't require periods in the domain name, i.e. 'http://www' passed.
Title: Worked out swell
Name: TjoekBezoer
Date: 3/30/2004 8:40:33 PM
Comment:
Worked out perfectly for me. I used it for checking a 404 parameter in IIS, and it did exactly what it needed to do.
Title: very bad don't use it
Name: critic
Date: 12/1/2003 6:49:39 AM
Comment:
it allows http://www. for example...