Title: Fails on URLs without filename but with query string
	                Name: Doug Warren
	                Date: 5/21/2020 8:12:22 PM
	                Comment: 
If the URL has no filename but it does have a query string, this regular expression identifies the query string as the filename.  Assuming that's still a legal URL, one way to correct this might be to forbid question marks in the filename capture group, perhaps by changing it to: (?<filename>[^\?]*?\.(?<ext>\w{2,4}))
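For anyone who wants to reproduce this, here is a minimal sketch in Python. It assumes a straight port of the expression to Python's `re` module (which spells named groups `(?P<name>...)`), and the test URL is made up for illustration. It only checks the one case described above, so treat it as a sanity check rather than proof that the change is safe everywhere.

```python
import re

# Python port of the expression, split so the filename group can be swapped.
HEAD = (
    r"(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))"
    r"(?:(?P<usrpwd>\w+\:\w+)(?:\@))?"
    r"(?P<domain>[^/\r\n\:]+)?"
    r"(?P<port>\:\d+)?"
    r"(?P<path>(?:\/.*)*\/)?"
)
TAIL = (
    r"(?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*"
    r"(?P<bkmrk>\#.*)?"
)

original = re.compile(HEAD + r"(?P<filename>.*?\.(?P<ext>\w{2,4}))?" + TAIL)
proposed = re.compile(HEAD + r"(?P<filename>[^\?]*?\.(?P<ext>\w{2,4}))?" + TAIL)

url = "http://example.com/dir/?file=a.txt"  # hypothetical URL: no filename, but a query string

before = original.match(url)
after = proposed.match(url)

# Original: the query string is swallowed by the filename group.
print(before.group("filename"), before.group("qrystr"))  # ?file=a.txt None
# Proposed: the filename group is skipped and the query string is captured.
print(after.group("filename"), after.group("qrystr"))    # None ?file=a.txt
```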
                
                
            
                
	                Title: for PHP 
	                Name: stur
	                Date: 5/20/2006 5:44:30 AM
	                Comment: 
$pattern='"
(?:(http(?:s?)|ftp)(?:\:\/\/))         # protocol
(?:(\w+\:\w+)(?:\@))?                  # usrpwd
([^/\r\n\:]+)?                         # domain
(\:\d+)?                               # port
((?:\/[^\?]*)*\/)?                     # path
(.*?\.(\w{2,4}))?                      # filename.ext
(\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*  # qrystr
(\#[\w\d]+)?                           # bkmrk
"six';
preg_match_all($pattern,$html,$match);
Changed: # bkmrk, # path
                
                
            
                
	                Title: RE:Path matching problem
	                Name: Chris Strolia-Davis
	                Date: 8/1/2005 7:51:35 AM
	                Comment: 
If you are talking about the RE here, then put a question mark inside the <path> group at the end, like this:
(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))
(?:(?<usrpwd>\w+\:\w+)(?:\@))?
(?<domain>[^/\r\n\:]+)?
(?<port>\:\d+)?
(?<path>(?:\/.*)*\/?)?
(?<filename>.*?\.(?<ext>\w{2,4}))?
(?<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*
(?<bkmrk>\#.*)?
Note: I haven't tested to see if this broke anything.
The RE you included in your message does not seem to work with the first " character in it (even if escaped; not sure why it was there). With it removed, it seems to work with both of your examples in the tester.
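Since the change above is flagged as untested, here is a rough check in Python. It assumes a direct port of the named-group expression to Python's `(?P<name>...)` syntax, and the `build` helper is only there for this illustration. In this port the optional trailing slash does give the requested path, but the greedy `.*` inside the path group then also swallows a filename, so it may be worth testing against your own URLs before adopting it.

```python
import re

def build(path_group):
    """Assemble the expression with the given <path> group (helper for this sketch only)."""
    return re.compile(
        r"(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))"
        r"(?:(?P<usrpwd>\w+\:\w+)(?:\@))?"
        r"(?P<domain>[^/\r\n\:]+)?"
        r"(?P<port>\:\d+)?"
        + path_group +
        r"(?P<filename>.*?\.(?P<ext>\w{2,4}))?"
        r"(?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*"
        r"(?P<bkmrk>\#.*)?"
    )

original = build(r"(?P<path>(?:\/.*)*\/)?")   # trailing slash required
relaxed = build(r"(?P<path>(?:\/.*)*\/?)?")   # trailing slash optional

for url in (
    "http://www.example.com/folder/sub",
    "http://www.example.com/folder/sub/",
    "http://www.example.com/folder/file.html",
):
    o, r = original.match(url), relaxed.match(url)
    print(url)
    print("  original:", o.group("path"), o.group("filename"))
    print("  relaxed: ", r.group("path"), r.group("filename"))
```

With the relaxed group the first URL yields `/folder/sub` as the path, while the third yields `/folder/file.html` as the path and no filename at all.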
                
                
            
                
	                Title: path matching problem
	                Name: Joe
	                Date: 7/29/2005 10:53:51 PM
	                Comment: 
Can someone tell me how to get this regex to match "folder/sub" as the path when there is not a slash "/" at the end: 
http://www.example.com/folder/sub
^(?:(http|https|ftp)("?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^\/\:\s\@]+)(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?("?:\#(.*))?$
It only seems to correctly match the path if there's a slash at the end: 
http://www.example.com/folder/sub/
                
                
            
                
	                Title: MS' named captures are a good idea to avoid some unnecessary lines of code
	                Name: Marek Möhling
	                Date: 3/14/2005 2:01:55 PM
	                Comment: 
> You should be able to remove the <capture name> and
> reference it by number.
Yes...
> I can remember they don't have named captures
I don't like to admit it, but MS' named captures are a good idea to avoid some unnecessary lines of code
                
                
            
                
	                Title: Perl and PHP captures
	                Name: Ariel Merrell
	                Date: 3/14/2005 12:16:26 PM
	                Comment: 
I'm not very familiar with PHP or Perl since I don't actively use them, but from what I can remember they don't have named captures.
You should be able to remove the <capture name> and reference it by number. I don't know how I would keep track of all the captures without named captures.
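To illustrate how names and numbers line up, here is a small sketch in Python rather than Perl or PHP. It assumes a port of the expression to Python's `(?P<name>...)` spelling, and the sample URL is invented. The compiled pattern exposes a `groupindex` mapping from each name to its number, so the same capture can be read either way.

```python
import re

url_re = re.compile(
    r"(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))"
    r"(?:(?P<usrpwd>\w+\:\w+)(?:\@))?"
    r"(?P<domain>[^/\r\n\:]+)?"
    r"(?P<port>\:\d+)?"
    r"(?P<path>(?:\/.*)*\/)?"
    r"(?P<filename>.*?\.(?P<ext>\w{2,4}))?"
    r"(?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*"
    r"(?P<bkmrk>\#.*)?"
)

m = url_re.match("http://www.example.com/index.html")  # hypothetical URL

# Every named group also has a number, counted by opening parenthesis.
for name, number in url_re.groupindex.items():
    print(number, name, repr(m.group(number)))

# Reading a capture by name or by number returns the same text.
print(m.group("domain") == m.group(3))  # True
```

In an engine without named captures, the numbers printed here are the ones to use after removing the names, provided every other group stays non-capturing as it is in this expression.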
                
                
            
                
	                Title: tried this unsuccessfully with Perl and PHP
	                Name: Marek Möhling
	                Date: 3/13/2005 6:31:00 PM
	                Comment: 
I tried this unsuccessfully with Perl and PHP.
I couldn't find any mention of the (?<label>someStr) syntax in the Perl/PHP regex documentation I checked.
Is this proprietary to the MS or .NET regex implementation?
If not, could you show a Perl or PHP example?
Cheers anyway.
                
                
            
                
	                Title: Error in domain part
	                Name: Softlion
	                Date: 2/1/2005 5:36:15 AM
	                Comment: 
Missing escape before the slash:    (?<domain>[^/\r\n\:]+)?
should be
   (?<domain>[^\/\r\n\:]+)?
                
                
            
                
	                Title: Other fix
	                Name: Softlion
	                Date: 2/1/2005 5:12:05 AM
	                Comment: 
* Removed ? after domain so it does not match
   http://
alone any more
^(?:(http|https|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^/\:\s\@]+)(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?(?:\#(.*))?$
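A quick way to see the effect of dropping that `?` is the Python sketch below. It assumes both expressions behave the same under Python's `re` module as in the engine they were written for, and the sample host is made up.

```python
import re

# Shared pieces of the expression; only the domain group differs.
HEAD = r"^(?:(http|https|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?"
TAIL = r"(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?(?:\#(.*))?$"

optional_domain = re.compile(HEAD + r"([^/\:\s\@]+)?" + TAIL)
required_domain = re.compile(HEAD + r"([^/\:\s\@]+)" + TAIL)

print(optional_domain.match("http://") is not None)  # True: scheme alone is accepted
print(required_domain.match("http://") is not None)  # False: a domain is now required

# A normal URL still matches; group 3 is the domain.
print(required_domain.match("http://www.example.com/").group(3))  # www.example.com
```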
                
                
            
                
	                Title: Fix
	                Name: Softlion
	                Date: 2/1/2005 5:06:08 AM
	                Comment: 
I prefer it this way:
* works client side (removed ?<...> and cr/lf, sorry)
* doesn't accept ftp://@toto.com or http://toto toto.com
* removed ':' and '#' and '?' from results (port, query, bookmark)
* added ^ and $
* accepts http://domain/toto (without ending /)
* accepts https://domain/toto.c (one char at end)
^(?:(http|https|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^/\:\s\@]+)?(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?(?:\#(.*))?$
                
                
            
                
	                Title: my problem solved
	                Name: Dave S
	                Date: 12/27/2004 1:26:29 PM
	                Comment: 
<my message was snipped in last post.  continued here>
address each submatch simply by their index.  So,
(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))
actually becomes
(?:(http(?:s?)|ftp)(?:\:\/\/))
and can be addressed by objRE.Match(index).SubMatches(0)
Regards.
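If the names have to be removed by hand for an engine that rejects them, the conversion can also be scripted. The sketch below is in Python and makes a few assumptions: `strip_group_names` is a hypothetical helper, it only recognises the `(?<name>` spelling, and it does not try to handle an escaped `\(` that happens to precede `?<`. Because every capturing group in the expression is named, the position of a name in the returned list plus one gives its submatch number.

```python
import re

# Matches the opening of a named group such as (?<protocol> but not (?<= or (?<!
NAMED_GROUP = re.compile(r"\(\?<([A-Za-z_]\w*)>")

def strip_group_names(pattern):
    """Return the pattern with names removed, plus the names in group order."""
    names = NAMED_GROUP.findall(pattern)
    return NAMED_GROUP.sub("(", pattern), names

named = r"(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))(?<domain>[^/\r\n\:]+)?"
plain, names = strip_group_names(named)

print(plain)  # (?:(http(?:s?)|ftp)(?:\:\/\/))([^/\r\n\:]+)?
print(names)  # ['protocol', 'domain']

# The stripped pattern is addressed by index only.
m = re.match(plain, "ftp://files.example.com/")  # hypothetical URL
print(m.group(names.index("domain") + 1))  # files.example.com
```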
                
                
            
                
	                Title: my problem solved
	                Name: Dave S
	                Date: 12/27/2004 1:25:36 PM
	                Comment: 
Hi all.
I have discovered the problem I was having regarding the syntax error in my previous post.  Here is a working version of this expression as I'm using it with VBScript in an ASP environment:
set objRE = new RegExp
    objRE.Global = True
    objRE.IgnoreCase = True
    objRE.Pattern = "(?:(http(?:s?)|ftp)(?:(?:\:\/\/)))"&_
	    "(?:(\w+\:*\w+)(?:\@))?"&_
	    "([^/\r\n\:]+)?"&_
	    "(\:\d+)?"&_
	    "((?:\/.*)*\/)?"&_
	    "(.*?\.(\w{2,4}))?"&_
	    "(\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*"&_
	    "(\#.*)?"
NOTE: I've made a slight adjustment to the expression in the "usrpwd" submatch so that passwords are optional (i.e. myusername:[email protected] matches in the original, but now [email protected] should also match, for my purposes).
Also NOTE: the syntax errors were caused by the explicit 'submatch' names.  VBScript (or ASP) doesn't support the (?<match_name>) syntax.  Instead, we have to address each submatch simply by its index.  So,
(?:(?<pr
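The "usrpwd" adjustment mentioned in the first NOTE can be checked in isolation. This is a sketch in Python rather than VBScript, on the assumption that the two engines treat this small fragment the same way; the credentials and host are invented. One side effect worth knowing: because the adjusted fragment still needs `\w+` twice, a one-character user name without a password is not captured.

```python
import re

original = re.compile(r"(?:(\w+\:\w+)(?:\@))?")   # user:password required
adjusted = re.compile(r"(?:(\w+\:*\w+)(?:\@))?")  # colon (and so the password) optional

samples = ("alice:secret@host", "alice@host", "a@host")  # hypothetical inputs
for text in samples:
    print(text, original.match(text).group(1), adjusted.match(text).group(1))

# alice:secret@host  -> both capture "alice:secret"
# alice@host         -> only the adjusted fragment captures "alice"
# a@host             -> neither captures anything
```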
                
                
            
                
	                Title: Syntax error message
	                Name: Dave S
	                Date: 12/27/2004 12:09:04 AM
	                Comment: 
Hello.
I'm trying to use this expression in an ASP environment.  Below is my code:
set objRegExp = new RegExp
    objRegExp.Global = True
    objRegExp.IgnoreCase = True
objRegExp.Pattern = "(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))"&_
"(?:(?<usrpwd>\w+\:\w+)(?:\@))?"&_
"(?<domain>[^/\r\n\:]+)?"&_
"(?<port>\:\d+)?"&_
"(?<path>(?:\/.*)*\/)?"&_
"(?<filename>.*?\.(?<ext>\w{2,4}))?"&_
"(?<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*"&_
"(?<bkmrk>\#.*)?"
(sorry if there are unexpected line breaks in this forum...)
However, I'm getting this response: "Syntax error in regular expression".
I've cut-n-pasted the expression from this page, and I've tried to double-up the backslashes...but my experience with RegExpressions is limited.  So, I'm not sure what type of syntax errors I should look for.
Note also that I cannot find any option in ASP's RegExp object to 'ignore pattern whitespace' -  but I've been seeing that instruction a lot in these forums.
I'd apprecia
                
                
            
                
	                Title: Re: Bad expression
	                Name: Ariel Merrell
	                Date: 12/16/2004 5:31:47 PM
	                Comment: 
What sort of error are you getting?  How are you using it, and what are you trying to match?  If you are using it as is, make sure you have set the ignore pattern whitespace option.
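To show why that option matters, here is a sketch using Python, where the equivalent flag is `re.VERBOSE`. It assumes the multi-line layout of the expression ported to Python's `(?P<name>...)` syntax, and the URL is made up. Without the flag the line breaks and spaces are treated as literal characters to match, so nothing is found.

```python
import re

# Multi-line layout of the expression, with a comment per line.
PATTERN = r"""
(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))         # protocol
(?:(?P<usrpwd>\w+\:\w+)(?:\@))?                    # usrpwd
(?P<domain>[^/\r\n\:]+)?                           # domain
(?P<port>\:\d+)?                                   # port
(?P<path>(?:\/.*)*\/)?                             # path
(?P<filename>.*?\.(?P<ext>\w{2,4}))?               # filename.ext
(?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*    # qrystr
(?P<bkmrk>\#.*)?                                   # bkmrk
"""

url = "http://www.example.com/index.html"  # hypothetical URL

strict = re.search(PATTERN, url)               # whitespace taken literally
relaxed = re.search(PATTERN, url, re.VERBOSE)  # whitespace and comments ignored

print(strict)                     # None
print(relaxed.group("domain"))    # www.example.com
print(relaxed.group("filename"))  # index.html
```

Engines without such a flag need the expression joined onto one line with the comments removed.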
                
                
            
                
	                Title: Bad expression
	                Name: Felipe Drumond
	                Date: 12/16/2004 3:31:34 PM
	                Comment: 
Does not seem to work
                
                
            
                
	                Title: Slight Modification
	                Name: Eric Maino
	                Date: 12/8/2004 1:31:56 AM
	                Comment: 
You should change the first line to:
(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))?
and the port line to:
(\:(?<port>\d+))?
This way it actually returns the port number and it also matches addresses such as the following:
www.google.com/library
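The protocol change can be tried out with the Python sketch below, which assumes a port of the expression to Python's `(?P<name>...)` syntax. One caveat that follows from it: once the protocol is optional, every part of the expression is optional, so in this port it also matches an empty string. Anchoring or an extra check on the domain may be needed, depending on the use.

```python
import re

# Everything after the protocol, unchanged.
REST = (
    r"(?:(?P<usrpwd>\w+\:\w+)(?:\@))?"
    r"(?P<domain>[^/\r\n\:]+)?"
    r"(?P<port>\:\d+)?"
    r"(?P<path>(?:\/.*)*\/)?"
    r"(?P<filename>.*?\.(?P<ext>\w{2,4}))?"
    r"(?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*"
    r"(?P<bkmrk>\#.*)?"
)
PROTOCOL = r"(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))"

required = re.compile(PROTOCOL + REST)
optional = re.compile(PROTOCOL + "?" + REST)

address = "www.google.com/library"

print(required.search(address))  # None: no http://, https:// or ftp:// to start from

m = optional.search(address)
print(m.group("protocol"), m.group("domain"))  # None www.google.com

# Caveat: with nothing mandatory left, even an empty string matches.
print(optional.match("") is not None)  # True
```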
                
                
            
                
	                Title: c# version
	                Name: Rick
	                Date: 8/6/2004 9:51:25 PM
	                Comment: 
using System.Text.RegularExpressions;
using System;
class Validation
{ 
	public static void Main()
	{
		// Remove any line breaks.
		// All the backslashes have been doubled up.
		Regex regexObj = new Regex("(?:(?<protocol>http(?:s?)|ftp)(?:\\:\\/\\/))(?:(?<usrpwd>\\w+\\:\\w+)(?:\\@))?(?<domain>[^/\\r\\n\\:]+)?(?:\\:(?<port>\\d+))?(?<path>(?:\\/.*)*\\/)?(?<filename>.*?\\.(?<ext>\\w{2,4}))?(?<qrystr>\\??(?:\\w+\\=[^\\#]+)(?:\\&?\\w+\\=\\w+)*)*(?<bkmrk>\\#.*)?");
		// If you want the question mark out of the query string you can do the following as well '(?:\??(?<qrystr>(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*))*'.
		Match m = regexObj.Match("http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=c%23");
		// Should print [www.google.com].
		Console.WriteLine(m.Groups["domain"].ToString());
	}  
}
                
                
            
                
	                Title: excellent with a minor correction
	                Name: Patrick Fogarty
	                Date: 7/8/2004 1:02:55 AM
	                Comment: 
This is excellent, though I noticed that the colon character got sucked up into the port.  Changing the port pattern to '(?:\:(?<port>\d+))?' fixes it.  If you want the question mark out of the query string you can do the following as well '(?:\??(?<qrystr>(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*))*'.
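Both corrections can be compared side by side with a short Python sketch. It assumes a port of the expression to Python's `(?P<name>...)` syntax; `build` is a helper written only for this comparison, and the URL is invented.

```python
import re

def build(port_group, query_group):
    """Assemble the expression with the given port and query groups (sketch helper)."""
    return re.compile(
        r"(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))"
        r"(?:(?P<usrpwd>\w+\:\w+)(?:\@))?"
        r"(?P<domain>[^/\r\n\:]+)?"
        + port_group +
        r"(?P<path>(?:\/.*)*\/)?"
        r"(?P<filename>.*?\.(?P<ext>\w{2,4}))?"
        + query_group +
        r"(?P<bkmrk>\#.*)?"
    )

original = build(
    r"(?P<port>\:\d+)?",
    r"(?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*",
)
corrected = build(
    r"(?:\:(?P<port>\d+))?",
    r"(?:\??(?P<qrystr>(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*))*",
)

url = "http://www.example.com:8080/index.html?a=1&b=2#top"  # hypothetical URL

before, after = original.match(url), corrected.match(url)
print(before.group("port"), before.group("qrystr"))  # :8080 ?a=1&b=2
print(after.group("port"), after.group("qrystr"))    # 8080 a=1&b=2
```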
                
                
            
                
	                Title: Does not seem to work
	                Name: Patrice Boissonneault
	                Date: 6/7/2004 4:03:25 PM
	                Comment: 
Hello,
I just tried this very promising regex, but it does not seem to work for me.  Would you have a C# example by any chance?
Thanks.
Please e-mail to [email protected]
                
                
            
                
	                Title: Mr
	                Name: Jerry
	                Date: 4/20/2004 1:35:29 AM
	                Comment: 
This is FANTASTIC!  My hat's off to you.
                
                
            
                
	                Title: Corrected Port
	                Name: AMerrell
	                Date: 3/25/2004 9:11:22 AM
	                Comment: 
Thanks for the comment.  I have made the change you suggested.
                
                
            
                
	                Title: Port expression is not correct
	                Name: Paulo Patrício
	                Date: 3/25/2004 7:21:16 AM
	                Comment: 
The port expression \:\d+[^\/\r\n] did not work in this example, but \:\d+ is enough to detect the port.