RegExLib.com - The first Regular Expression Library on the Web!

Regular Expression Details

Title: Label all parts of a URL
Expression
(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/)) (?:(?<usrpwd>\w+\:\w+)(?:\@))? (?<domain>[^/\r\n\:]+)? (?<port>\:\d+)? (?<path>(?:\/.*)*\/)? (?<filename>.*?\.(?<ext>\w{2,4}))? (?<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)* (?<bkmrk>\#.*)?
Description
I needed a regular expression to break URLs into labeled parts. This is what I came up with. I got a few ideas from regexlib.com and from this MSDN article: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/reconbackreferences.asp
http://www.domain.com/folder does return a match, but it will not grab the folder name unless there is a "/" at the end, as in http://www.domain.com/folder/
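As a rough illustration of the labeled parts, here is a minimal Python sketch. It assumes Python's re module, which writes named groups as (?P<name>...) instead of (?<name>...), and it uses re.VERBOSE as the stand-in for the "ignore pattern whitespace" option. The names URL_PARTS and label_url are made up for this example and are not part of the original submission.

```python
import re

# The expression from this page, one labeled part per line.
# Assumption: every (?<name>...) is rewritten as (?P<name>...) for Python;
# nothing else in the expression is changed.
URL_PARTS = re.compile(r"""
    (?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))
    (?:(?P<usrpwd>\w+\:\w+)(?:\@))?
    (?P<domain>[^/\r\n\:]+)?
    (?P<port>\:\d+)?
    (?P<path>(?:\/.*)*\/)?
    (?P<filename>.*?\.(?P<ext>\w{2,4}))?
    (?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*
    (?P<bkmrk>\#.*)?
""", re.VERBOSE)


def label_url(url):
    """Return a dict of the labeled parts, or None if the URL does not match."""
    m = URL_PARTS.match(url)
    return m.groupdict() if m else None


parts = label_url("https://192.168.0.2:80/users/~fname.lname/file.ext")
print(parts["domain"], parts["port"], parts["path"], parts["filename"])

# Without a trailing slash the folder name is left out of the path group.
print(label_url("http://www.domain.com/folder")["path"])
print(label_url("http://www.domain.com/folder/")["path"])
```

Parts that do not take part in the match come back as None in the dict, which is how a missing user name or bookmark shows up here.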
Matches
https://192.168.0.2:80/users/~fname.lname/file.ext | ftp://user1:[email protected] | http://www.dom
Non-Matches
www.domain.com | user1:[email protected] | 192.168.0.2/folder/file.ext
Author: Ariel Merrell

Existing User Comments

Title: Fails on URLs without filename but with query string
Name: Doug Warren
Date: 5/21/2020 8:12:22 PM
Comment:
If the URL has no filename but it does have a query string, this regular expression identifies the query string as the filename. Assuming that's still a legal URL, one way to correct this might be to forbid question marks in the filename capture group, perhaps by changing it to: (?<filename>[^\?]*?\.(?<ext>\w{2,4}))
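The behaviour described above can be reproduced with a small Python sketch. It assumes Python's (?P<name>...) spelling for named groups and re.VERBOSE for the embedded whitespace; HEAD, TAIL, build and the example.com URLs are hypothetical names used only here. In this sketch the misidentification shows up when the query string contains a dot, because the filename group requires one.

```python
import re

# Parts of the page's expression before and after the filename group,
# in Python syntax. Only the filename group is swapped between variants.
HEAD = r"""
    (?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))
    (?:(?P<usrpwd>\w+\:\w+)(?:\@))?
    (?P<domain>[^/\r\n\:]+)?
    (?P<port>\:\d+)?
    (?P<path>(?:\/.*)*\/)?
"""
TAIL = r"""
    (?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*
    (?P<bkmrk>\#.*)?
"""


def build(filename_group):
    """Compile the full expression around the given filename group."""
    return re.compile(HEAD + filename_group + TAIL, re.VERBOSE)


original = build(r"(?P<filename>.*?\.(?P<ext>\w{2,4}))?")
patched = build(r"(?P<filename>[^\?]*?\.(?P<ext>\w{2,4}))?")  # suggested fix

url = "http://www.example.com/folder/?file=report.pdf"
for name, rx in (("original", original), ("patched", patched)):
    m = rx.match(url)
    print(name, "filename:", m.group("filename"), "qrystr:", m.group("qrystr"))
```

With the question mark excluded from the filename group, the same text is picked up by the qrystr group instead, and a URL that does have a filename before the query string is still split as before.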


Title: for PHP
Name: stur
Date: 5/20/2006 5:44:30 AM
Comment:
$pattern = '"
    (?:(http(?:s?)|ftp)(?:\:\/\/))          # protocol
    (?:(\w+\:\w+)(?:\@))?                   # usrpwd
    ([^/\r\n\:]+)?                          # domain
    (\:\d+)?                                # port
    ((?:\/[^\?]*)*\/)?                      # path
    (.*?\.(\w{2,4}))?                       # filename.ext
    (\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*   # qrystr
    (\#[\w\d]+)?                            # bkmrk
"six';
preg_match_all($pattern, $html, $match);

Changed: # bkmrk, # path


Title: RE:Path matching problem
Name: Chris Strolia-Davis
Date: 8/1/2005 7:51:35 AM
Comment:
If you are talking about the RE here, then put a question mark inside the <path> group at the end, like this:

(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/)) (?:(?<usrpwd>\w+\:\w+)(?:\@))? (?<domain>[^/\r\n\:]+)? (?<port>\:\d+)? (?<path>(?:\/.*)*\/?)? (?<filename>.*?\.(?<ext>\w{2,4}))? (?<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)* (?<bkmrk>\#.*)?

Note: I haven't tested to see if this broke anything. The RE you included in your message does not seem to work with the first " character in it (even if escaped, not sure why it was there), and when removed it seems to work with both of your examples in the tester.
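Since the suggestion above is marked as untested, here is a minimal Python sketch of what the optional trailing slash does. It assumes Python's (?P<name>...) group syntax and re.VERBOSE; HEAD, TAIL and build are hypothetical helper names. It suggests a trade-off: the path is captured without a trailing slash, but the greedy path group then also swallows a filename.

```python
import re

# Parts of the page's expression around the path group, in Python syntax.
HEAD = r"""
    (?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))
    (?:(?P<usrpwd>\w+\:\w+)(?:\@))?
    (?P<domain>[^/\r\n\:]+)?
    (?P<port>\:\d+)?
"""
TAIL = r"""
    (?P<filename>.*?\.(?P<ext>\w{2,4}))?
    (?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*
    (?P<bkmrk>\#.*)?
"""


def build(path_group):
    """Compile the full expression around the given path group."""
    return re.compile(HEAD + path_group + TAIL, re.VERBOSE)


original = build(r"(?P<path>(?:\/.*)*\/)?")
relaxed = build(r"(?P<path>(?:\/.*)*\/?)?")  # trailing slash made optional

for url in ("http://www.example.com/folder/sub",
            "http://www.example.com/folder/file.ext"):
    for name, rx in (("original", original), ("relaxed", relaxed)):
        m = rx.match(url)
        print(name, url, "path:", m.group("path"), "filename:", m.group("filename"))
```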


Title: path matching problem
Name: Joe
Date: 7/29/2005 10:53:51 PM
Comment:
Can someone tell me how to get this regex to match "folder/sub" as the path when there is not a slash "/" at the end:
http://www.example.com/folder/sub

^(?:(http|https|ftp)("?:\:\/\/))( ?:(\w+\:\w+)( ?:\@))?([^\/\:\s\@]+)( ?:\:(\d+))?(( ?:\/.*)*\/)?(.*?( ?:\.([\w]*))?)?( ?:\?([^#]*))?("?:\#(.*))?$

It only seems to correctly match the path if there's a slash at the end:
http://www.example.com/folder/sub/


Title: MS' named captures are a good idea to avoid some unnecessary lines of code
Name: Marek Möhling
Date: 3/14/2005 2:01:55 PM
Comment:
> You should be able to remove the <capture name> and
> reference it by number.
Yes...
> I can remember they don't have named captures
I don't like to admit it, but MS' named captures are a good idea to avoid some unnecessary lines of code.


Title: Perl and PHP captures
Name: Ariel Merrell
Date: 3/14/2005 12:16:26 PM
Comment:
I'm not very familiar with PHP or Perl since I don't actively use them, but from what I can remember they don't have named captures. You should be able to remove the <capture name> and reference it by number. I don't know how I would keep track of all the captures without named captures.


Title: tried this unsuccessfully with Perl and PHP
Name: Marek Möhling
Date: 3/13/2005 6:31:00 PM
Comment:
I tried this unsuccessfully with Perl and PHP. I couldn't find any mention of the (?<label>someStr) syntax in the Perl/PHP regex documentation I checked. Is this proprietary to the MS or .NET regex implementation? If not, could you show a Perl or PHP example? Cheers anyway.
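The thread does not include a Perl or PHP example with named groups. As an illustration in a different language, Python's re module spells a named group (?P<name>...), so a pattern written in the (?<name>...) style has to be rewritten before use. The helper below is a naive sketch with a made-up name, not an established conversion routine.

```python
import re


def to_python_named_groups(pattern):
    """Rewrite (?<name>...) groups as Python's (?P<name>...).

    Naive sketch: lookbehinds such as (?<= and (?<! are left alone, but an
    escaped literal paren followed by ?< would be rewritten as well.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


# A shortened version of the page's expression, in the original group style.
dotnet_style = r"(?<protocol>http(?:s?)|ftp)\:\/\/(?<domain>[^/\r\n\:]+)"
converted = to_python_named_groups(dotnet_style)
m = re.match(converted, "ftp://example.org/pub")

print(converted)
print(m.group("protocol"), m.group("domain"))
```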


Title: Error in domain part
Name: Softlion
Date: 2/1/2005 5:36:15 AM
Comment:
A backslash is missing before the slash in the domain part: (?<domain>[^/\r\n\:]+)? should be (?<domain>[^\/\r\n\:]+)?


Title: Other fix
Name: Softlion
Date: 2/1/2005 5:12:05 AM
Comment:
* Removed ? after domain so it does not match http:// alone any more

^(?:(http|https|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^/\:\s\@]+)(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?(?:\#(.*))?$


Title: Fix
Name: Softlion
Date: 2/1/2005 5:06:08 AM
Comment:
I prefer it this way:
* works client side (removed ?<...> and cr/lf, sorry)
* doesn't accept ftp://@toto.com or http://toto toto.com
* removed ':' and '#' and '?' from results (port, query, bookmark)
* added ^ and $
* accepts http://domain/toto (without ending /)
* accepts https://domain/toto.c (one char at end)

^(?:(http|https|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^/\:\s\@]+)?(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?(?:\#(.*))?$
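Because this variant drops the group names, the parts have to be tracked by position. A minimal Python sketch of one way to do that follows; the NAMES tuple, the label function, and the example URLs are assumptions made for illustration, and only two sample inputs are exercised.

```python
import re

# The anchored, unnamed variant quoted above, copied unchanged.
SOFTLION = re.compile(
    r"^(?:(http|https|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^/\:\s\@]+)?"
    r"(?:\:(\d+))?((?:\/.*)*\/)?(.*?(?:\.([\w]*))?)?(?:\?([^#]*))?(?:\#(.*))?$"
)

# Hypothetical labels for groups 1..9, in the order they open in the pattern.
NAMES = ("protocol", "usrpwd", "domain", "port", "path",
         "filename", "ext", "query", "bookmark")


def label(url):
    """Pair each numbered group with a label, or return None on no match."""
    m = SOFTLION.match(url)
    return dict(zip(NAMES, m.groups())) if m else None


full = label("https://user:pw@example.com:8080/a/b/file.txt?x=1&y=2#top")
short = label("http://domain/toto")
print(full)
print(short)
```

In this variant the port, query and bookmark groups hold only the text after the ':', '?' and '#' separators.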


Title: my problem solved
Name: Dave S
Date: 12/27/2004 1:26:29 PM
Comment:
<my message was snipped in last post. continued here> address each submatch simply by their index. So, (?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/)) actually becomes (?:(http(?:s?)|ftp)(?:\:\/\/)) and can be addressed by objRE.Match(index).SubMatches(0) Regards.


Title: my problem solved
Name: Dave S
Date: 12/27/2004 1:25:36 PM
Comment:
Hi all. I have discovered the problem I was having regarding the syntax error in my previous post. Here is a working version of this expression as I'm using it with VBScript in an ASP environment:

set objRE = new RegExp
objRE.Global = True
objRE.IgnoreCase = True
objRE.Pattern = "(?:(http(?:s?)|ftp)(?:(?:\:\/\/)))" & _
    "(?:(\w+\:*\w+)(?:\@))?" & _
    "([^/\r\n\:]+)?" & _
    "(\:\d+)?" & _
    "((?:\/.*)*\/)?" & _
    "(.*?\.(\w{2,4}))?" & _
    "(\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*" & _
    "(\#.*)?"

NOTE: I've made a slight adjustment to the expression in the "usrpwd" submatch so that passwords are optional (i.e. myusername:[email protected] matches in the original, but now [email protected] should also match, for my purposes).

Also NOTE: the syntax errors were caused by the explicit 'submatch' names. VBScript (or ASP) doesn't support the (?<match_name>) syntax. Instead, we have to address each submatch simply by their index. So, (?:(?<pr
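The effect of the relaxed "usrpwd" submatch can be sketched in Python with numbered groups, mirroring the index-based access described above. Only the first three parts of the expression are used, and ftp.example.com, user1 and secret are placeholder values invented for this example.

```python
import re

# Protocol, user info and domain only. Group 2 is the "usrpwd" submatch.
STRICT = re.compile(
    r"(?:(http(?:s?)|ftp)(?:\:\/\/))(?:(\w+\:\w+)(?:\@))?([^/\r\n\:]+)?")
RELAXED = re.compile(
    r"(?:(http(?:s?)|ftp)(?:\:\/\/))(?:(\w+\:*\w+)(?:\@))?([^/\r\n\:]+)?")

for url in ("ftp://user1:secret@ftp.example.com",
            "ftp://user1@ftp.example.com"):
    print(url)
    print("  strict :", STRICT.match(url).groups())
    print("  relaxed:", RELAXED.match(url).groups())
```

One thing this sketch suggests: because \w+\:*\w+ needs at least two word characters, a one-letter user name without a password is still not picked up by the relaxed form and ends up inside the domain group instead.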


Title: Syntax error message
Name: Dave S
Date: 12/27/2004 12:09:04 AM
Comment:
Hello. I'm trying to use this expression in an ASP environment. Below is my code:

set objRegExp = new RegExp
objRegExp.Global = True
objRegExp.IgnoreCase = True
objRegExp.Pattern = "(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))" & _
    "(?:(?<usrpwd>\w+\:\w+)(?:\@))?" & _
    "(?<domain>[^/\r\n\:]+)?" & _
    "(?<port>\:\d+)?" & _
    "(?<path>(?:\/.*)*\/)?" & _
    "(?<filename>.*?\.(?<ext>\w{2,4}))?" & _
    "(?<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*" & _
    "(?<bkmrk>\#.*)?"

(sorry if there are unexpected line breaks in this forum...)

However, I'm getting this response: "Syntax error in regular expression". I've cut-n-pasted the expression from this page, and I've tried to double-up the backslashes...but my experience with RegExpressions is limited. So, I'm not sure what type of syntax errors I should look for. Note also that I cannot find any option in ASP's RegExp object to 'ignore pattern whitespace' - but I've been seeing that instruction a lot in these forums. I'd apprecia


Title: Re: Bad expression
Name: Ariel Merrell
Date: 12/16/2004 5:31:47 PM
Comment:
What sort of error are you getting? How are you using it and what are you trying to match? If you are using it as is, make sure you have set the ignore pattern whitespace option.
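To show why the whitespace option matters, here is a small Python sketch using the first three parts of the expression with the same embedded spaces as on this page. It assumes Python's (?P<name>...) group syntax, with re.VERBOSE playing the role of the ignore pattern whitespace option; SPACED is a made-up name.

```python
import re

# First three parts of the expression, separated by single spaces as posted.
SPACED = (r"(?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/)) "
          r"(?:(?P<usrpwd>\w+\:\w+)(?:\@))? "
          r"(?P<domain>[^/\r\n\:]+)?")

url = "http://www.domain.com/folder/"

literal = re.match(SPACED, url)              # spaces must match literally
verbose = re.match(SPACED, url, re.VERBOSE)  # spaces in the pattern ignored
stripped = re.match(re.sub(r"\s+", "", SPACED), url)  # spaces removed by hand

print(literal)
print(verbose.group("domain"))
print(stripped.group("domain"))
```

Removing the spaces by hand is only safe here because this expression contains no whitespace that is meant to be matched.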


Title: Bad expression
Name: Felipe Drumond
Date: 12/16/2004 3:31:34 PM
Comment:
Does not seem to work


Title: Slight Modification
Name: Eric Maino
Date: 12/8/2004 1:31:56 AM
Comment:
You should change the first line to: (?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))? and the port line to: (\:(?<port>\d+))? This way it actually returns the port number and it also matches addresses such as the following: www.google.com/library
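The two changes suggested above can be tried out with a minimal Python sketch, again assuming Python's (?P<name>...) group syntax and re.VERBOSE. OPTIONAL_PROTOCOL and parts are hypothetical names. The sketch also points at a side effect: once the protocol is optional, every part of the expression is optional, so even an empty string matches.

```python
import re

# The page's expression with the protocol made optional and the port
# captured without its colon, as suggested in the comment above.
OPTIONAL_PROTOCOL = re.compile(r"""
    (?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))?
    (?:(?P<usrpwd>\w+\:\w+)(?:\@))?
    (?P<domain>[^/\r\n\:]+)?
    (\:(?P<port>\d+))?
    (?P<path>(?:\/.*)*\/)?
    (?P<filename>.*?\.(?P<ext>\w{2,4}))?
    (?P<qrystr>\??(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*)*
    (?P<bkmrk>\#.*)?
""", re.VERBOSE)


def parts(url):
    """Return the named groups for url (unnamed groups are not included)."""
    return OPTIONAL_PROTOCOL.match(url).groupdict()


print(parts("www.google.com/library"))
print(parts("https://192.168.0.2:80/users/~fname.lname/file.ext")["port"])
print(OPTIONAL_PROTOCOL.match("") is not None)
```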


Title: c# version
Name: Rick
Date: 8/6/2004 9:51:25 PM
Comment:
using System;
using System.Text.RegularExpressions;

class Validation
{
    public static void Main()
    {
        // remove any line breaks
        // all the backslashes have been doubled up
        Regex regexObj = new Regex("(?:(?<protocol>http(?:s?)|ftp)(?:\\:\\/\\/))(?:(?<usrpwd>\\w+\\:\\w+)(?:\\@))?(?<domain>[^/\\r\\n\\:]+)?(?:\\:(?<port>\\d+))?(?<path>(?:\\/.*)*\\/)?(?<filename>.*?\\.(?<ext>\\w{2,4}))?(?<qrystr>\\??(?:\\w+\\=[^\\#]+)(?:\\&?\\w+\\=\\w+)*)*(?<bkmrk>\\#.*)?");

        // If you want the question mark out of the query string you can do the following as well '(?:\??(?<qrystr>(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*))*'.
        Match m = regexObj.Match("http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=c%23");

        // Should print [www.google.com].
        Console.WriteLine(m.Groups["domain"].ToString());
    }
}


Title: excellent with a minor correction
Name: Patrick Fogarty
Date: 7/8/2004 1:02:55 AM
Comment:
This is excellent, though I noticed that the colon character got sucked up into the port. Changing the port pattern to '(?:\:(?<port>\d+))?' fixes it. If you want the question mark out of the query string you can do the following as well '(?:\??(?<qrystr>(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*))*'.
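Both corrections above can be checked with a short Python sketch. It assumes Python's (?P<name>...) group syntax and re.VERBOSE; CORRECTED and the example.com URL are made up for this illustration.

```python
import re

# The page's expression with the two corrections from the comment above:
# the colon is kept out of the port and the question mark out of the qrystr.
CORRECTED = re.compile(r"""
    (?:(?P<protocol>http(?:s?)|ftp)(?:\:\/\/))
    (?:(?P<usrpwd>\w+\:\w+)(?:\@))?
    (?P<domain>[^/\r\n\:]+)?
    (?:\:(?P<port>\d+))?
    (?P<path>(?:\/.*)*\/)?
    (?P<filename>.*?\.(?P<ext>\w{2,4}))?
    (?:\??(?P<qrystr>(?:\w+\=[^\#]+)(?:\&?\w+\=\w+)*))*
    (?P<bkmrk>\#.*)?
""", re.VERBOSE)

m = CORRECTED.match("http://www.example.com:8080/dir/page.html?id=42#top")
print(m.group("port"), m.group("qrystr"), m.group("bkmrk"))
```

The bookmark group is untouched by these corrections, so it still includes the leading '#'.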


Title: Does not seem to work
Name: Patrice Boissonneault
Date: 6/7/2004 4:03:25 PM
Comment:
Hello, I just tried this very promising regex, but it does not seem to work for me. Would you have a C# example by any chance? Thanks. Please e-mail to [email protected]


Title: Mr
Name: Jerry
Date: 4/20/2004 1:35:29 AM
Comment:
This is FANTASTIC! My hat's off to you.


Title: Corrected Port
Name: AMerrell
Date: 3/25/2004 9:11:22 AM
Comment:
Thanks for the comment. I have made the change you suggested.


Title: Port expression is not correct
Name: Paulo Patrício
Date: 3/25/2004 7:21:16 AM
Comment:
The port expression \:\d+[^\/\r\n] did not work in this example, but \:\d+ is enough to do the port detection.
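The comment does not say which input failed. As a hedged illustration only, the Python sketch below shows two ways the longer port expression can differ from the shorter one: it needs one extra character after the digits, so a one-digit port followed by a slash is missed, and it can pull a non-digit into the result. The sample strings are invented.

```python
import re

OLD_PORT = re.compile(r"\:\d+[^\/\r\n]")  # earlier form quoted in the comment
NEW_PORT = re.compile(r"\:\d+")           # shorter form suggested instead

results = {}
for text in (":8/index.html", ":80/index.html", ":80x/index.html"):
    old = OLD_PORT.search(text)
    new = NEW_PORT.search(text)
    results[text] = (old.group(0) if old else None, new.group(0))
    print(text, results[text])
```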

