Title |
Pattern Title
|
Expression |
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])? |
Description |
*CORRECTED: Again, thanks for all the comments below. If you want to include internal domains (host names without a dot) as well, change the partial code (\.[\w\-_]+)+ to (\.[\w\-_]+)?
See the comments below*
This is the regular expression I use to add links in my email program. It also ignores trailing commas, periods and colons that belong to the sentence rather than to the URL, as in "check out http://www.yahoo.com/." (the period is ignored). Note that it needs some modification to match URLs that don't start with a scheme such as http://. |
Matches |
http://regxlib.com/Default.aspx | http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html |
Non-Matches |
www.yahoo.com |
Author |
M H
|
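A minimal sketch of the email-program use case, assuming Python's `re` module (the entry does not name a language); `find_urls` and `linkify` are hypothetical helper names. `finditer` with `group(0)` is used because `re.findall` would return the capture-group tuples instead of whole URLs.

```python
import re

# Expression from the entry above, unchanged (raw strings keep the backslashes).
URL_RE = re.compile(
    r"(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)


def find_urls(text: str) -> list[str]:
    """Return every whole URL found in text."""
    return [m.group(0) for m in URL_RE.finditer(text)]


def linkify(text: str) -> str:
    """Wrap each URL in an <a> tag; trailing sentence punctuation stays outside."""
    return URL_RE.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)
```

For example, `linkify("check out http://www.yahoo.com/.")` links `http://www.yahoo.com/` and leaves the final period outside the tag, because the last character class excludes `.`, `,` and `:`.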
Title |
Pattern Title
|
Expression |
^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$ |
Description |
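This regex (usable, e.g., in PHP with eregi) matches http, https and ftp URLs whose top-level domain has two or three letters. Unlike the other examples here, it will NOT match a URL ending with a dot, comma or bracket. This is important if you use this regex to find and "activate" links in a text. (Note that eregi was removed in PHP 7.0; preg_match with the i modifier is the usual replacement.) |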
Matches |
https://www.restrictd.com/~myhome/ |
Non-Matches |
http://www.krumedia.com. | (http://www.krumedia.com) | http://www.krumedia.com, |
Author |
Michael Krutwig
|
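A minimal Python sketch of the "activate links in a text" use, assuming Python's `re`; `is_linkable` and `linkable_tokens` are hypothetical names. Because the expression is anchored with `^` and `$`, it validates a whole string, so the sketch tests each whitespace-separated token on its own. `re.IGNORECASE` stands in for the case-insensitivity of eregi.

```python
import re

# Expression from the entry, unchanged; the anchors make it a whole-string check.
URL_RE = re.compile(
    r"^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?"
    r"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$",
    re.IGNORECASE,
)


def is_linkable(token: str) -> bool:
    """True if the whole token is a URL the entry would activate."""
    return URL_RE.match(token) is not None


def linkable_tokens(text: str) -> list[str]:
    # Each token is checked separately because of the ^...$ anchors.
    return [tok for tok in text.split() if is_linkable(tok)]
```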
Title |
Pattern Title
|
Expression |
^<a\s+href\s*=\s*"http:\/\/([^"]*)"([^>]*)>(.*?(?=<\/a>))<\/a>$ |
Description |
Regexp to find all external links in an HTML string.
Can easily be modified to handle other links/protocols (like file/https/ftp).
Uses a lookahead assertion and a non-greedy modifier to check for the closing </a> while still allowing HTML tags between the opening and closing A tags.
Takes into account that there could be line breaks and other nasty whitespace characters in the middle of the tag.
I am using it to find all external links in embedded HTML code in order to 1. change the target of the link and 2. insert a "Leaving Site" logo to show that you are leaving the site. |
Matches |
<a href="http://www.mysite.com">my external link</a> | <a href="http:/ |
Non-Matches |
<a href="myinternalpage.html">my internal link</a> |
Author |
Anders Rask
|
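A minimal Python sketch of the rewrite the author describes, assuming Python's `re`. Because of the `^` and `$` anchors, it is applied to one complete `<a>...</a>` string at a time. The `target="_blank"` attribute and the `[Leaving Site]` text are placeholders assumed for illustration; the entry does not say which target or logo markup it inserts.

```python
import re

# Expression from the entry, unchanged.
EXTERNAL_LINK_RE = re.compile(
    r'^<a\s+href\s*=\s*"http:\/\/([^"]*)"([^>]*)>(.*?(?=<\/a>))<\/a>$'
)


def mark_external(tag: str) -> str:
    """Rewrite an external link; return any other tag unchanged."""
    m = EXTERNAL_LINK_RE.match(tag)
    if m is None:
        return tag
    host_and_path, other_attrs, body = m.groups()
    # Placeholder target and "leaving site" marker (assumptions, not from the entry).
    return (f'<a href="http://{host_and_path}"{other_attrs} target="_blank">'
            f"{body} [Leaving Site]</a>")
```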
Title |
Pattern Title
|
Expression |
<[aA][ ]{0,}([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,}>((<(([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,})>([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,})|(([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,})){0,} |
Description |
I wrote this sweet little (well, not so little really) regex to extract links from HTML source. It is very robust; give it a try.
The only limitation I have discovered is that it can't match invalid HTML. |
Matches |
<a href='javascript:functionA();'><i>this text is italicized</i></a> |
Non-Matches |
<A href='#'><P</A></P> |
Author |
Brian Webb
|
Title |
Pattern Title
|
Expression |
\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b |
Description |
Whilst writing a plain-text-to-HTML function, I ran into a problem: links that users had written with <a> tags (as opposed to just writing the URL) were being linked improperly. This regular expression returns many types of URL (e-mail addresses and mailto:, news://, http(s):// and ftp(s):// links) together with the preceding characters, if any, which lets you handle each type of match appropriately. |
Matches |
|
Non-Matches |
www.deepart.org | deepart.org | 123.123.123.123 |
Author |
Demo Gorgon
|
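A minimal Python sketch of how the preceding-characters group can be used, assuming Python's `re` (the entry names no language). Group 2 holds the non-space characters in front of the scheme or `@`, so a prefix ending in `href="` or `href='` signals a URL the user already wrapped in an `<a>` tag. `classify` is a hypothetical helper name.

```python
import re

# Expression from the entry, unchanged.
LINK_RE = re.compile(r"\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b")


def classify(text: str) -> list[tuple[str, str, bool]]:
    """Return (prefix, marker, already_linked) for every match in text."""
    results = []
    for m in LINK_RE.finditer(text):
        prefix, marker = m.group(2), m.group(4)
        # A prefix ending in an href attribute means the URL is already in a tag.
        already_linked = prefix.endswith(('href="', "href='"))
        results.append((prefix, marker, already_linked))
    return results
```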
Title |
Pattern Title
|
Expression |
(mailto\:|(news|(ht|f)tp(s?))\://)(([^[:space:]]+)|([^[:space:]]+)( #([^#]+)#)?) |
Description |
This is a very small regex for use within content management software, so that links within text fields don't have to be written in HTML. The CMS editor is instructed to use it like this:
1. put spaces in front of and behind the URL;
2. start the URL with http://, mailto://, ftp:// ...;
3. put optional link text within #linktext#, separated from the URL by a single space;
4. if there is no link text, the URL/email address shows up as the link text;
5. avoid URLs with spaces in the file name (encode them as %20).
Replace pattern (space in front): <a href="\\1\\3\\4" target="_blank">\\3\\6</a> |
Matches |
http://www.domain.com | http://www.domain.com/index%20page.htm #linktext# | mailto://user@domai |
Non-Matches |
<a href="http://www.domain.com">real html link</a> | http://www.without_space_ |
Author |
Martin Schwedes
|
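The expression uses the POSIX class `[^[:space:]]`, which Python's `re` does not support, and in a backtracking engine like Python's the first `([^[:space:]]+)` branch always succeeds, so the optional `#linktext#` branch would never be used. This Python sketch is therefore an adaptation of the five editor rules, not a port of the author's exact pattern or replace string. Assumptions: `\S` stands in for `[^[:space:]]`, the link text is an optional suffix, a lookbehind enforces rule 1 (whitespace or start of text before the URL), and `render_links` is a hypothetical name.

```python
import re

# Adapted from the entry (see assumptions above).
CMS_LINK_RE = re.compile(
    r"(?:^|(?<=\s))"  # rule 1: URL starts the text or follows whitespace
    r"(?P<url>(?:mailto:|(?:news|(?:ht|f)tps?)://)\S+)"
    r"(?: #(?P<text>[^#]+)#)?"  # rule 3: optional " #linktext#"
)


def render_links(text: str) -> str:
    """Turn CMS-style URLs (optionally followed by #linktext#) into anchors."""
    def to_anchor(m: re.Match) -> str:
        url = m.group("url")
        label = m.group("text") or url  # rule 4: no link text -> show the URL
        return f'<a href="{url}" target="_blank">{label}</a>'

    return CMS_LINK_RE.sub(to_anchor, text)
```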
Title |
Pattern Title
|
Expression |
href[ ]*=[ ]*('|\")([^\"'])*('|\") |
Description |
The regexes on this site for pulling links off a page always seemed to be faulty, or at least never worked with PHP, so I made this one. It's simple, as I'm an amateur with regexes, but I stumbled through it and this one actually works. Tested with the PHP function: preg_match_all("/href[ ]*=[ ]*('|\")([^\"'])*('|\")/",$string,$matches) |
Matches |
href="index.php" | href = 'http://www.dailymedication.com' | href = "irc://irc.junk |
Non-Matches |
href=http://www.dailymedication.com |
Author |
Jason Paschal
|
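A Python sketch of the same search, assuming Python's `re`. One detail worth knowing: group 2 is a repeated single-character group, `([^\"'])*`, and in Python's `re` a repeated group keeps only its last repetition, so it holds just the final character of the value (the whole attribute is still available as the full match). The `hrefs` helper therefore uses an adapted pattern with the `*` moved inside the group; that change is an assumption, not part of the original entry.

```python
import re

# Expression from the entry, unchanged.
ORIGINAL = re.compile(r"""href[ ]*=[ ]*('|\")([^\"'])*('|\")""")
# Adaptation: quantifier moved inside group 2 so it captures the whole value.
ADAPTED = re.compile(r"""href[ ]*=[ ]*('|\")([^\"']*)('|\")""")


def hrefs(html: str) -> list[str]:
    """Return the quoted href values found in html."""
    return [m.group(2) for m in ADAPTED.finditer(html)]
```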
Title |
Pattern Title
|
Expression |
<\s*a\s[^>]*\bhref\s*=\s*
('(?<url>[^']*)'|""(?<url>[^""]*)""|(?<url>\S*))[^>]*>
(?<body>(.|\s)*?)<\s*/a\s*> |
Description |
Suitable for extraction of all hyperlinks in the format:
<a ... href="..." ...> some text </a>
from a text document. The components of each link (url and body) are captured in separate named groups. |
Matches |
<a href="javascript:'window.close()'">close the window</a> | <a target=&quo |
Non-Matches |
<aa href="test.htm">test</a> | < a href hr = 'http://www.nakov.com'>...& |
Author |
Svetlin Nakov
|
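The expression reuses the group name `url` in three alternatives, which Python's `re` does not allow, and its `""` pairs look like C# verbatim-string escaping of a single `"`. This Python sketch is a port under those assumptions: the three groups are renamed `url_sq`, `url_dq` and `url_bare`, and the hypothetical `extract_links` helper picks whichever one matched.

```python
import re

# Port of the entry's expression (group names and "" escaping adapted).
LINK_RE = re.compile(
    r"""<\s*a\s[^>]*\bhref\s*=\s*"""
    r"""('(?P<url_sq>[^']*)'|"(?P<url_dq>[^"]*)"|(?P<url_bare>\S*))[^>]*>"""
    r"""(?P<body>(.|\s)*?)<\s*/a\s*>"""
)


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (url, body) for each <a ...>...</a> element."""
    links = []
    for m in LINK_RE.finditer(html):
        # Exactly one of the three url groups takes part in a match.
        url = next(g for g in m.group("url_sq", "url_dq", "url_bare") if g is not None)
        links.append((url, m.group("body")))
    return links
```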
Title |
Pattern Title
|
Expression |
<a\s*href=(.*?)[\s|>] |
Description |
Retrieves all anchor links in an HTML document; useful for spidering. You will need to strip " and ' from the results after running the regular expression, as the expression captures them along with the link. As far as I know, there is no way, even with \1 groupings, of adding a condition on whether the link is delimited by ", ' or nothing at all (" and ' are easy enough, but what happens if the link starts with " and contains a JavaScript function call with a string in it?). If there is, it's probably quicker to do it like this and use a string replace anyway. |
Matches |
<a href="http://www.blah.com"> | <a href='../blah.html' target="_top"&a |
Non-Matches |
<a href = http://www.idiothtmlprogrammers.com > |
Author |
chris s
|
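A short Python sketch of the workflow the author describes, assuming Python's `re`: run the expression, then strip the quotes with a plain string operation. Note that `[\s|>]` is a character class, so besides whitespace and `>` it also stops at a literal `|`. `spider_links` is a hypothetical name.

```python
import re

# Expression from the entry, unchanged; group 1 still includes any quotes.
HREF_RE = re.compile(r"<a\s*href=(.*?)[\s|>]")


def spider_links(html: str) -> list[str]:
    """Return href values with surrounding " and ' removed."""
    return [raw.strip("\"'") for raw in HREF_RE.findall(html)]
```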
Title |
Pattern Title
|
Expression |
<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a> |
Description |
This regex will extract the link and the link title for every <a href> element in HTML source. Useful for crawling sites.
Note that this pattern will also allow for links that are spread over multiple lines. |
Matches |
<a href='http://www.regexlib.com'>Text</a> | <a href="...">Text</a> |
Non-Matches |
all other html tags |
Author |
Jacek Sompel
|
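A minimal Python sketch of the crawling use, assuming Python's `re`; `crawl_links` is a hypothetical name. Because the expression has two groups, `findall` returns one `(link, title)` tuple per anchor.

```python
import re

# Expression from the entry, unchanged.
ANCHOR_RE = re.compile(
    r"""<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>"""
)


def crawl_links(html: str) -> list[tuple[str, str]]:
    """Return (link, title) pairs for every <a href=...>...</a>."""
    return ANCHOR_RE.findall(html)
```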
Title |
Pattern Title
|
Expression |
<a\s*.*?href\s*=\s*['"](?!http:\/\/).*?>(.*?)<\/a> |
Description |
Finds all local links, but doesn't match external links.
Replace with $1 to keep the link text but remove the link. |
Matches |
<a href='locallink.htm'>my local link</a> | <a title='click here' href="/a/local |
Non-Matches |
<a href='http://www.site.com/page.htm'>www.site.com</a> | <a href='http://www.site.co |
Author |
james mountain
|
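In Python's `re`, the entry's `$1` replacement is written `\1`. This sketch, assuming Python's `re`, unlinks local links and leaves external ones alone; `unlink_local` is a hypothetical name. The negative lookahead only rules out `http://`, so `https://` links are treated as local as well.

```python
import re

# Expression from the entry, unchanged.
LOCAL_LINK_RE = re.compile(r"""<a\s*.*?href\s*=\s*['"](?!http:\/\/).*?>(.*?)<\/a>""")


def unlink_local(html: str) -> str:
    """Replace each local link with its text (the entry's "replace with $1")."""
    return LOCAL_LINK_RE.sub(r"\1", html)
```

On a line that holds an external link followed by a local one, the lazy `.*?` after `<a` can stretch the match across both links, so the external link is removed too; test it against your own markup first.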
Title |
Pattern Title
|
Expression |
href\s*=\s*(?:(?:\"(?<url>[^\"]*)\")|(?<url>[^\s*] ))>(?<title>[^<]+)</\w> |
Description |
Finds the URL and the link text (description) of every link in a given text. |
Matches |
<td bgcolor="#ffffff" class="small">&nbsp;<A HREF=" http:// |
Non-Matches |
<td bgcolor="#ffffff" class="small">&nbsp;<A HREF http://www.thepla |
Author |
Matt Bruce
|
Title |
Pattern Title
|
Expression |
((http\://|https\://|ftp\://)|(www.))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\?\.'~]*)? |
Description |
This RE matches web links that begin with http://, ftp://, https:// or www.
Links without one of these prefixes (such as diskusneforum.sk) are not matched, but this limitation is easy to edit out. |
Matches |
www.diskusneforum.sk | http://diskusneforum.sk | ftp://23.45.267.189/ |
Non-Matches |
diskusneforum.sk | localhost |
Author |
Martin Ille
|
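A minimal Python sketch that lists the links in a text with this expression, assuming Python's `re`; `web_links` is a hypothetical name. As the `ftp://23.45.267.189/` example in the matches shows, the IP branch only counts one to three digits per part and does not check the 0 to 255 range; also, the dot in `(www.)` is unescaped, so it matches any character.

```python
import re

# Expression from the entry, unchanged.
WEB_LINK_RE = re.compile(
    r"((http\://|https\://|ftp\://)|(www.))+"
    r"(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))"
    r"(/[a-zA-Z0-9%:/-_\?\.'~]*)?"
)


def web_links(text: str) -> list[str]:
    """Return each whole link (group 0) found in text."""
    return [m.group(0) for m in WEB_LINK_RE.finditer(text)]
```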
Title |
Pattern Title
|
Expression |
<a[a-zA-Z0-9 ="'.?_/]*(href\s*=\s*){1}[a-zA-Z0-9 ="'.?_/]*\s*((/>)|(>[a-zA-Z0-9 ="'<>.?_/]*</a>)) |
Description |
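An expression that matches XHTML hrefs (links). It even allows spaces around the equals sign, as in href = "href..." (XHTML permits this as well). The character classes contain only letters, digits, spaces and the characters = " ' . ? _ / (plus < and > in the link text), so values containing anything else, such as the : in http://, are not matched. It finds only hrefs, not anchors, for instance; if you need to find only anchors, replace "href" within the expression with "name" and that's it. |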
Matches |
<a href="www.google.com">Google</a> | <a href=www.google.com /> | <a |
Non-Matches |
<a name="anchor">Anchor</a> | <img src="image.gif"> |
Author |
Aleš Potocnik
|
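A Python check of the examples above, assuming Python's `re`; `has_href_link` is a hypothetical name. The last assertion in the checks exercises the character-class limitation: an href whose value contains `:` (any `http://` URL) is not matched.

```python
import re

# Expression from the entry, unchanged.
XHTML_HREF_RE = re.compile(
    r"""<a[a-zA-Z0-9 ="'.?_/]*(href\s*=\s*){1}[a-zA-Z0-9 ="'.?_/]*\s*"""
    r"""((/>)|(>[a-zA-Z0-9 ="'<>.?_/]*</a>))"""
)


def has_href_link(tag: str) -> bool:
    """True if the expression finds an href link in tag."""
    return XHTML_HREF_RE.search(tag) is not None
```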
Title |
Pattern Title
|
Expression |
<a [a-zA-Z0-9 ="'.:;?]*href=*[a-zA-Z0-9 ="'.:;>?]*[^>]*>([a-zA-Z0-9 ="'.:;>?]*[^<]*<)\s*/a\s*> |
Description |
Finds hyperlinks together with their caption and attributes; in other words, it finds anchors with their attributes and their label (link text). |
Matches |
<a href=/language_tools?hl=en>Language Tools</a> | <a href="http://www.google.co |
Non-Matches |
<a name="Lucky">Lucky</a> |
Author |
himraj love
|
Title |
Pattern Title
|
Expression |
target[ ]*[=]([ ]*)(["]|['])*([_])*([A-Za-z0-9])+(["])* |
Description |
Matches the HTML "target" attribute. I had an editor that edited pages, but the WYSIWYG editor would break on links whose target was set to, say, "_top" or another window, so I needed an expression to match the target attribute on links in HTML. |
Matches |
target = "_top" | target = _top | target = "foo" |
Non-Matches |
target foo | target "foo" | target = "" |
Author |
Phil Cogbill
|
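A minimal Python sketch of the use case, assuming Python's `re`: removing target attributes before the HTML reaches the editor. `strip_targets` is a hypothetical helper. The trailing `(["])*` only accepts a double quote, so with a single-quoted value the closing `'` is left behind.

```python
import re

# Expression from the entry, unchanged.
TARGET_RE = re.compile(r"""target[ ]*[=]([ ]*)(["]|['])*([_])*([A-Za-z0-9])+(["])*""")


def strip_targets(html: str) -> str:
    """Remove target attributes matched by the expression."""
    return TARGET_RE.sub("", html)
```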
Title |
Pattern to find Anchor Tag in a web page
|
Expression |
<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a> |
Description |
This pattern is a slight modification of the pattern submitted by Jacek Sompel. With it, one can also match anchor tags whose href value is not enclosed in ' (single quotes) or " (double quotes). This is useful for a web crawler that crawls all links in a web page. |
Matches |
<a href='http://www.regexlib.com'>Text</a> | <a href="...">Text</a> | <a href=http://www.regexlib.com>Text</a> |
Non-Matches |
all other html tags |
Author |
Kuleen Upadhyaya
|
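A Python check of this variant, assuming Python's `re`. Because both quote runs are now optional (`*`) and `(.*?)` is lazy, the engine can satisfy the pattern with an empty group 1 and let `.*?>` swallow the URL, so the link comes back empty even for the quoted example. `ADAPTED_RE` is our own alternative, an assumption not taken from either entry: it captures the value up to a quote, whitespace or `>`.

```python
import re

# Expression from the entry, unchanged.
VARIANT_RE = re.compile(
    r"""<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a>"""
)
# Alternative sketch (assumption): the value stops at a quote, whitespace or '>'.
ADAPTED_RE = re.compile(
    r"""<a\s+[^>]*?href\s*=\s*["']?([^"'\s>]+)["']?[^>]*>(.*?)</a>"""
)
```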
Title |
Relative paths in HTML
|
Expression |
(<(?:.*?)\s)href\s*=([\s"'])*/?([^\2:#]+?)\2((?:.*?)>) |
Description |
This expression matches all relative HREF paths, but not full URLs or dead # links. It can be used to select paths that need to be updated when HTML has been moved from its original page onto a new one. It matches the entire containing tag, with the following groups: 1 - the start of the containing tag through the space before the attribute; 2 - the delimiter between the attribute's equals sign and its value (e.g. a double quote); 3 - the attribute value (without a leading /); 4 - the remainder of the tag after the closing attribute value delimiter. |
Matches |
<a href="joe's test.htm" /> | <a href='/test2.htm'> | <a href = test2.htm /> |
Non-Matches |
<a href="http://www.test.com/test.htm" /> | <a href = '#' /> |
Author |
Joe Theriault
|
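A minimal Python sketch of the "update relative paths" use case, assuming Python's `re`; the base path `/new/` and the helper name `rebase` are hypothetical. One caveat: Python's `re` treats numeric escapes inside a character class as characters, so `[^\2:#]` means "not the character with code 2, `:` or `#`" rather than "not the delimiter"; the `\2` after the lazy group is what actually matches the closing delimiter.

```python
import re

# Expression from the entry, unchanged.
RELATIVE_HREF_RE = re.compile(r"""(<(?:.*?)\s)href\s*=([\s"'])*/?([^\2:#]+?)\2((?:.*?)>)""")


def rebase(html: str, base: str = "/new/") -> str:
    """Prefix each relative href value with base (hypothetical default)."""
    def repl(m: re.Match) -> str:
        start, delim, path, rest = m.groups()
        return f"{start}href={delim}{base}{path}{delim}{rest}"

    return RELATIVE_HREF_RE.sub(repl, html)
```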
Title |
Bugtraq logregex property for Trac
|
Expression |
(refs|references|re|closes|closed|close|see|fixes|fixed|fix|addresses) #(\d+)(( and |, | & | )#(\d+))* |
Description |
This expression can be used to set the bugtraq:logregex property of a Subversion repository. It uses the format supported by Trac and enables, for example, TortoiseSVN to turn the issue numbers used in commit messages into links pointing to the issue in the bug tracker. |
Matches |
fix #313 | references #1024 and #1337 |
Non-Matches |
fixed 313 | refer #1024 |
Author |
Markus Peter
|
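A minimal Python sketch of how a tool could pull the issue numbers out of a commit message with this expression, assuming Python's `re`. `issue_numbers` is a hypothetical helper, and the second, simpler `#(\d+)` pass over each match is our own choice: the repeated group keeps only its last repetition, so re-scanning the match is an easy way to collect every number in a list such as `#1024 and #1337`.

```python
import re

# Expression from the entry, unchanged.
LOG_RE = re.compile(
    r"(refs|references|re|closes|closed|close|see|fixes|fixed|fix|addresses)"
    r" #(\d+)(( and |, | & | )#(\d+))*"
)


def issue_numbers(message: str) -> list[int]:
    """Return every issue number referenced by a matching phrase."""
    numbers = []
    for m in LOG_RE.finditer(message):
        # Re-scan the whole match to collect all "#NNN" references in it.
        numbers.extend(int(n) for n in re.findall(r"#(\d+)", m.group(0)))
    return numbers
```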
Title |
*.css without http
|
Expression |
(href=|url|import).*[\'"]([^(http:)].*css)[\'"] |
Description |
Gets all CSS links, tags, etc. that don't use http. I needed this for my web crawler; maybe somebody else needs it too ;) |
Matches |
import url("some.css") | import("some.css") | <link rel="STYLESHEET" type="text/css" href="some.css"> |
Non-Matches |
import url("http://domain.com/some.css") | import("http://domain.com/some.css") | <link rel="STYLESHEET" type="text/css" href="http://domain.com/some.css"> |
Author |
Krzysztof Chełchowski
|
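A Python check of this expression, assuming Python's `re`. `[^(http:)]` is a character class, not a "does not start with http:" test: it rejects any path whose first character is `(`, `h`, `t`, `p`, `:` or `)`, so a relative file such as `home.css` is missed. `ADAPTED_RE` is our own variant, not from the entry, that uses the negative lookahead `(?!https?:)` instead and requires a quoted value ending in `.css`.

```python
import re

# Expression from the entry, unchanged.
ORIGINAL_RE = re.compile(r"""(href=|url|import).*[\'"]([^(http:)].*css)[\'"]""")
# Variant sketch (assumption): lookahead instead of the character class.
ADAPTED_RE = re.compile(r"""(?:href=|url|import)\s*\(?\s*['"](?!https?:)([^'"]*\.css)['"]""")
```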