Title |
Entity notation
|
Expression |
&
(?ni:\# # if a pound sign follows the ampersand, look for a number
((x # if x follows the pound sign, accept a hex value of up to 5 digits
([\dA-F]){1,5}
)
| # otherwise accept a decimal number between 0 and 1048575
(104857[0-5]
|10485[0-6]\d
|1048[0-4]\d\d
|104[0-7]\d{3}
|10[0-3]\d{4}
|0?\d{1,6})
)
| # no pound sign after ampersand
([A-Za-z\d.]{2,31}) #accept ASCII alphanumeric and period
); #end with semi-colon. |
Description |
This regex can be used to find general entities in HTML, XML and SGML files.
An entity consists of
1) an ampersand (&)
2) followed by either
(a) 2 to 31 ASCII alphanumerics or periods, or
(b) a pound sign (#)
(i) followed by an x and a Unicode value of up to 5 hex digits, or
(ii) followed by a decimal value from 0 to 1048575
3) ending with a semicolon (;) |
Matches |
&quote; | &copy; | &#39; |
Non-Matches |
& | &#Hello; | &#Xray; |
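The expression uses the .NET-style inline options (?ni:...), which other engines may not accept. As a rough sketch, here is one way it might be ported to Python and checked against a few sample strings. The names ENTITY, is_entity and find_entities are made up for this illustration, and re.ASCII is an assumption added to keep \d limited to ASCII digits, in line with the description.

```python
import re

# Hypothetical Python port of the expression above (not the original .NET pattern):
# the (?ni:...) options become non-capturing groups plus IGNORECASE | VERBOSE.
# re.ASCII is an added assumption so that \d only matches the digits 0-9.
ENTITY = re.compile(r"""
    &
    (?:
        \#                      # numeric reference
        (?:
            x[\dA-F]{1,5}       # hex value, up to 5 digits
          |
            (?:104857[0-5]      # decimal value, 0 to 1048575
              |10485[0-6]\d
              |1048[0-4]\d\d
              |104[0-7]\d{3}
              |10[0-3]\d{4}
              |0?\d{1,6})
        )
      |
        [A-Za-z\d.]{2,31}       # named entity: ASCII alphanumerics and period
    )
    ;                           # closing semicolon
""", re.IGNORECASE | re.VERBOSE | re.ASCII)


def is_entity(text):
    """Return True if the whole string is a single entity."""
    return ENTITY.fullmatch(text) is not None


def find_entities(text):
    """Return every entity found in text, in order of appearance."""
    return [m.group(0) for m in ENTITY.finditer(text)]


if __name__ == "__main__":
    for sample in ["&copy;", "&#39;", "&#x27;", "&", "&#Hello;", "&#Xray;"]:
        print(sample, is_entity(sample))
    print(find_entities("Fish &amp; chips &#169; 2004 &#xA9; & more"))
```

The last call lists only &amp;, &#169; and &#xA9;, since the trailing bare ampersand is not followed by a name or number and a semicolon.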
Author |
Michael Ash
|
Source |
|
Title: Thank you!
Name: duncan
Date: 7/2/2004 6:38:40 AM
Comment:
Works fine for me - just what I needed! (Trying to get rid of pesky '&'s from a website, input by users, so that the HTML will pass validation...)
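A cleanup like the one described might be sketched in Python as follows. This is only an illustration, not the code used on that site: BARE_AMP and escape_bare_ampersands are hypothetical names, and the sketch assumes that every ampersand which does not begin a well-formed entity should be rewritten as &amp;.

```python
import re

# Compact Python form of the entity pattern, wrapped in a negative lookahead so
# that only ampersands NOT starting an entity are matched.
# Assumption: such ampersands are stray user input and should become "&amp;".
BARE_AMP = re.compile(
    r"&(?!(?:#(?:x[\dA-F]{1,5}"
    r"|(?:104857[0-5]|10485[0-6]\d|1048[0-4]\d\d|104[0-7]\d{3}|10[0-3]\d{4}|0?\d{1,6}))"
    r"|[A-Za-z\d.]{2,31});)",
    re.IGNORECASE | re.ASCII,
)


def escape_bare_ampersands(html_text):
    """Replace each ampersand that does not start an entity with '&amp;'."""
    return BARE_AMP.sub("&amp;", html_text)


if __name__ == "__main__":
    print(escape_bare_ampersands("Tom & Jerry &copy; 2004, AT&T"))
```

Because &amp; itself matches the entity pattern, running the function a second time leaves the text unchanged.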
Title: Note
Name: Michael Ash
Date: 3/16/2004 5:26:34 PM
Comment:
While the Unicode and decimal values are within the range of the current W3C values, no checking is done to ensure that a named entity (alphanumerics) maps to a known entity name.
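If that extra check is wanted, one possible approach is to look each matched name up in a table of known names. The sketch below uses the HTML5 table from Python's standard html.entities module. That choice is an assumption, since XML and SGML documents can declare their own entities in a DTD, and the names NAMED and unknown_named_entities are invented for the example.

```python
import re
from html.entities import html5  # keys look like "amp;" or "copy;"

# Only the named-entity branch of the expression; the name is captured so it
# can be looked up. Lookup is case-sensitive, as HTML5 entity names are.
NAMED = re.compile(r"&([A-Za-z\d.]{2,31});", re.ASCII)


def unknown_named_entities(text):
    """Return named entities in text whose name is not in the HTML5 table."""
    return [
        m.group(0)
        for m in NAMED.finditer(text)
        if m.group(1) + ";" not in html5
    ]


if __name__ == "__main__":
    print(unknown_named_entities("&quot; &quote; &copy; &nbsp; &foo;"))
```

Here &quote; and &foo; are reported, because both fit the pattern but neither is a defined HTML5 name.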