Technical question
Aug. 14th, 2008 10:05
Is there a regular expression that would allow me to replace any character with a code of 128 or above in a string with its numeric HTML entity equivalent, as in:
replace in {string}: ascii-X with &#X; where X between 128 and 255
It would save a nested loop if I could figure out how to do it in one statement.
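One way to get this down to a single statement is to match the range with a character class and let a callback build the entity. Here is a minimal sketch in Java 9 or later, which added Matcher.replaceAll(Function). It assumes the string is already decoded text, so each char is a character rather than a raw byte. The class and method names are hypothetical. (In Perl the same idea is s/([\x80-\xFF])/'&#'.ord($1).';'/ge, where the e modifier evaluates the replacement as code.)

```java
import java.util.regex.Pattern;

public class HighCharEntities {
    // Matches one character in U+0080..U+00FF, i.e. codes 128 through 255.
    private static final Pattern HIGH = Pattern.compile("[\\x80-\\xFF]");

    // Replaces every matching character with &#N;, where N is its decimal code.
    // The replacement contains only '&', '#', digits and ';', so no '$' or '\'
    // escaping is needed.
    public static String encode(String s) {
        return HIGH.matcher(s).replaceAll(m -> "&#" + (int) m.group().charAt(0) + ";");
    }

    public static void main(String[] args) {
        System.out.println(encode("caf\u00e9")); // caf&#233;
    }
}
```

Characters above 255 are left untouched, matching the 128-255 range in the question.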
no subject
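Date: 2008-08-14 18:00 (UTC)
$source =~ tr/\nnn/rvalue/;
(in Perl; Java has no tr operator) Where...
tr = the transliterate operator
\nnn = the octal escape for the character you're looking for (e.g. \351 for é in Latin-1)
rvalue = the replacement character
tr always acts on every occurrence in the string, so it takes no g modifier. It also maps one character to one character, so on its own it cannot expand a character into a multi-character entity such as &#233;.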
Technically not a regex, but similar.
The downsides are that you have to know the octal for what you're looking for and you have to have one of those translations for each potential target.
The other downside is that "high ASCII" (128-255) is not standardized: what those codes mean depends on the character set the OS or application uses, and several characters that some of those sets include are only universally available in Unicode (encoded as UTF-8, UTF-16, and the like). The various ISO charsets (ISO-8859-1 Latin-1, for example) are spotty in what they cover. An example is the trademark sign (™): it is code point U+2122 (8482 decimal), which ISO-8859-1 does not contain at all (Windows-1252 puts it at 153), and which takes two bytes in UTF-16 and three in UTF-8. In that case, character scanning only works if your OS/language/source material is Unicode-aware, so that each "character" is treated as one unit even when it spans several bytes.
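To illustrate the encoding point, here is a sketch in Java that works on decoded Unicode code points instead of raw bytes. As a design choice it escapes every code point above 127, not just 128-255, so a character like ™ becomes a single &#8482; entity. It also shows that raw bytes need to be decoded with the correct charset first. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CodePointEntities {
    // Escapes every Unicode code point above 127 as a decimal numeric
    // character reference (&#N;). Because it iterates code points, a character
    // that spans several bytes (or a UTF-16 surrogate pair) yields one entity.
    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String tm = "\u2122"; // trademark sign
        System.out.println(tm.codePointAt(0));                             // 8482
        System.out.println(tm.getBytes(StandardCharsets.UTF_8).length);    // 3
        System.out.println(tm.getBytes(StandardCharsets.UTF_16BE).length); // 2
        System.out.println(encode(tm));                                    // &#8482;

        // Raw bytes must be decoded with the right charset first: byte 0xE9
        // is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
        byte[] latin1 = {(byte) 'c', (byte) 'a', (byte) 'f', (byte) 0xE9};
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(encode(text));                                  // caf&#233;
    }
}
```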
- Jeho