[personal profile] mapsedge
Is there a regular expression that would allow me to replace any character at ASCII 128 or above in a string with its HTML entity equivalent, as in:

replace in {string}:  ascii-X with &#X; where X between 128 and 255

It would save a nested loop if I could figure out how to do it in one statement.

Date: 2008-08-14 18:00 (UTC)
From: [identity profile] jehosefatz.livejournal.com
Dunno. I'd probably shoot for something like:

$source =~ s/\xxx/rvalue/g;

(in Perl) Where...

s = substitute command
\xxx = the octal value for the thing you're looking for
rvalue = the replacement string
g = do it globally in the source string

Perl's tr (translate) command looks similar, but it maps single characters to single characters and takes no g flag, so it can't swap a character for a multi-character entity.

The downsides are that you have to know the octal for what you're looking for and you have to have one of those substitutions for each potential target.
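
For what it's worth, the one-statement-per-target downside might be avoidable by matching the whole 128-255 range with a character class and computing the replacement with the /e modifier. This is only a rough sketch, assuming Perl and a string where each character is a single byte; the name encode_high_chars and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character with code 128-255
# by its decimal numeric entity, e.g. chr(233) becomes "&#233;".
sub encode_high_chars {
    my ($text) = @_;
    # /g: replace every match; /e: evaluate the replacement as Perl code.
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Sample input, assumed to be single-byte (Latin-1 style) text.
my $sample = "caf\xE9 cr\xE8me";
print encode_high_chars($sample), "\n";   # prints: caf&#233; cr&#232;me
```

That does it in one pass with no nested loop, and characters below 128 are left alone.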

The other downside is that high-order ASCII (128-255) is OS/hardware implementation dependent, and several characters that some implementations handle are only universally available in Unicode (UTF-8, UTF-16, and the like). The various ISO charsets (ISO-8859-1 Latin, for example) are spotty in their implementations. An example would be the trademark character (™). In Unicode it's code point 8482 decimal (U+2122), which doesn't fit in a single byte. In that case, character scanning would only work if your OS/language/source material was Unicode encoded, so that each "character" would be a multi-byte entity.
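
To illustrate the Unicode case, here is a hedged sketch that assumes the source material arrives as UTF-8 bytes. It decodes the bytes into characters first, then replaces anything outside plain ASCII. The helper name encode_non_ascii and the sample text are hypothetical; decode comes from Perl's core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: assumes the input is UTF-8 encoded bytes.
sub encode_non_ascii {
    my ($bytes) = @_;
    # Turn the byte sequence into real characters so that a
    # multi-byte sequence counts as one character with one code point.
    my $chars = decode('UTF-8', $bytes);
    # Replace every character above 127 with its decimal entity.
    $chars =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $chars;
}

# "\xE2\x84\xA2" is the UTF-8 byte sequence for the trademark sign (U+2122).
my $utf8_bytes = "Acme\xE2\x84\xA2";
print encode_non_ascii($utf8_bytes), "\n";   # prints: Acme&#8482;
```

If the decode step is skipped, the same substitution would see three separate high bytes and emit three entities instead of one.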

- Jeho
