mapsedge | Technical question


Is there a regular expression that would let me replace any character at ASCII 128 or above in a string with its HTML entity equivalent, as in:

replace in {string}: ascii-X with &#X;, where X is between 128 and 255

It would save a nested loop if I could figure out how to do it in one statement.
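For reference, many regex engines can do this in a single statement when the replacement is allowed to be a function (or an evaluated expression) instead of a fixed string. A minimal sketch in Python, assuming the text is already a decoded string so that each character's code point is the number you want in the entity; the name high_to_entities is hypothetical:

```python
import re

# One character class covering code points 128-255, the range in the question.
HIGH_CHAR = re.compile(r"[\x80-\xff]")


def high_to_entities(text: str) -> str:
    """Replace each character in 128-255 with its numeric HTML entity."""
    # re.sub calls the function once per match, so there is no explicit loop.
    return HIGH_CHAR.sub(lambda m: f"&#{ord(m.group())};", text)


print(high_to_entities("caf\u00e9 na\u00efve"))  # caf&#233; na&#239;ve
```

In Perl the same idea is the s///e modifier, which evaluates the replacement as an expression (for example, calling ord on the captured character).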


From:

akaashben.livejournal.com

I don't see how to do it without a loop, since you have to examine each character in turn to determine its ASCII value.

Even if you could convert the entire string to its component ASCII values before looking at it, you would still have to break it up into 3-digit chunks and examine each one before converting it back. Running a regex search on those 3-digit chunks would very likely mess it all up.
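The character-by-character approach needs only one loop, not a nested one, as long as you walk the string's characters rather than a string of digits. A sketch in Python, assuming the input is already a decoded string (high_to_entities_loop is a hypothetical name):

```python
def high_to_entities_loop(text: str) -> str:
    """Single pass: look at each character's code point and replace 128-255
    with a numeric HTML entity; everything else is copied unchanged."""
    parts = []
    for ch in text:
        code = ord(ch)  # the character's numeric value
        parts.append(f"&#{code};" if 128 <= code <= 255 else ch)
    return "".join(parts)


print(high_to_entities_loop("caf\u00e9"))  # caf&#233;
```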

From:

joegoda.livejournal.com

Here's where I show my ignorance... but are you looking for something like this?

ascii_to_entities(str [string])

Converts ASCII to Entities

Class: Regex (REGX)

Description: Returns str after converting higher ASCII values into HTML entities where possible. Only use when the auto_convert_high_ascii config file preference is set to yes (i.e. $PREFS->ini('auto_convert_high_ascii') == 'y').
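The quoted documentation doesn't say how "where possible" is decided, so the following is only a guess at that behaviour, not that function's actual code: a Python sketch (hypothetical name ascii_to_entities_sketch) that uses a named entity such as &eacute; when the standard library's html.entities.codepoint2name table has one, and a numeric entity otherwise:

```python
import re
from html.entities import codepoint2name

# Characters in the 128-255 range from the question.
HIGH_CHAR = re.compile(r"[\x80-\xff]")


def ascii_to_entities_sketch(text: str) -> str:
    """Use a named entity (e.g. &eacute;) where HTML defines one, else &#N;."""
    def to_entity(match):
        code = ord(match.group())
        name = codepoint2name.get(code)  # None for 128-159, which have no named entity
        return f"&{name};" if name else f"&#{code};"

    return HIGH_CHAR.sub(to_entity, text)


print(ascii_to_entities_sketch("\u00a9 caf\u00e9 \x85"))  # &copy; caf&eacute; &#133;
```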

From:

rowangolightly.livejournal.com

Y'all are speakin' that foreign language again....

From:

billthetailor.livejournal.com

Well, as with any foreign language, it's best to start with a few basic phrases, such as:

010010000110010101101100011011000110111100101110

translation: Hello.

01010111011010000110010101110010011001010010000001101001011100110010000001110100011010000110010100100000011000100110000101110100011010000111001001101111011011110110110100111111

translation: Where is the bathroom?

0101011101101000011001010111001001100101001000000110100101110011001000000111010001101000011001010010000001101110011001010110000101110010011001010111001101110100001000000111001001100101011100110111010001100001011101010111001001100001011011100111010000111111

translation: Where is the nearest restaurant?

I hope this handy guide will help you the next time you visit, and please enjoy your stay! :)
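For the curious: each group of eight digits above is one character's ASCII code in binary (01001000 is 72, "H"). A small Python sketch that decodes them; decode_binary is a hypothetical helper name:

```python
def decode_binary(bits: str) -> str:
    """Split a string of 0s and 1s into 8-bit groups and turn each group into a character."""
    if len(bits) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


print(decode_binary("010010000110010101101100011011000110111100101110"))  # Hello.
```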

From:

rowangolightly.livejournal.com

::runs away to hide in my sewing room where it's safe::

From:

jehosefatz.livejournal.com

Dunno. I'd probably shoot for something like:

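$source =~ s/\xxx/rvalue/g;

(in Perl) Where...

s = substitute command
\xxx = the octal value for the thing you're looking for
rvalue = the replacement string
g = do it globally in the source string

Perl's tr/// (transliterate) command is similar and technically not a regex, but it only maps single characters to single characters and takes no g modifier, so it can't turn one character into a multi-character entity.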

The downsides are that you have to know the octal value of each character you're looking for, and you need a separate statement like that for each potential target.
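One way around the one-statement-per-target problem is a translation table that maps a character to a whole string, which Perl's tr/// can't do. Python's str.translate accepts a dict from code points to replacement strings, so a single table built once covers every character from 128 to 255. A sketch (ENTITY_TABLE and translate_high are hypothetical names):

```python
# One table, built once, covering every code point from 128 to 255.
ENTITY_TABLE = {code: f"&#{code};" for code in range(128, 256)}


def translate_high(text: str) -> str:
    # str.translate looks up each character's code point in the table and
    # leaves characters that aren't in it unchanged.
    return text.translate(ENTITY_TABLE)


print(translate_high("\u00bd cup of coffee"))  # &#189; cup of coffee
```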

The other downside is that high-order ASCII (128-255) is OS/hardware implementation-dependent, and several characters that some implementations handle are only universally available in Unicode (UTF-8, UTF-16, and the like). The various ISO character sets (ISO-8859-1 Latin-1, for example) are spotty in their implementations. An example is the trademark sign (a superscript TM): in Unicode it is U+2122 (8482 decimal), which doesn't fit in a single byte (it takes 2 bytes in UTF-16 and 3 in UTF-8). In that case, character scanning would only work if your OS, language, and source material were Unicode-encoded, so that each "character" could be a multi-byte entity.
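To make the encoding problem concrete: the same byte means different characters under different code pages, so it is safer to decode the input first and then write entities using Unicode code points. A Python sketch, assuming you know (or can guess) the source encoding; bytes_to_entities is a hypothetical name:

```python
TM = "\u2122"  # the trademark sign

# Its Unicode code point is 8482 decimal, 2122 hex.
assert ord(TM) == 8482 == 0x2122
# Its size depends on the encoding: 3 bytes in UTF-8, 2 bytes in UTF-16.
print(len(TM.encode("utf-8")), len(TM.encode("utf-16-be")))  # 3 2


def bytes_to_entities(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the assumed encoding, then replace every
    non-ASCII character with a numeric entity for its Unicode code point."""
    text = raw.decode(encoding)
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


raw = b"Brand\x99"  # byte 0x99 (153) from some legacy 8-bit source
print(bytes_to_entities(raw, "cp1252"))   # Brand&#8482;  (Windows-1252 puts the trademark sign at 0x99)
print(bytes_to_entities(raw, "latin-1"))  # Brand&#153;   (Python's latin-1 codec gives U+0099, a control character)
```

So for text from a Windows-1252 source, byte 153 should become &#8482;, not &#153;.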

- Jeho


Profile

mapsedge

Seamlyne Reproductions


Page generated Feb. 28th, 2026 12:27

Ramblings, rumblings, grumblings, and rambling grumbly rumblings.

Conquering the internet, one cup of coffee at a time.
