discuss@lists.openscad.org

OpenSCAD general discussion Mailing-list

Understanding the UTF-8 Lexer

NateTG
Fri, Jul 12, 2019 2:29 AM

I was looking at src/lexer.l and noticed
U       [\x80-\xbf]
U2      [\xc2-\xdf]
U3      [\xe0-\xef]
U4      [\xf0-\xf4]
UNICODE {U2}{U}|{U3}{U}{U}|{U4}{U}{U}{U}
I guess U+0080 through U+009F are control codes that are unlikely to occur,
but shouldn't U2 be [\xc0-\xdf]?
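
For illustration, the definitions above can be transcribed into a Python bytes regex to check which byte sequences the UNICODE rule accepts. This is only a sketch, not the OpenSCAD code: the helper name matches() and the sample characters are assumptions chosen for the example.

```python
import re

# Byte classes transcribed from the flex definitions quoted above.
U = rb"[\x80-\xbf]"    # continuation byte
U2 = rb"[\xc2-\xdf]"   # lead byte of a 2-byte sequence
U3 = rb"[\xe0-\xef]"   # lead byte of a 3-byte sequence
U4 = rb"[\xf0-\xf4]"   # lead byte of a 4-byte sequence

# UNICODE {U2}{U}|{U3}{U}{U}|{U4}{U}{U}{U}
UNICODE = re.compile(U2 + U + b"|" + U3 + U + U + b"|" + U4 + U + U + U)


def matches(data: bytes) -> bool:
    """Return True if the whole byte string is exactly one UNICODE token."""
    return UNICODE.fullmatch(data) is not None


print(matches("\u0080".encode("utf-8")))  # U+0080 is encoded as C2 80
print(matches("\u00e9".encode("utf-8")))  # U+00E9 is encoded as C3 A9
print(matches("\u20ac".encode("utf-8")))  # U+20AC is encoded as E2 82 AC
print(matches(b"\xc0\x80"))               # C0 is outside the U2 class
```

In UTF-8, U+0080 through U+009F are encoded as C2 80 through C2 9F, so a U2 class that starts at \xc2 still covers them.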

--
Sent from: http://forum.openscad.org/

Torsten Paul
Sat, Jul 13, 2019 1:33 PM

On 12.07.19 04:29, NateTG wrote:

> I guess U+0080 through U+009F are control codes
> that are unlikely to occur, but shouldn't U2 be
> [\xc0-\xdf]?

No, I don't think so. I guess the reason is that the
lead bytes C0 and C1 could only encode values below
U+0080, which would overlap with the single byte
sequences.

https://www.fileformat.info/info/unicode/utf8.htm
also shows C2 to DF.

ciao,
Torsten.
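
The overlap can be checked by applying the 2-byte UTF-8 bit layout (110xxxxx 10xxxxxx) by hand. The sketch below is a hypothetical helper written for this illustration, not part of OpenSCAD; decode_two_byte() deliberately skips the validity checks that a real decoder performs.

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Apply the 110xxxxx 10xxxxxx layout without rejecting overlong forms."""
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a 110xxxxx 10xxxxxx byte pair")
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


# Largest value reachable with lead byte C1, smallest with C2.
largest_c1 = decode_two_byte(0xC1, 0xBF)
smallest_c2 = decode_two_byte(0xC2, 0x80)
print(hex(largest_c1), hex(smallest_c2))

# Python's strict UTF-8 decoder rejects the C0/C1 lead bytes outright.
try:
    bytes([0xC1, 0xBF]).decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```

Every pair that starts with C0 or C1 lands in the range U+0000 to U+007F, which already has a single-byte form, while C2 80 is the first pair that yields U+0080.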

NateTG
Sun, Jul 14, 2019 5:58 PM

Oh, I guess I misread the docs.  Thanks.

--
Sent from: http://forum.openscad.org/
