Re: [Groff] mom : unicode in .INCLUDE'd files


From: Keith Marshall
Subject: Re: [Groff] mom : unicode in .INCLUDE'd files
Date: Sun, 23 Jul 2017 18:57:21 +0100

On 23/07/17 13:39, Ralph Corderoy wrote:
>> What's the rationale for choosing UTF-16 in the first place?
> 
> History.  Microsoft plumped for UCS-2, both UCS-2BE and UCS-2LE
> I think.

That's not as I recall it: UCS-2, yes, but always UCS-2LE, since their 
focus was on Intel x86, a little-endian architecture, so only 
little-endian UCS-2 would be useful for representing wchar_t.
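
To illustrate the byte-order point, here is a minimal Python sketch.  It
uses U+20AC (EURO SIGN) as an arbitrary example character from the
16-bit range, and Python's "utf-16-le" and "utf-16-be" codecs, which
produce the same bytes as UCS-2LE and UCS-2BE for such characters.  The
variable names are mine, purely for illustration.

```python
import struct

ch = "\u20ac"  # EURO SIGN, U+20AC: fits in a single 16-bit unit

le = ch.encode("utf-16-le")  # low-order byte first
be = ch.encode("utf-16-be")  # high-order byte first

print(le.hex())  # ac20
print(be.hex())  # 20ac

# Reading the little-endian bytes as a little-endian 16-bit integer
# (what an x86 CPU does natively) gives back the code point directly.
(value,) = struct.unpack("<H", le)
print(hex(value))  # 0x20ac
```

On a little-endian machine the LE form can be loaded as a 16-bit integer
with no byte swapping, which is presumably why it was the natural choice
there.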

> That's a fixed width;  two bytes per rune.  When that became
> insufficient, UTF-16 was a backwards-compatible upgrade AIUI.

Yep.  Yet another of Bill's "no one will ever need more than 640kB of 
memory" moments, IIRC: "16 bits should be sufficient to represent any 
character which anyone will ever want to display".  Of course, he was 
wrong on both counts, and when they realized that 16 bits weren't going 
to be enough, they changed their definition of Unicode[*] to mean 
UTF-16LE, and added support for surrogate pairs to the APIs.

[*]: When Microsoft documentation refers to "Unicode", it invariably 
means UTF-16LE; they seem reluctant even to acknowledge that any other 
variant exists.  (There are a few rare, hard-to-find instances where 
UTF-7 or UTF-8 is mentioned ... and then, usually to caution against 
using them.)
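
For anyone unfamiliar with the surrogate pairs mentioned above, here is
a rough Python sketch of how UTF-16 represents a code point which does
not fit in 16 bits.  The helper name to_surrogates is hypothetical, and
U+1F600 is just an arbitrary example from beyond the 16-bit range; the
arithmetic is the standard UTF-16 split, checked against Python's own
"utf-16-le" codec.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) surrogate pair.

    Hypothetical helper, for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    v = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00

# Python's codec emits the same two 16-bit units, each little-endian.
encoded = chr(0x1F600).encode("utf-16-le")
print(encoded.hex())  # 3dd800de
```

Code written on the assumption of one 16-bit unit per character sees two
"characters" here, which is why the APIs needed explicit surrogate
support.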

-- 
Regards,
Keith.


