From: Josef Möllers
Subject: Re: [Wget-dev] wget2 | Progress bar: handle utf-8 characters in filenames (#375)
Date: Tue, 21 Aug 2018 08:06:42 +0000

Can I take this? I have already looked at it and found this:
>>>
The number of characters can be counted in C in a portable way using 
mbstowcs(NULL,s,0). This works for UTF-8 like for any other supported encoding, 
as long as the appropriate locale has been selected. A hard-wired technique to 
count the number of characters in a UTF-8 string is to count all bytes except 
those in the range 0x80 – 0xBF, because these are just continuation bytes and 
not characters of their own. However, the need to count characters arises 
surprisingly rarely in applications.
[http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod]

Keep in mind that counting code points will give the wrong answer if combining 
characters are involved; even normalizing the input won't help, as there are 
graphemes that do not map to single code points...

[https://stackoverflow.com/questions/5117393/number-of-character-cells-used-by-string]
<<<
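A minimal C sketch of the two counting approaches quoted above, assuming a NUL-terminated input string. The function names `count_chars_locale()` and `utf8_count_chars()` are hypothetical and not part of wget2 or libwget. The first relies on `mbstowcs(NULL, s, 0)` and therefore on the program having selected a UTF-8 locale (for example with `setlocale(LC_CTYPE, "")`). The second hard-wires UTF-8 and skips continuation bytes 0x80 to 0xBF without validating the input.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count code points using the current LC_CTYPE encoding.
 * Returns (size_t)-1 if s is not valid in that encoding.
 * Assumes setlocale(LC_CTYPE, "") (or similar) selected a UTF-8 locale;
 * in the default "C" locale non-ASCII bytes may be rejected. */
size_t count_chars_locale(const char *s)
{
    return mbstowcs(NULL, s, 0);
}

/* Hard-wired UTF-8 count: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx, i.e. 0x80..0xBF) starts a new code point.
 * The input is not validated. */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

Both functions return a code point count, not a column count, so for a progress bar they share the limitation described in the second quote.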

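Since the progress bar needs terminal columns rather than code points, a closer approximation is the POSIX (XSI) function `wcswidth()`, which reports how many character cells a wide string occupies. The sketch below (the helper name `display_width()` is hypothetical) converts with `mbstowcs()` and then calls `wcswidth()`; it assumes a UTF-8 locale has been selected with `setlocale(LC_CTYPE, "")`. The width tables come from the C library, so results can differ between systems, and some grapheme clusters (emoji ZWJ sequences, for example) may still be measured incorrectly.

```c
#define _XOPEN_SOURCE 700 /* expose wcswidth() in <wchar.h> */
#include <stdlib.h>
#include <wchar.h>

/* Number of terminal columns needed to print the multibyte string s,
 * or -1 if s cannot be converted in the current locale, memory runs out,
 * or s contains a non-printable character (wcswidth() reports -1).
 * Combining marks typically count as width 0 and wide CJK characters
 * as width 2. Assumes setlocale(LC_CTYPE, "") was called for UTF-8. */
int display_width(const char *s)
{
    size_t len = mbstowcs(NULL, s, 0);
    if (len == (size_t)-1)
        return -1;

    wchar_t *ws = malloc((len + 1) * sizeof *ws);
    if (ws == NULL)
        return -1;

    /* n = len + 1, so the terminating L'\0' is written as well. */
    mbstowcs(ws, s, len + 1);

    int width = wcswidth(ws, len);
    free(ws);
    return width;
}
```

A caller could treat -1 as "width unknown" and fall back to a simpler measure such as the code point count.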
-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/issues/375#note_95769646