Re: [Wget-dev] wget2 | Progress bar: handle utf-8 characters in filename

From: Josef Möllers
Subject: Re: [Wget-dev] wget2 | Progress bar: handle utf-8 characters in filenames (#375)
Date: Tue, 21 Aug 2018 08:06:42 +0000

Can I take this? I have already looked into it and found the following:
The number of characters can be counted in C in a portable way using 
mbstowcs(NULL, s, 0). This works for UTF-8 just as for any other supported 
encoding, as long as the appropriate locale has been selected. A hard-wired 
technique for counting the characters in a UTF-8 string is to count all bytes 
except those in the range 0x80 – 0xBF, because these are just continuation 
bytes and not characters of their own. However, the need to count characters 
arises surprisingly rarely in applications.
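
A minimal sketch of the two counting approaches described above. The function names are hypothetical and are not taken from the wget2 sources. The byte-counting variant assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: locale-dependent count via mbstowcs().
 * Returns (size_t)-1 if s contains a byte sequence that is invalid
 * in the current locale's encoding. */
size_t locale_count_chars(const char *s)
{
    return mbstowcs(NULL, s, 0);
}

/* Hypothetical helper: hard-wired UTF-8 count.
 * Counts every byte except continuation bytes (0x80..0xBF),
 * i.e. bytes whose top two bits are 10. Assumes valid UTF-8. */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the locale-dependent variant to handle UTF-8, the program would need to have called setlocale(LC_ALL, "") (or similar) with a UTF-8 locale in effect. The byte-counting variant does not depend on the locale.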

Keep in mind that counting code points will give the wrong answer if combining 
characters are involved; even normalizing the input won't help, as there are 
graphemes which do not map to single code points...
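
To illustrate the caveat, here is a small sketch (hypothetical function names, valid UTF-8 assumed). An "é" written as "e" plus the combining acute accent U+0301 counts as two code points even though it is displayed as one character. The second function is only a rough approximation: it skips just the Combining Diacritical Marks block (U+0300 to U+036F) and does not handle other combining characters or grapheme clusters in general.

```c
#include <stddef.h>

/* Count code points: every byte that is not a continuation byte. */
size_t count_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Rough approximation only: like count_codepoints(), but also skips
 * code points in U+0300..U+036F, which are encoded as 0xCC 0x80..0xBF
 * and 0xCD 0x80..0xAF. */
size_t count_without_combining_marks(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;   /* continuation byte */
        if (*p == 0xCC && (p[1] & 0xC0) == 0x80)
            continue;   /* U+0300..U+033F */
        if (*p == 0xCD && p[1] >= 0x80 && p[1] <= 0xAF)
            continue;   /* U+0340..U+036F */
        n++;
    }
    return n;
}
```

With these, count_codepoints("e\xcc\x81") yields 2 while count_without_combining_marks("e\xcc\x81") yields 1. The precomposed form "\xc3\xa9" yields 1 in both cases.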

