Re: [Wget-dev] wget2 | Progress bar: handle utf-8 characters in filename

From:	Josef Möllers
Subject:	Re: [Wget-dev] wget2 | Progress bar: handle utf-8 characters in filenames (#375)
Date:	Tue, 21 Aug 2018 08:06:42 +0000

Can I take this? I have already looked at it and found this:
>>>
The number of characters can be counted in C in a portable way using 
mbstowcs(NULL,s,0). This works for UTF-8 like for any other supported encoding, 
as long as the appropriate locale has been selected. A hard-wired technique to 
count the number of characters in a UTF-8 string is to count all bytes except 
those in the range 0x80 – 0xBF, because these are just continuation bytes and 
not characters of their own. However, the need to count characters arises 
surprisingly rarely in applications.
[http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod]

keep in mind that counting codepoints will give the wrong answer if combining 
characters are involved; even normalizing the input won't help as there are 
graphemes which do not map to single codepoints...

[https://stackoverflow.com/questions/5117393/number-of-character-cells-used-by-string]
<<<
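
As a rough illustration of the hard-wired technique from the first quote, here is a minimal sketch in C. The helper name utf8_codepoints() is made up for this example and is not an existing wget2 function; it also assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not part of wget2): count the code points in a
 * NUL-terminated UTF-8 string by counting every byte except the
 * continuation bytes 0x80..0xBF (bit pattern 10xxxxxx).
 * Assumption: the input is valid UTF-8; malformed input is not detected. */
static size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	for (; *s; s++) {
		/* A byte is a continuation byte iff its top two bits are 10. */
		if (((unsigned char) *s & 0xC0) != 0x80)
			n++;
	}

	return n;
}
```

For "M\xC3\xB6llers.txt" (12 bytes) this returns 11. As the second quote warns, that is a code point count and not a column count: "o\xCC\x88" (o followed by a combining diaeresis) yields 2 although it occupies a single cell, so a progress bar would probably still need something like wcwidth() or a grapheme-aware width function on top.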

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/issues/375#note_95769646