Re: [Bug-wget] bad filename
From: Bykov Aleksey
Subject: Re: [Bug-wget] bad filename
Date: Wed, 23 Apr 2014 16:57:11 +0300
Greetings, Darshit Shah
This was discussed some time ago.
Possible logic:
If the locale isn't UTF-8, process the name as before; otherwise:
1. Convert the string to a wide-character string with mbstowcs().
2. For each wide character, check its encoded size with wctomb(). If the
   size is 1, compare it with the mask; if the character is restricted,
   increment the counter ("quoted++;").
3. If needed, convert to lower or upper case with towlower()/towupper().
4. Rebuild the string character by character with wctomb(), converting
   each character into a temporary buffer. If its size is 1, compare it
   with the mask and replace it if restricted; otherwise copy it as is:
   "memcpy(q, char_buffer, char_size); q += char_size;"
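The four steps above can be sketched in C roughly as follows. This is a minimal sketch, not the attached patch and not wget's code: `quote_filename_mb()`, `is_restricted_byte()` and `enum case_mode` are hypothetical names, the restriction mask is a placeholder (ASCII control characters, DEL and '/'), and the "is the locale UTF-8?" check is left to the caller. It assumes the caller has already selected a UTF-8 locale with setlocale(); if mbstowcs() rejects the name, the function returns NULL so the caller can fall back to the old byte-wise quoting. Case conversion (step 3) is applied before the bytes are counted, because changing case can change a character's encoded length.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical placeholder mask: ASCII controls, DEL and '/' are unsafe.
   wget's real restriction table is different. */
static int is_restricted_byte(unsigned char c)
{
  return c < 0x20 || c == 0x7F || c == '/';
}

enum case_mode { CASE_KEEP, CASE_LOWER, CASE_UPPER };

/* Returns a malloc'd copy of NAME in which only single-byte restricted
   characters are %-escaped; multibyte characters are copied intact.
   Returns NULL if NAME is invalid in the current locale. */
char *quote_filename_mb(const char *name, enum case_mode mode)
{
  /* Step 1: convert to a wide-character string. */
  size_t n = mbstowcs(NULL, name, 0);
  if (n == (size_t) -1)
    return NULL;
  wchar_t *wbuf = malloc((n + 1) * sizeof *wbuf);
  if (!wbuf)
    return NULL;
  mbstowcs(wbuf, name, n + 1);

  char tmp[MB_LEN_MAX];
  size_t out_len = 0;
  (void) wctomb(NULL, 0);       /* reset any shift state */

  /* Steps 2 and 3: change case, then measure each character with
     wctomb() and count how many output bytes are needed. */
  for (size_t i = 0; i < n; i++)
    {
      if (mode == CASE_LOWER)
        wbuf[i] = towlower(wbuf[i]);
      else if (mode == CASE_UPPER)
        wbuf[i] = towupper(wbuf[i]);
      int len = wctomb(tmp, wbuf[i]);
      if (len < 0)
        {
          free(wbuf);
          return NULL;
        }
      if (len == 1 && is_restricted_byte((unsigned char) tmp[0]))
        out_len += 3;           /* becomes "%XX" */
      else
        out_len += (size_t) len;
    }

  /* Step 4: rebuild the string character by character. */
  char *out = malloc(out_len + 1);
  if (!out)
    {
      free(wbuf);
      return NULL;
    }
  char *q = out;
  for (size_t i = 0; i < n; i++)
    {
      int len = wctomb(tmp, wbuf[i]);
      if (len == 1 && is_restricted_byte((unsigned char) tmp[0]))
        {
          sprintf(q, "%%%02X", (unsigned char) tmp[0]);
          q += 3;
        }
      else
        {
          memcpy(q, tmp, (size_t) len);
          q += len;
        }
    }
  *q = '\0';
  free(wbuf);
  return out;
}
```

Because every non-ASCII character encodes to two or more bytes in UTF-8, no byte such as 0x94 is ever examined on its own, so the Hebrew name from the quoted report would pass through unchanged.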
On Windows I couldn't test this: mbstowcs() doesn't work with UTF-8 there,
so MultiByteToWideChar() has to be used instead.
A patch for Windows (unstructured, unclear, and unfinished, but working)
is attached.
Best Regards, Bykov Aleksey.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
From: address@hidden
To:
Date: 13:59:43, 04.23.2014
Subject: Re: [Bug-wget] bad filename
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
>> On Tue, Apr 22, 2014 at 10:57 PM, Andries E. Brouwer
>> <address@hidden> wrote:
>> > If I ask wget to download the wikipedia page
>> >
>> > http://he.wikipedia.org/wiki/ש._שפרה
>> >
>> > then I hope for a resulting file ש._שפרה.
>> > Instead, wget gives me ש._שפר\327%94, where the \327
>> > is an unpronounceable byte that cannot be typed.
>> > (This is a UTF-8 system, and the filename
>> > that wget produces is not valid UTF-8.)
>> >
>> > Maybe it would be better if wget by default used the original filename.
>> > This name mangling is a vestige of old times, it seems to me.
>> >
>> > Andries
>> >
>>
>> This is a commonly reported grievance and as you correctly mention a
>> vestige of old times. With UTF-8 supported filesystems, Wget should
>> simply write the correct characters.
>>
>> I sincerely hope this issue is resolved as soon as possible, but I
>> do not know how to fix it. Those who understand i18n should work on this.
>>
>>
>> --
>> Thanking You,
>> Darshit Shah
>>
>>
>>
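To see where the mangled name in the quoted report comes from: in UTF-8 the final letter ה (U+05D4) is encoded as the two bytes 0xD7 0x94, and 0x94 falls in the 128-159 range that wget's default `unix` file-name restriction escapes as control characters (those codes are C1 controls in ISO-8859 charsets). The minimal sketch below models a byte-wise filter of that kind; `escape_bytewise()` is a hypothetical helper, not wget's actual code. Escaping only the second byte leaves a lone 0xD7, which appears as `\327` (octal for 0xD7) in the report, followed by the literal text `%94`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a byte-wise filter: escapes bytes 0-31 and
   128-159 as "%XX" without looking at multibyte sequences.
   OUT must have room for 3 * strlen(IN) + 1 bytes. */
static void escape_bytewise(const char *in, char *out)
{
  for (const unsigned char *p = (const unsigned char *) in; *p; p++)
    {
      if (*p < 32 || (*p >= 128 && *p <= 159))
        out += sprintf(out, "%%%02X", *p);
      else
        *out++ = (char) *p;
    }
  *out = '\0';
}
```

Running it on the UTF-8 bytes of "ש._שפרה" passes everything through except the last byte, 0x94, which becomes `%94`, leaving the same invalid byte sequence as in the report.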
url_c.diff
Description: Binary data