Re: [PATCH] md5sum, sha*sum: only escape file names containing newlines
From: Pádraig Brady
Subject: Re: [PATCH] md5sum, sha*sum: only escape file names containing newlines
Date: Fri, 01 Nov 2013 22:51:15 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 11/01/2013 06:53 PM, Pádraig Brady wrote:
> On 11/01/2013 06:20 PM, Eric Blake wrote:
>> On 11/01/2013 11:03 AM, Pádraig Brady wrote:
>>
>>>>
>>>> Escape the output (marking with a leading '\' and backslash-escaping
>>>> both '\' and '\n') only when the file name contains a newline.
>>>> Before, we would do that for a file name containing either newline or
>>>> backslash.
>>>>
>>>> This probably deserves a NEWS entry, since it is user-visible.
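The two escaping rules being compared can be sketched in POSIX sh. This is a minimal illustration, not the coreutils code. It assumes text-mode `DIGEST  NAME` lines (two spaces, no binary-mode `*`), and `escape_name`, `old_format` and `new_format` are hypothetical helpers written only for this example:

```shell
#!/bin/sh
# Illustrative sketch only; helper names are hypothetical.
nl='
'

# escape_name NAME: print NAME with '\' -> '\\' and newline -> '\n'
escape_name() {
  rest=$1 out=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}    # first character of what is left
    rest=${rest#?}
    case $c in
      \\)    out=$out'\\' ;;
      "$nl") out=$out'\n' ;;
      *)     out=$out$c ;;
    esac
  done
  printf '%s' "$out"
}

# Pre-patch rule: mark with a leading '\' and escape when the name
# contains a backslash or a newline.
old_format() {
  case $2 in
    *\\*|*"$nl"*) printf '\\%s  %s\n' "$1" "$(escape_name "$2")" ;;
    *)            printf '%s  %s\n' "$1" "$2" ;;
  esac
}

# Rule proposed by the patch: escape only when the name contains a newline.
new_format() {
  case $2 in
    *"$nl"*) printf '\\%s  %s\n' "$1" "$(escape_name "$2")" ;;
    *)       printf '%s  %s\n' "$1" "$2" ;;
  esac
}

sum=d41d8cd98f00b204e9800998ecf8427e   # placeholder digest (MD5 of empty input)
old_format "$sum" 'a\nb'   # name is a, backslash, n, b
new_format "$sum" 'a\nb'
```

With the name from Eric's example below (a backslash followed by `n`), the old rule prints the marked line `\<digest>  a\\nb`, while the proposed rule prints the unmarked `<digest>  a\nb`.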
>>>
>>> I debated that as I thought it could have no impact on anything,
>>> but it could actually if one was comparing old and new outputs?
>>>
>>> newsum=$(md5sum my file set | md5sum)
>>> [ "$newsum" = "$(cat ./oldsum)" ] || error
>>
>> Not just that, but the new format is not necessarily parseable by older
>> md*sum. Your patch didn't show (but probably should be enhanced to show) what
>> happens for a file named 'a\nb'; pre-patch, it gave '\sum a\\nb',
>> post-patch it gives 'sum a\nb'
>
> Right.
>
>> - but if the older utility assumes that
>> the missing leading \ was a mistake and unescapes the file name, it
>> results in looking for a file with the three-byte name "a<newline>b", which
>> is also part of the user-visible change.
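To make that failure mode concrete, here is a small sh sketch. It assumes a hypothetical reader that unescapes names without checking for the leading `\`; `printf %b` stands in for that reader's unescaping (it also expands escapes other than `\\` and `\n`, so it is only an approximation):

```shell
#!/bin/sh
# Post-patch output for the 4-byte name a\nb (backslash, then 'n').
# The digest is a placeholder (MD5 of empty input).
line='d41d8cd98f00b204e9800998ecf8427e  a\nb'
name=${line#*  }   # text after the first two-space separator

# A reader that honours the marker sees no leading '\' and keeps the name:
printf '%s' "$name" | wc -c    # 4 bytes: a \ n b

# A hypothetical reader that always unescapes turns \n into a newline:
printf '%b' "$name" | wc -c    # 3 bytes: a <newline> b
```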
>
> Right but that's a big if.
> So you're referring to non-GNU utils parsing these checksum files,
> and not honoring the leading \ escape marker.
> That's quite unlikely I would think.
>
>> Breaking output so that older versions can't parse newer output has been
>> one of the reasons that I have only threatened to patch \r handling,
>> rather than actually doing it, because it's tricky to think about
>> old/new interactions and what might break. Depending on how
>> conservative we are trying to be, we may need to add a command line
>> option that will let the user forcefully revert to the older-style
>> output for intentional interaction with older checksum tools regardless
>> of filename. For 99% of the cases, the output is identical, since files
>> with \n or \\ in the name are already rare. Thinking aloud, it may be
>> appropriate to have such a mode option be tri-state (old, new, or warn;
>> with default being warn), where the warning mode gives the new output
>> but ALSO flags to the user that their output may not be parseable by
>> older summing utilities.
>
> Well any change here isn't worth a flag I think.
> Even for \r one can always `tr -d '\r'` the DOS files before processing.
Or use dos2unix, to be careful to process only EOLs:
$ printf 'a\rb\r\r\n' | dos2unix | od -tx1
0000000 61 0d 62 0d 0a
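For comparison, a sh sketch of the difference between the two approaches. `tr -d '\r'` removes every CR, including ones inside the data. The awk one-liner is only a stand-in for dos2unix (which may not be installed); on this input it behaves the same way, stripping just the CR immediately before the LF:

```shell
#!/bin/sh
# tr deletes every CR in the stream:
printf 'a\rb\r\r\n' | tr -d '\r' | od -An -tx1

# Strip only a CR that immediately precedes the newline (dos2unix-like here):
printf 'a\rb\r\r\n' | awk '{ sub(/\r$/, ""); print }' | od -An -tx1
```

The first pipeline yields the bytes 61 62 0a; the second yields 61 0d 62 0d 0a, the same bytes as the dos2unix output above.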
> The only reason I was avoiding the redundant '\' escaping
> was to avoid having to do the unescaping like in cleanup_sum()
> here for example http://fslint.googlecode.com/svn/trunk/fslint/findup
> But I suppose even that's not general.
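As an illustration of the unescaping a consumer of these files has to do, here is a minimal sh sketch that reads the file name from one checksum line. It assumes text-mode `DIGEST  NAME` lines, honours the leading `\` marker, and undoes only `\\` and `\n`. `unescape_name` and `parse_name` are hypothetical names, and this is not the cleanup_sum() code referenced above:

```shell
#!/bin/sh
nl='
'

# unescape_name TEXT: undo checksum-file escaping (\\ -> \, \n -> newline).
# Returns 1 on any other backslash sequence, including a trailing '\'.
unescape_name() {
  rest=$1 out=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}        # first character of what is left
    rest=${rest#?}
    if [ "$c" = '\' ]; then
      c=${rest%"${rest#?}"}      # character after the backslash
      rest=${rest#?}
      case $c in
        n)  c=$nl ;;
        \\) ;;                   # '\\' stands for one backslash
        *)  return 1 ;;
      esac
    fi
    out=$out$c
  done
  printf '%s' "$out"
}

# parse_name LINE: print the file name from a "DIGEST  NAME" line.
parse_name() {
  case $1 in
    \\*) unescape_name "${1#*  }" ;;  # leading '\' marks an escaped name
    *)   printf '%s' "${1#*  }" ;;    # no marker: use the name verbatim
  esac
}

parse_name '\d41d8cd98f00b204e9800998ecf8427e  a\\nb'; echo      # a\nb
parse_name '\d41d8cd98f00b204e9800998ecf8427e  a\nb' | od -An -c # a, newline, b
```

Note that capturing the result with `$(...)` would drop any trailing newlines in a name, so a real consumer would need to handle that case separately.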
>
> OK I think it's not worth changing the output format now,
> given the possibility of non-GNU tools parsing incorrectly,
> and the edge case where the output is directly compared
> to older output.
>
> I'll just do a maint commit to optimize/document a bit.
Pushed the non user visible adjustment at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=4d94e65
cheers,
Pádraig.