coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: split: support unlimited number of split files


From: Jérémy Compostella
Subject: Re: split: support unlimited number of split files
Date: Sat, 10 Mar 2012 12:13:02 +0100

2012/3/10 Pádraig Brady <address@hidden>:
> On 03/01/2012 07:44 PM, Jérémy Compostella wrote:
>> Hi Pádraig,
>>
>> Finally, I found time to work again on this patch. I provided it as
>> attachment.
>>
>> I'm not completely satisfied with the documentation part. I've tried to
>> be more specific but it becomes quickly complicated. So I get back the
>> original explanation which does not completely satisfied me since it
>> does not explain the following point:
>> - When -a is not specified:
>>   - output file names are considered exhausted when the first suffix
>>     character will become 'z'.
>>   -'z[a-z]' suffixes are never used
>>   - the fact that suffix length is increased each time the suffix are
>>   "exhausted".
>> But maybe it's not relevant to be so specific.
>>
>> What is your opinion about this point ?
>
> Well documentation can easily include too much detail.
> But one should try to clarify any ambiguities
> a user might be considering.
>
> I adjusted that documentation as follows:
>
> @@ -2990,11 +2990,15 @@ The output files' names consist of @var{prefix} 
> (@samp{>
>  followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by
>  default), such that concatenating the output files in traditional
>  sorted order by file name produces the original input file (except
> -@option{-nr/@var{n}}). If more than 676 output files are required and
> -@samp{-a} is not specified, @command{split} uses @samp{zaaa},
> -@samp{zaab}, etc.  If the @option{-a} is specified and the output file
> -names are exhausted, @command{split} reports an error without deleting
> -the output files that it did create.
> +@option{-nr/@var{n}}).  By default split will initially create files
> +with two generated suffix characters, and will increase this width by two
> +when the next most significant position reaches the last character.
> +(@samp{yz}, @samp{zaaa}, @dots{}).  In this way an arbitrary number
> +of output files are supported which sort as described above,
> +even in the presence of an @option{--additional-suffix} option.
> +If the @option{-a} option is specified and the output file names are 
> exhausted,
> +@command{split} reports an error without deleting the output files
> +that it did create.
I like it.

> I made a few adjustments to the code also.
>
> Disable this feature if --numeric=FROM is specified,
> because these suffixes are not all consecutive
> and hence we can't support arbitrary FROM values.
>
> Disable this feature if a specific number of
> files is specified with the -n option.
> We already auto select the width appropriately.
>
> I could find no logic flaws in the implementation in new_file_name().
> Well there was one nit with:
>
>  sufindex = xrealloc (sufindex, suffix_length * sizeof *sufindex);
>
> That should be as follows to avoid overflow issues:
>
>  sufindex = xnrealloc (sufindex, suffix_length, sizeof *sufindex);
Fair point.

> While looking at that I noticed a fair bit of overlap
> from the existing fixed width suffix initialization,
> so I merged the two with a goto.  That brought the
> new implementation down from 15 to 8 significant lines,
> and those 8 are now mostly simple integer width adjustments.
> Also by doing this we get to take advantage of the _PC_NAME_MAX
> check in this section.
I've been formatted by "Please, never use goto keyword in your C
program" at university and work too. Indeed, I've thought to a solution
like that but I didn't find a really smart solution without the goto
keyword.

> I also added a test.
>
> Full patch is attached.
> I hope to apply this sometime tomorrow.

BTW, reviewing your changes I discovered that when a user specifies a
suffix length zero the old behavior was to use the default length of 2
and the new is, of course, the suffix auto length way. However, IMHO,
it's dangerous to not raise an error. If a user for example make a
script which use split with a shell variable for the suffix_length
option it could have the feeling that his script works fine although it
provides a suffix length zero.

I attached an independent patch since there is no semantic relationship
with the current one. Feel free to merge, review or refuse it.

Cheers,

Jérémy

Attachment: 0001-split-suffix-length-zero-is-not-permitted.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]