bug-datamash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Possible bug when field-separator used


From: Jeroen Hoek
Subject: Re: Possible bug when field-separator used
Date: Mon, 4 Dec 2023 13:31:45 +0100
User-agent: Mozilla Thunderbird

On Thu, Nov 30, 2023 at 08:49:44PM +0000, Tim Rice wrote:
I agree that when there are interactions between the locale and the field separator, the input should be considered faulty. In this case, do you think it would be sufficient to simply report an error and stop further processing?

I think there are three scenarios:

If (and only if) the chosen field separator matches LC_NUMERIC.decimal_point:

* Fail even before parsing anything, explain to user about locale issue and how to call datamash with LC_NUMERIC set.

Downside: annoying if the input doesn't even use decimal points. It is helpful in pointing out the datamash adheres to LC_NUMERIC though.

* If a line fails, explain to user like above.

Would only work if it is possible to detect the cause of the error. Telling the user that datamash may have failed because of the user's locale setting is useful though.

* Disable parsing of numeric points.

The last option probably causes the least amount of surprises. If a user specifies a field separator, they tend to know that it is suitable for the data they are inputting.

On 01-12-2023 09:16, Erik Auerswald wrote:
I had added a few failing tests, commented out, as documentation
for the issues, to the file "tests/datamash-tests-2.pl"
in commit 2b181b38e86f067fa68a7b174171a24e80313361
<https://git.savannah.gnu.org/gitweb/?p=datamash.git;a=blob;f=tests/datamash-tests-2.pl;h=3d09c19138f6a83d19074c4aa8c122964f4b24ea;hb=HEAD#l692>.

On Thu, Nov 30, 2023 at 08:49:44PM +0000, Tim Rice wrote:
If you have some cycles to contribute, it would help me get started if there 
was a test available. The tests are written in perl, so you may find it fairly 
easy to add a new one, by copying from another test that already exists. It 
would be great to see a patch which implements a test for whether GNU Datamash 
handles this faulty data correctly.

Looking at the disabled tests, ifdl12 and ifdl13 address this bug for the C (or en_*, etc.) locale.

Unfortunately to add the other relevant test cases (like commas), it would be necessary to somehow set the `decimal_point` value taken from LC_NUMERIC. I don't know how that can be done without depending on locales which might not be available for the build though.

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]