bug-gnulib

Re: getusershell: Split file by lines instead of spaces.


From: Collin Funk
Subject: Re: getusershell: Split file by lines instead of spaces.
Date: Tue, 21 May 2024 17:11:43 -0700
User-agent: Mozilla Thunderbird

On 5/21/24 5:32 AM, Bruno Haible wrote:
> * If the file ends with a non-empty line without a newline, getline()
>   returns a string that does not end in a newline.
>   Quoting 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html:
>    "including the delimiter character if one was encountered before EOF".

Oops, thanks. I'm not sure why I was thinking that the line would
*always* end in the delimiter. Good catch.
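
To illustrate the point about getline(), here is a minimal sketch (not the
code from the patch) that strips the delimiter only when it is actually
present. The helper names strip_newline and count_nonempty_lines are
hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: remove one trailing '\n' from LINE if getline()
   stored one.  LEN is the length getline() returned.  Returns the new
   length.  The last line of a file may legitimately lack the newline.  */
static size_t
strip_newline (char *line, size_t len)
{
  assert (line != NULL);
  if (len > 0 && line[len - 1] == '\n')
    line[--len] = '\0';
  return len;
}

/* Hypothetical helper: count the lines of FP that are non-empty after
   the optional trailing newline has been removed.  */
static size_t
count_nonempty_lines (FILE *fp)
{
  char *line = NULL;
  size_t alloc = 0;
  ssize_t n;
  size_t count = 0;

  while ((n = getline (&line, &alloc, fp)) != -1)
    if (strip_newline (line, (size_t) n) > 0)
      count++;

  free (line);
  return count;
}
```

With input "/bin/bash\n\n/bin/sh" (no newline at the end), the final call
to getline() returns 7 and the buffer holds "/bin/sh" with nothing to strip.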

> * The following code does not work with tokens that consist of a single
>   character:
> 
>           while (start < end && isspace ((unsigned char) *end))
> 
>           if (start < end && IS_ABSOLUTE_FILE_NAME (start))
> 
>   The condition "start < end" here tests whether the token has at least
>   two characters. Which is probably not what you intended. (In this case,
>   it is not fatal, since absolute file names of non-directories always
>   consist of at least 2 characters. But anyway.)

Actually, it was intentional. Perhaps I have a strange way of thinking
of things. That plus non-standard functions leave a lot of room for
"creativity". :)

I was thinking that "/" and "c:/" (on Windows) will never refer to an
actual shell. So no point in caring about it.

However with this in my /etc/shells:

================
/bin/bash
/
================

When printing all shells glibc prints both lines. Mine only prints
"/bin/bash".
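
A small sketch of why the "/" line gets dropped, assuming a POSIX-style
check where a leading '/' stands in for IS_ABSOLUTE_FILE_NAME. The function
names are hypothetical and this is not the code from either version of the
patch; it only contrasts the two meanings of "start < end".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Inclusive interval [start,end]: END points at the last character.
   Here "start < end" is only true for tokens of 2 or more characters.  */
static bool
accepts_inclusive (const char *token)
{
  assert (token[0] != '\0');    /* END would be out of bounds otherwise */
  const char *start = token;
  const char *end = token + strlen (token) - 1;
  return start < end && *start == '/';
}

/* Half-open interval [start,end): END points one past the last
   character.  Here "start < end" is true for any non-empty token.  */
static bool
accepts_half_open (const char *token)
{
  const char *start = token;
  const char *end = token + strlen (token);
  return start < end && *start == '/';
}
```

For "/bin/bash" both functions return true. For "/" only
accepts_half_open does, which matches the difference seen above.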

Strange function, in my opinion. However, let's trust the BSD and glibc
developers who wrote it over me. :) No need to change its behavior.

>   The cause is that you are working with start and end both being
>   inclusive, that is, with a string of length end-start+1. There are 4
>   possible intervals for two variables start and length:
>     [start,end]
>     [start,end)
>     (start,end]
>     (start,end)
>   If you use sometimes [start,end], sometimes [start,end), you cannot
>   remember working idioms and you will regularly make mistakes.
>   The best way to avoid such mistakes is to work with [start,end)
>   intervals each time you work with pointers into arrays.
>   Then the condition "start < end" tests for a non-empty subsequence,
>   and you can remember and reuse idioms.

Interesting way to think about it, thanks. Do you have a strong math
background? It has been a while since I looked at that interval
notation.

Typically the way I think of things is using a "cursor", which I guess
is just a pointer to a position in a known data structure.

So in that patch we have something like this:

  start                    end
    ^                       ^
    |                       |
   [1] [2] [3] [4] [5] [6] [7] [8]
   '/' 'b' 'i' 'n' '/' 's' 'h' '\0'

Which would be [start,end] since *start and *end are part of the
string.

I've attached the patch which I think follows your suggestion of
[start, end):

  start                        end
    ^                           ^
    |                           |
   [1] [2] [3] [4] [5] [6] [7] [8]
   '/' 'b' 'i' 'n' '/' 's' 'h' '\0'

I guess that probably makes everything easier for others to follow. I
didn't think about it too much until you mentioned it.
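
As a rough sketch of the [start, end) idiom (hypothetical helper, not
necessarily what the attached patch does), trimming whitespace from both
ends looks like this. Note that the same "s < e" test guards both loops and
that the last character of the range is e[-1], never *e.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shrink the half-open range [*start, *end) so
   that it has no leading or trailing whitespace.  Returns the length
   of the trimmed range, which is 0 for an all-whitespace input.  */
static size_t
trim_range (const char **start, const char **end)
{
  assert (*start <= *end);
  const char *s = *start;
  const char *e = *end;

  /* Skip leading whitespace.  */
  while (s < e && isspace ((unsigned char) *s))
    s++;
  /* Skip trailing whitespace; e[-1] is the last character in range.  */
  while (s < e && isspace ((unsigned char) e[-1]))
    e--;

  *start = s;
  *end = e;
  return (size_t) (e - s);
}
```

Calling it with start at the beginning of the line and end at
line + strlen (line) leaves a one-character token such as "/" with
length 1, so a later "start < end" check still accepts it.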

Feel free to correct any misunderstandings I may have in that
explanation + the patch.

Collin

Attachment: 0001-getusershell-Split-file-by-lines-instead-of-spaces.patch