groff

Re: Fwd: Re: stpecpy(): A better string copy function


From: Alejandro Colomar (man-pages)
Subject: Re: Fwd: Re: stpecpy(): A better string copy function
Date: Sun, 13 Feb 2022 19:29:37 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.1

Hi, Branden and Martin!



On 2/13/22 01:40, G. Branden Robinson wrote:
> At 2022-02-13T01:05:13+0100, Alejandro Colomar (man-pages) wrote:
>> I designed some string copying function that attempts to improve
>> strecopy(), and of course the common/standard ones, including
>> strlcpy(3BSD) and strscpy(9), ....
>
> Oh, I was going to ask if you were aware of stpcpy(), but if I click the
> link to codidact I see that you are.
>
> I expect/hope stpcpy to become the new norm for string copying, though
> it will require overcoming much inertia and many dusty old books.
>
> It was introduced to POSIX in Issue 7 (2018).
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/strcpy.html
>
> Martin Sebor is sponsoring its inclusion in C2x.
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2352.htm
>
> (It may have been accepted, or not--I haven't checked the status.)

No, stpcpy(3) was not accepted.  memccpy(3) was instead.  The problem,
it seems, wasn't stpcpy(3) but stpncpy(3), about which I'll rant a bit
below :).

>
> Since stpcpy() resolves the issue that Mr. Chu was (rightfully)
> aggrieved about, it might be regarded as "good enough" by most
> conscientious practitioners.  (Unconscientious ones are plentiful, but I
> don't expect anyone to pry them away from strcpy() and strcat().)

Not really.  It resolves one of the issues, but not all of them. There's
stpcpy(3), and stpncpy(3).

stpcpy(3):

It chains well, as strecopy() does, but it goes backwards in the sense
of strlcpy(3BSD) and strscpy(9).  There's no way to check bounds; and by
chaining copy functions, which this function is all about, it's pretty
hard to guarantee that the resulting string will be within the buffer
size.  There will be cases, and stpcpy(3) is probably a good thing to
standardize for those cases.  Basically, stpcpy(3) is better than
strcpy(3), but it's orthogonal to strlcpy(3BSD) and strscpy(9), which
are all about bounds checking.
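
To illustrate the chaining, here's a minimal sketch.  join3() and its
`size` parameter are made-up names for this example; the point is that
stpcpy(3) itself never sees the buffer size, so the only bounds check is
the one the caller writes by hand.

```c
#define _POSIX_C_SOURCE 200809L	/* stpcpy(3) is POSIX.1-2008 */
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: concatenate three strings into dst.
 * Each stpcpy(3) call returns a pointer to the NUL it just wrote,
 * so the next call continues from there without rescanning.
 */
char *
join3(char *dst, size_t size, const char *a, const char *b, const char *c)
{
	char *p = dst;

	/* stpcpy(3) can't check this; the caller has to. */
	assert(strlen(a) + strlen(b) + strlen(c) < size);

	p = stpcpy(p, a);
	p = stpcpy(p, b);
	p = stpcpy(p, c);
	return p;	/* points at the terminating NUL */
}
```

Something like `join3(buf, sizeof(buf), "foo", "/", "bar")` leaves
"foo/bar" in buf, but drop the assert() and nothing stops an overflow.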

Which brings us to stpncpy(3).  I'll reorder your mail a bit to add here
the discussion about stpncpy(3).

> My _guess_ is that people concerned about bounds issues are simply going
> to use stpncpy(), ensuring control of the amount of copying, and not
> relying on--or risking--the intervention of the runtime or hardware to
> catch traversal outside the bounds of a memory arena.

stpncpy(3):

This function should never have existed... with that name.  It's _not_ a
string function.  It's a character array function, as strncpy(3) is.
IMO, they should be named something like chrcpy() and chpcpy(), and
their current names should be obsoleted.  They copy from one character
array into another character array; neither `src` nor `dst` is a string,
and they have been misused in many projects.  Basically, strncpy(3) and
stpncpy(3) should be murdered in cold blood tomorrow.  That would put an
end to people misusing them for things they were not designed for.
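
A small sketch of what I mean by "character array function".  The
struct and the helper names are invented for the example; the behavior
shown (no terminating NUL when the source doesn't fit, NUL padding when
it does) is what strncpy(3) is specified to do.

```c
#include <assert.h>
#include <string.h>

/* A fixed-width, NUL-padded field.  NOT necessarily a string. */
struct record {
	char name[8];
};

/* Copies at most 8 chars; pads with NULs only if src is shorter. */
void
record_set_name(struct record *r, const char *src)
{
	strncpy(r->name, src, sizeof(r->name));
}

/* Length of the field contents, without assuming a terminating NUL. */
size_t
record_name_len(const struct record *r)
{
	const char *nul = memchr(r->name, '\0', sizeof(r->name));

	return nul ? (size_t)(nul - r->name) : sizeof(r->name);
}

void
record_demo(void)
{
	struct record r;

	record_set_name(&r, "alexander");	/* 9 chars: only 8 fit */
	assert(record_name_len(&r) == 8);
	assert(memchr(r.name, '\0', sizeof(r.name)) == NULL);	/* no NUL! */

	record_set_name(&r, "alx");		/* short: rest is NUL-padded */
	assert(record_name_len(&r) == 3);
	assert(r.name[7] == '\0');
}
```

Passing r.name to strlen(3) or printf("%s") after the first call would
read past the array, which is exactly the kind of misuse I'm ranting
about.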

And this leaves us with no function for copying strings with bounds
checks and easy chaining.  So, the best things out there are
strlcpy(3BSD) (designed to copy from string to string) and strscpy(9)
(designed to copy from character array to string).  stpecpy() would
replace strlcpy(3BSD), and a hypothetical, similar stpsecpy() with an
extra size parameter would replace strscpy(9).

BTW, this makes me think that in the codidact post I need to rename
stpencpy() to stpsecpy(), to differentiate it from strncpy(3) and
stpncpy(3).
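
For reference, here's roughly how chained calls would look.  This is a
sketch: stpecpy() is the definition quoted further down in this mail,
with `const` and `restrict` added to `src`; path_join() is a made-up
helper, and I'm assuming `end` points to the last byte of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

char *
stpecpy(char *dst, const char *restrict src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (!*dst)
			return dst;	/* points at the terminating NUL */
	}
	/* truncation detected */
	*end = '\0';
	return dst;	/* end + 1 */
}

/* Hypothetical helper: build "dir/file" in buf; false on truncation. */
bool
path_join(char *buf, size_t size, const char *dir, const char *file)
{
	char *end = buf + size - 1;	/* last byte of the buffer */
	char *p = buf;

	assert(size > 0);

	p = stpecpy(p, dir, end);
	p = stpecpy(p, "/", end);
	p = stpecpy(p, file, end);

	/* A single check at the end of the chain is enough. */
	return p <= end;
}
```

Once one call truncates, it returns `end + 1`, so the later calls in
the chain skip the loop, and the result is still a terminated string.
With an 8-byte buffer, "usr" and "bin" fit exactly; "usr" and "local"
leave "usr/loc" in the buffer and report truncation.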


>
>> char *
>> stpecpy(char *dst, char *src, char *end)
>> {
>>      for (/* void */; dst <= end; dst++) {
>
> Right away we see an assumption that the `dst` and `end` pointers are
> comparable objects.  I'm teasing you a little, as there is no way in C
> to express an assertion that they must be; it is up to the C language
> runtime or the hardware to trap if this assumption does not hold.

That assumption is similar to the assumption that pointers should not be
NULL.  Users should read the manual, and use the function accordingly.
It's a case of garbage-in/garbage-out if people pass random pointers,
IMO.  It would be ideal if the compiler could warn about that, but it can't.

I guess it's kind of like C++'s
vect2.assign(vect1.begin(), vect1.end());

I guess C++ doesn't have a static assertion that both iterators are
derived from the same object either.  Or maybe it does...  but C for sure has
no way to assert that.  Maybe GCC could add some attribute opposite to
restrict...  I don't know.  But I'd like to think that people are not
going to pass random pointers to this function.

>
> But, _strictly_, you can't just go comparing pointers unless you know
> they originate from the same underlying allocation region, however that
> happens to be defined.
>
> https://pvs-studio.com/en/blog/posts/cpp/0576/

I should have added 'restrict' to 'src', as Lundin pointed out in
codidact.  But yes, the current language doesn't have a better way to
express that both pointers shall be derived from the same object.  A
user of this function should RTM :).

>
> I suspect, while admittedly having no evidence to offer, that this is
> what has driven what success stpcpy() has had to date, and predict that
> this is whence resistance to your stpecpy() proposal will arise, if any
> does.
>
>>              *dst = *src++;
>>              if (!*dst)
>
> As I've noted elsewhere (can't remember if it's where you might have
> seen it), I dislike punning pointers to Booleans.  But this is a matter
> of style, and as far as I know nothing can go wrong with it.

I wasn't punning the pointer to bool (which I also do a lot), but the
character.  An equivalent expanded version would be `if (*dst == '\0')`.

Someone also misread it on codidact, so I guess people are not used to
this syntax very much :p

I caught it from the Linux kernel and git coding styles, which I tend to
like.  See
<https://git.kernel.org/pub/scm/git/git.git/tree/Documentation/CodingGuidelines#n247>:

^ - Do not explicitly compare an integral value with constant 0 or '\0',
^   or a pointer value with constant NULL.  For instance, to validate that
^   counted array <ptr, cnt> is initialized but has no elements, write:
^
^       if (!ptr || cnt)
^               BUG("empty array expected");
^
^   and not:
^
^       if (ptr == NULL || cnt != 0);
^               BUG("empty array expected");


>
>>                      return dst;
>>      }
>>      /* truncation detected */
>>      *end = '\0';
>>      return dst;
>> }
>
> Your logic seems sound to me.  Only your assumptions give me
> pause--and they are assumptions that most C programmers have most of the
> time.  But see Yodaiken, "How ISO C became unusable for operating
> systems development", PLOS '21[1].


My opinion about that paper is that K&R C was far from perfect, and ISO
C is much closer to perfection than K&R C ever was (but it was
already good enough to base ISO C on it).  It's not that ISO C is no
good for writing operating systems.  Yes, some extensions are needed, and
the language expects good compilers to provide those extensions as QoI,
but most of ISO C is good for systems; you just need to use the new
idioms.  What is true is that old-style C compiled as ISO C is not good
for systems anymore.  Old programmers need to understand the limitations
of the old language and use new idioms (or use compiler-specific flags
such as -fno-strict-aliasing to avoid ISO C behavior and compile
old-style C as old-style C).

While there are some things that ISO C should add, those are compiler
builtins such as __builtin_types_compatible_p(), or attributes, but the
core language is perfectly usable as is.  Want aliasing?  You have
unions and char pointers; no need to make the compiler go nuts.
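
A quick sketch of those two idioms, reading the bits of a float.  The
function names are made up, and the example assumes that float and
uint32_t have the same size (checked with a static assertion).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Through a union: reading the other member reinterprets the bytes. */
uint32_t
float_bits_union(float f)
{
	union {
		float    f;
		uint32_t u;
	} pun = { .f = f };

	return pun.u;
}

/* Through character pointers, which may alias any object. */
uint32_t
float_bits_bytes(float f)
{
	uint32_t u;
	unsigned char *dst = (unsigned char *)&u;
	const unsigned char *src = (const unsigned char *)&f;

	for (size_t i = 0; i < sizeof(u); i++)
		dst[i] = src[i];
	return u;
}
```

What ISO C doesn't allow is `*(uint32_t *)&f`, which is the old-style
idiom that breaks under strict aliasing.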

>
> My _guess_ is that people concerned about bounds issues are simply going
> to use stpncpy(), ensuring control of the amount of copying, and not
> relying on--or risking--the intervention of the runtime or hardware to
> catch traversal outside the bounds of a memory arena.
>
> I hope this helps, and please feel free to quote me elsewhere,
> especially if you want to crash me up against real C experts and see how
> well I fare.  ;-)

Well, I did :)
I added Martin to this thread; let's see what he thinks about it.

Thanks for the review!!

Cheers,

Alex

>
> [1] https://www.yodaiken.com/2021/10/06/plos-2021-paper-how-iso-c-became-unusable-for-operating-system-development/


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


