groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] NULL.3const: Add documentation for NULL


From: Alejandro Colomar
Subject: Re: [PATCH v3] NULL.3const: Add documentation for NULL
Date: Tue, 26 Jul 2022 14:02:56 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.0.2

Hi Ingo,

On 7/25/22 20:49, Ingo Schwarze wrote:
Hi Alejandro,

Alejandro Colomar wrote on Sun, Jul 24, 2022 at 09:19:32PM +0200:

- Move to man3const [Ralph, Branden]
- Added LIBRARY section
- Added #include [Ralph]
- Note that it can also be used as a function pointer [Ralph]
- Document that 0 is another null pointer constant [Ralph]
   But note that it's to be avoided by most coding standards [alx]
- Note that NULL is not NUL
- Improve wording about zeroing a pointer [Ralph]
   And refer to getaddrinfo(3) for an example.
   This probably can be further improved; I'm not convinced.
- Trim SEE ALSO to just void(3type)
- Other minor fixes

v3:

- Don't boldface 0s, since it doesn't refer to the literal constant 0,
   but to the bit pattern of 0s.
- Add list of headers that also define NULL (per POSIX.1-2008).


  man3const/NULL.3const | 80 +++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 80 insertions(+)
  create mode 100644 man3const/NULL.3const

diff --git a/man3const/NULL.3const b/man3const/NULL.3const
new file mode 100644
index 000000000..730f670fe
--- /dev/null
+++ b/man3const/NULL.3const
@@ -0,0 +1,80 @@
+.\" Copyright (c) 2022 by Alejandro Colomar <colomar.6.4.3@gmail.com>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.\"
+.TH NULL 3const 2022-07-22 Linux "Linux Programmer's Manual"
+.SH NAME
+NULL \- null pointer constant
+.SH LIBRARY
+Standard C library
+.RI ( libc )
+.SH SYNOPSIS
+.nf
+.B #include <stddef.h>
+.PP
+.B "#define NULL  ((void *) 0)"
+.fi
+.SH DESCRIPTION
+.B NULL
+represents a null pointer constant.
+.PP
+According to POSIX,

That phrase is misplaced in the DESCRIPTION.
In belongs into STANDARDS.

I moved it to VERSIONS, since Jakub correctly pointed that this is something that was introduced in POSIX.1-2008.

I also considered removing it completely, though.


Not littering the DESCRIPTION with misplaced information is particular
important in the first two sentences, because that's the first point
of contact for the user where they are likely trying to figure out
what the basic idea of the thing is, and whether they are even looking
at the right manual page.

Agreed.


+it shall expand to an integer constant expression with the value

Using the word "shall" in a manual page is usually terrible style.
Here, it is misleading on top of that because it is unrelated to
anything the user might be expected to do or not do. >
A manual page is neither a standard document formally defining
the language nor a guideline for compiler authors.

Considering this sentence in isolation, what you want to say is:

   The macro
   .B NULL
   expands to the integer number 0 cast to the type
   .IR "void *" .

Makes sense; I reworded.


But you are violating an important guideline for writing manual
pages: avoid useless verbosity, don't say the same thing twice.
Here, you are saying exactly the same *three times*:  in the
SYNOPSIS, in the first sentence of the DESCRIPTION, and in the
second sentence of the DESCRIPTION.

Consequently, i suggest deleting the second sentence with no
replacement.

Yeah, I'm considering removing it. Let's keep it for v4, and maybe I'll remove it after.


+.B 0

There is really no need to mark up integer constants.

groff_man_style(7):
              Use  bold for literal portions of syntax synopses,
              for command‐line options in running text, and  for
              literals  that are major topics of the subject un‐
              der discussion; for example, this page  uses  bold
              for  macro,  string,  and  register  names.  In an
              .EX/.EE example of  interactive  I/O  (such  as  a
              shell session), set only user input in bold.

Since this is a literal that is a major topic of the subject under discussion, I prefer to be a bit pedantic here and boldface it.

I guess it's no big issue; does it hurt readability too much for you?



+cast to type

Bad grammar:  s/to type/to the type/.

Fixed.


+.IR "void *" .
+.PP
+A null pointer is one that doesn't point to a valid object or function.

That sounds like an afterthought, which is always bad in documentation.
If you think the first sentence is too vague, integrate this information
into the first sentence, where it obviously belongs.
Besides, this wording is misleading: it sounds as if NULL might be
pointing to an invalid object.

I guess what you mean is:

   .B NULL
   represents a null pointer constant, that is, a pointer that
   does not point to anything.

I prefer your wording.  Fixed.


+.SH CONFORMING TO

That should be ".SH STANDARDS".

We use CONFORMING TO in Linux.  Don't know why; just history, I guess.
See man-pages(7).


+C99 and later;
+POSIX.1-2001 and later.
+.SH NOTES

I throughly hate this section.  It is almost always an indication
that the organization of the page wasn't properly thought through
and random afterthoughts were dumped here.

Me too :p


+It is undefined behavior to dereference a null pointer

That is formally true, but hardly helpful in a manual page
because what happens when you dereference a NULL pointer is
faily predictable in practice: a segmentation fault.

Any other behaviour of the C language implementation would be such a
massive security risk that i don't think even the most avid compiler
optimizer would seriously consider it.  According to my practical
experience, NULL pointer accesses are among the most frequent bugs,
easily 20-50% of all bugs the can be found by fuzzing real-world code.
Having the C language do anything else than segfault, for example
continue execution with invalid or uninitialized data, would turn
huge numbers of fairly harmless bugs into potentially exploitable
security vulnerabilities.  Using my experience, off the top my head,
i would estimate that *not* segfaulting on NULL pointer accesses
would, in a typical codebase, increase the number of potentially
exploitable vulnerabilities by roughly one power of ten.

So, if you want to be helpful to the reader, you should say
something like:

   While dereferencing a NULL pointer is formally undefined
   behaviour, it almost certainly causes a segmentation fault
   in practice.

Oh, yeah, the act of dereferencing will very likely cause the process to segfault; it seems useful info to add. Thanks!

But also, the compiler can feel too smart about it, and silently remove other portions of code that it deems useless.

I reworded it to something similar to your proposal.


+or to perform pointer arithmetics on it.

Sure.

+.I NULL \- NULL
+is, surprisingly, undefined behavior, according to ISO C.

Saying that is completely useless.  Please delete this sentence.

On the one hand, it is irrelevant because nobody in their right
mind would do that in the first place.  A manual page is not
the place to enumerate weird and unusual ways how something
can *not* be used.  Its purpose is to explain proper usage.

Besides, this is already obvious without saying it; it is
not surprising in the least.  Pointer subtraction is only
defined for pointers *into the same object*, and you already
said that NULL pointers are not pointing to an object.

Heh!  Not so irrelevant IMO.  Or maybe I'm "not in my right mind" :P

<https://github.com/nginx/unit/commit/c3e40ae932f0>

I made use of that UB in the commit linked above.

BTW, the C++ standard seems to have considered it something worth defining, and as much as I usually hate C++, this is a case where I think they did it better.

I considered even mentioning that detail about C++ in the manual page, to let users know that even if it's UB, they can probably rely on the fact that `NULL - NULL == 0`.


+.PP
+.B 0
+also represents a null pointer constant
+when used in a context where a pointer is expected.

That is also completely irrelevant because you already told the
user about a better way to create a NULL pointer.  Why digress
into explaining another, no so good way?  On top of that,
taking the SYNOPSIS and standard type coercion rules together,
the content of this sentence is already blazingly obvious:
If an explicit cast (void *)0 yields a NULL pointer, then it
is clear the the implicit coercion "void *p = 0;" can only
have the same result.

Hmm, yeah, maybe it's obvious in POSIX, and so I consider removing it.

But consider the broader ISO C scenario (or old POSIX.1-2001), where NULL isn't defined to be (void*)0. There it isn't so obvious that 0 would also represent a null pointer. But, yeah, I'm leaning towards ignoring ISO C when (current) POSIX is much better, and in this case, every Unix implementation uses (void*)0 (or at the very least 0), AFAIK.


+This is considered bad practise by most coding guidelines,
+since it can be very confusing,
+and so
+.B NULL
+is preferred.

You do have a minor point here, but all the same, i doubt that
coding style belongs in a manual page, except when it has
security implications.  In that case, it might be appropriate
for the CAVEATS section.  If you think this matters for security,
you might for example say:

I still may add a few lines in CAVEATS, yes.


   .SH CAVEATS
   To avoid confusing human readers of the code, do not compare
   pointer variables to 0 and do not assign 0 to them.
   Instead, always use
   .BR NULL .

+.PP
+.B NULL
+shouldn't be confused with
+.BR NUL ,
+which is an
+.BR ascii (7)
+character,
+represented in C as
+.BR \(aq\e0\(aq .

That definitely belongs in the CAVEATS section.

Okay.


+.PP
+The following headers also provide
+.BR NULL :
+.IR <locale.h> ,
+.IR <stdio.h> ,
+.IR <stdlib.h> ,
+.IR <string.h> ,
+.IR <time.h> ,
+.IR <unistd.h> ,
+and
+.IR <wchar.h> .

If you think that matters, just put it at the end of the DESCRIPTION,
which is very short anyway.

Okay.

I think it matters (it matters to me, and to iwyu(1)).


But usually, i see no need for dwelling on such technicalities.
No harm is done if inexperienced users include <stddef.h>, too,
and experienced users can look up such technicalities in the standard
if they really need them, which happens rarely even for professionals.

I admit the audience of that info is rather small (and I'm working on iwyu(1) to make it even smaller), but I need the info to implement iwyu(1) correctly, and I'm not going to consult the standards all the time for every symbol that I want to know where it's defined.


+.SH BUGS

The following is misplaced in BUGS.  It is not talking about any bug,
nor about any API design defect.

Oh, I do consider this a bug in the API design. I placed it there on purpose.

Allowing the bit pattern of all 0s to represent a valid pointer (and thus not a null pointer) is something that could be meaningful many decades ago, when architectures were more different.

Nowadays, every arch out there is 2's complement, and uses 0s as the null pointer. The standard should simplify, and allow memset(2)ing pointers.

In fact, AFAIK, the next revision of POSIX will fix that bug. But I don't remember well the details of that.


If you consider memset of a struct containing pointer members
as unsafe, the section to say so would be CAVEATS because it
would then be considered as a user error, and CAVEATS is the
section to warn users about typical misuse of the API.

It seems in theory, you are correct that

   struct s {
        void *p;
   };
   s.p = 0;

is guaranteed to set p to NULL whereas

   memset(&s, 0, sizeof(s));

is not.  It's not hugely important though as i doubt that you will
be able to find a system in practice where the latter fails, and if

You're right.  All POSIX systems in existence de-facto allow memset(2).

a future platform would decide to let the latter fail, that would
break such huge amounts of existing code it would not really be
sustainable for the manufacturer of that hypothetical future
system.

+When it is necessary to set a pointer variable to a null pointer,
+it is not enough to use
+.BR memset (3)
+to zero the pointer
+(this is usually done when zeroing a struct that contains pointers),
+since ISO C and POSIX don't guarantee that a bit pattern of all 0s
+would represent a null pointer.
+Instead, pointer variables need to be explicitly set to a null pointer
+when they need to hold a null pointer.
+See the EXAMPLES section in
+.BR getaddrinfo (3)
+for an example program that does this.

That example can also be improved in two respects.

First, and less importantly,

   memset(&hints, 0, sizeof(hints));

is less error prone, more readable, and shorter than

   memset(&hints, 0, sizeof(struct addrinfo));

More importantly, it is not good style to first memset the struct
as a whole and then individual fields like

   hints.ai_flags = 0;
   hints.ai_protocol = 0;

to zero right afterwards.  You should either (preferably, because
that also clears any padding) memset and then only set the fields
you want to be non-0, or set all fields individually, but not both.

Yep.  I noticed when looking at it, but was a bit lazy to fix it :)

I'll come up with a new revision.  Thanks for the review!

Cheers,

Alex

--
Alejandro Colomar
<http://www.alejandro-colomar.es/>

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]