bug-standards

Re: Script to generate ChangeLogs automatically


From: Joseph Myers
Subject: Re: Script to generate ChangeLogs automatically
Date: Mon, 26 Nov 2018 23:09:48 +0000
User-agent: Alpine 2.21 (DEB 202 2017-01-01)

On Mon, 26 Nov 2018, Richard Stallman wrote:

>   > I think "in all cases" is not realistic; there are cases of structural 
>   > changes where any description in terms of named entities will be a mess, 
> 
> I doubt that.  Whatever the changes were, it is possible to list
> the entities that were changed, the entities that were deleted, and
> the entities that were added.

And when the actual change is a rearrangement of the contents of the file, 
or a rearrangement between multiple files, or a change to the surrounding 
#if conditionals rather than the individual functions, such a list is a 
mess and useless for actually understanding the change.  These are not 
rare kinds of changes; they are common in glibc.

Furthermore, you have unnamed entities - for example, in GCC machine 
descriptions, unnamed define_split constructs.  If you want to find past 
changes to such a construct, you have to use tools like "git blame", as 
there is no possible name to search for.

Furthermore, as I noted in January, there are cases in glibc where, while 
there is arguably a name, it's not a very helpful one - in makefiles, it 
can be something like

$(addprefix $(objpfx),$(filter-out $(tests-static) $(libm-vec-tests),$(tests)))

(being something that appeared on the left hand side of ':' in a makefile 
rule, where the change in question was modifying the name by changing 
$(libm-vec-tests) to $(libm-tests-vector)).  In the makefile it appeared 
with backslash-newlines in the name.  Nobody can realistically search for 
such a name in a ChangeLog; they'd first need to guess exactly how 
whitespace was inserted or removed when the makefile's line continuations 
were turned into the line-continued version in the ChangeLog, before they 
had a string that would match the ChangeLog text.
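
To illustrate the whitespace problem, here is a rough Python sketch.  The 
ChangeLog text in it is invented for the example (the break points, the 
"Makefile" file name and the "Update." description are all assumptions, 
not copied from any real ChangeLog), and squash is a made-up helper.

```python
import re

# The name as a reader might type it, on one line (as quoted above).
query = ("$(addprefix $(objpfx),"
         "$(filter-out $(tests-static) $(libm-vec-tests),$(tests)))")

# Hypothetical line-continued ChangeLog entry; the places where the
# backslash-newlines fall are an assumption for illustration only.
changelog_text = (
    "\t* Makefile ($(addprefix $(objpfx),\\\n"
    "\t$(filter-out $(tests-static) \\\n"
    "\t$(libm-vec-tests),$(tests)))): Update.\n"
)

def squash(text):
    """Drop backslash-newline continuations and all other whitespace."""
    return re.sub(r"\\\n|\s+", "", text)

# A plain substring search fails because of the continuations...
literal_hit = query in changelog_text
# ...and only succeeds once both sides are normalized the same way.
normalized_hit = squash(query) in squash(changelog_text)

print(literal_hit, normalized_hit)
```

A reader searching the ChangeLog with an ordinary text search is in the 
position of the first comparison, not the second.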

It's these sorts of cases, where there is a mismatch between the nature of 
the change and the ChangeLog concept of changes that split into subchanges 
to well-defined entities with well-defined short names, where writing the 
ChangeLog entries can be the most work, *and* any ChangeLog entry is the 
least useful for understanding the change, *and* a script is going to make 
the most mess of describing the changes.  I don't think expecting scripts 
to do anything sensible in such cases is useful, because even a 
human-written ChangeLog entry is extremely unhelpful for understanding 
such changes.

>   > from the use of macros to 
>   > generate function definitions that makes it hard to identify relevant 
>   > entities
> 
> Indeed, the script to do this needs to be able to handle any nonstandard
> entity-defining constructs used in the package at hand.  But I expect
> that not to be very hard.
> 
> How many such constructs are used in glibc?  Could you post a list
> of what they are and what they look like?

For a package developed by many different people over 30 years, and with 
code taken from a range of third-party sources (BSD etc.), and various 
different languages in use, and about 17000 source files, naturally we 
can't identify all places with such peculiarities.  But for example you 
can have function names generated by macros, e.g.

FLOAT
INTERNAL (STRTOF) (const STRING_TYPE *nptr, STRING_TYPE **endptr, int group)

(you could say the function is INTERNAL (STRTOF)), or

CFLOAT
M_DECL_FUNC (__cacos) (CFLOAT x)

and then the same file as the latter has, after the function definition,

declare_mgen_alias (__cacos, cacos);

where the precise set of function aliases created by declare_mgen_alias 
depends on details of the glibc configuration, so it's hardly clear what 
entity name should be used for any change to the declare_mgen_alias call 
(or for calls to other such alias-creating macros).  Or in 
tst-strtod-nan-locale-main.c we have

#define TEST_STRTOD(FSUF, FTYPE, FTOSTR, LSUF, CSUF)                    \
static int                                                              \
test_strto ## FSUF (const char * loc, CHAR * s)                         \
{                                                                       \
  CHAR *ep;                                                             \
  FTYPE val = FNX (FSUF) (s, &ep);                                      \
  if (isnan (val) && *ep == 0)                                          \
    printf ("PASS: %s: " FNPFXS #FSUF " (" SFMT ")\n", loc, s);         \
  else                                                                  \
    {                                                                   \
      printf ("FAIL: %s: " FNPFXS #FSUF " (" SFMT ")\n", loc, s);       \
      return 1;                                                         \
    }                                                                   \
  return 0;                                                             \
}
GEN_TEST_STRTOD_FOREACH (TEST_STRTOD)

where GEN_TEST_STRTOD_FOREACH generates multiple calls to the macro whose 
name is passed as an argument, for different floating-point types - in 
this case, that means generating multiple function definitions.  If you 
change the TEST_STRTOD macro you can say TEST_STRTOD is the named entity 
changed - but if you change the call to GEN_TEST_STRTOD_FOREACH, it's much 
less clear how that relates to any one named entity.
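
A small Python sketch of why name-based searching breaks down here: the 
function names that end up being defined never appear literally in the 
source text.  The macro text is abbreviated from the excerpt above, and 
the list of suffixes is an assumption for illustration (the real set 
depends on what GEN_TEST_STRTOD_FOREACH expands to).

```python
# Abbreviated copy of the source excerpt; the macro body is elided.
source = (
    "#define TEST_STRTOD(FSUF, FTYPE, FTOSTR, LSUF, CSUF) \\\n"
    "static int \\\n"
    "test_strto ## FSUF (const char * loc, CHAR * s) \\\n"
    "{ /* body elided */ }\n"
    "GEN_TEST_STRTOD_FOREACH (TEST_STRTOD)\n"
)

# Assumed suffixes that the FOREACH macro might pass as FSUF.
suffixes = ["f", "d", "ld"]

# Names the preprocessor would produce via "test_strto ## FSUF".
generated = ["test_strto" + suffix for suffix in suffixes]

# Case-sensitive text search for each generated name in the source.
found = [name for name in generated if name in source]

print(generated)
print(found)
```

None of the generated names is found, so neither a script nor a later 
reader can get from the source text to those names without expanding the 
macros.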

In all of these cases, there is no obstacle to using "git blame", or "git 
log -L <start-regex>,<end-regex>" with appropriate regular expressions, to 
track changes to the code in question - whereas even if you invent an 
answer to what the canonical entity name should be in each of those cases, 
you can't expect subsequent readers to come up with the same entity name 
when looking for changes.  This does not give you a single command that 
finds changes to an entity regardless of what the entity is - you need to 
understand the git tools in question and select an appropriate command for 
the code you're looking at - but using those tools is a much more 
reliable way of finding changes in such cases than attempting to generate 
a canonical entity name that can then be searched for in a ChangeLog 
(whether automatically generated or manually written).
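
As a minimal sketch of what such commands can look like for the 
TEST_STRTOD example, the Python below only builds the command lines and 
does not run git.  The helper names are made up for illustration, the 
regular expressions are just one plausible choice, and the file name is 
given without its directory, as above.

```python
import shlex

def git_log_range(path, start_regex, end_regex):
    """Command tracing the history of the lines from the first match of
    start_regex through the following match of end_regex in path."""
    # Uses the -L/<start>/,/<end>/:<file> form of "git log -L".
    return ["git", "log", f"-L/{start_regex}/,/{end_regex}/:{path}"]

def git_blame_range(path, start_regex, end_regex):
    """Command annotating the same line range with "git blame"."""
    return ["git", "blame", "-L", f"/{start_regex}/,/{end_regex}/",
            "--", path]

path = "tst-strtod-nan-locale-main.c"
start = r"^#define TEST_STRTOD"
end = r"^GEN_TEST_STRTOD_FOREACH"

log_cmd = git_log_range(path, start, end)
blame_cmd = git_blame_range(path, start, end)

# Shell-quoted forms, suitable for pasting into a terminal.
print(shlex.join(log_cmd))
print(shlex.join(blame_cmd))
```

The regular expressions have to be chosen by someone who has looked at the 
code, which is exactly the point: the search is driven by the source text 
rather than by an invented entity name.  This sketch does not handle 
patterns that themselves contain "/".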

If the entity name is a single C identifier, not generated through macros, 
it's clear enough what the name is and people might search for it.  If 
it's generated through macros, or some other construct as in the makefile 
example, or if the name involves qualification by a class or namespace 
name, or by argument types as in a C++ overloaded function, the right name 
becomes much less clear, and so searching by name becomes much less 
helpful.  (You could e.g. search by name and find changes in *all* of the 
many different overloaded functions with the same name but different 
argument types - or you could use "git blame" to look at just the changes 
to the particular implementation of interest.)

-- 
Joseph S. Myers
address@hidden


