config-patches

Re: Rethinking configuration tuples


From: Jacob Bachmeyer
Subject: Re: Rethinking configuration tuples
Date: Sun, 27 Aug 2023 00:06:48 -0500

John Ericson wrote:
On 8/24/23 23:54, Jacob Bachmeyer wrote:
John Ericson wrote:
This is why I opened with "Operating System" lacks a coherent 
objective definition.
[...]
As I understand, historically, "operating systems" were proprietary 
monoliths and the GNU Project originally expected to produce another 
monolith, but /our/ monolith would be Free Software.  As an interim 
measure, the GNU utilities were designed to be widely portable across 
the various individually-monolithic proprietary operating systems 
then in use across a wide variety of hardware.  The broader Free 
Software Movement unexpectedly shattered that state of affairs, 
leading to the 4-element configuration tuple form, when the Linux 
kernel became available and it was noticed that---oops!---GNU on 
Linux and GNU on HURD would have significant differences that at 
least some of the GNU packages would need to handle.  (For example, 
GNU libc is very different between Linux, where POSIX I/O maps fairly 
directly to underlying syscalls, and HURD, where POSIX I/O must be 
translated to Mach IPC, but both of these are Free GNU systems.)
This means that the GNU system is a somewhat blurry category, with 
many variants possible, and is orthogonal to "Linux":  there are 
GNU/Linux systems, GNU systems using other kernels, and Linux-based 
systems not using GNU at all.  This latter category is fairly common 
in embedded systems, where the GNU utilities are often eschewed for 
lighter-weight alternatives to save flash space (or, less honorably, 
to avoid GPL3).
Yes I agree with this state of affairs. I sometimes (but not always!) 
detect a sort of "Linux Scooped us" sentiment in GNU quarters, but as 
I see it portability and diversity of distros was pretty much 
inevitable --- replacing proprietary Unix userlands with GNU software 
was a huge point in how GNU got going in academic/institutional 
environments in the early days, and even if Hurd got there before 
Linux there would be no reason to rip out that portability.
As I understand the history, Linux was the first clearly Free kernel 
available.  At the time, BSD still had a dark cloud hanging over it due 
to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would 
not be legally recognized as separate until February 1994, although BSD 
had honestly (almost?) completely diverged from the AT&T codebase in 
June 1991 with Net/2.  Mach was still proprietary; RMS was (or would 
later be) campaigning for its liberation, which would not occur until 
some years later.  It is worth noting that Linux was originally a toy 
kernel, and it only attracted the effort it did and grew like it did 
because it was basically the last missing piece for fully Free systems 
at the time.
JSON is pretty much a hard no for me: it is far too complex for what really needs to be a simple structure. Flat strings work very well for the way that GNU software typically expects to parse a configuration tuple using shell constructs. Perhaps it would be better to redefine configuration tuples as a flat list of tags with a canonical ordering? (The reason for a canonical ordering is in part to ensure that all existing coherent configuration tuple strings remain valid and to ensure that text-based pattern matching continues to work.)
Ah sorry, I shouldn't have made reference to JSON at all --- what I 
really was getting at is the /abstract syntax/. In particular, rather 
than having an abstract syntax of "list of strings" (parsing today's 
concrete syntax by breaking on dash), where the meaning of each string 
is ambiguous / context-sensitive, we have one of "keys mapped to 
enumerations", i.e. one always knows the meaning of each component 
explicitly / without inspecting it or its context.
JSON or your flat list in canonical ordering (where I assume we are 
careful to never skip a type of component) are both valid concrete 
syntaxes that can be parsed / printed from this abstract syntax.
JSON is far too complicated to use here, except possibly as a 
"pre-parsed" form that config.sub could output on request for programs 
that want a structured form instead of parsing the tuple themselves.  
But for that case, why use JSON instead of a trivial multi-line 
key=value format?
Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
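
A minimal sketch of the parsing half in POSIX shell, assuming the tuple has already been canonicalized to exactly four fields (the sketch does not canonicalize). The function name parse_tuple is hypothetical and is not part of config.sub:

```shell
# Hypothetical helper; not part of config.sub.
# Assumes $1 is already canonical: exactly CPU-VENDOR-KERNEL-OS.
# The body runs in a subshell so the IFS and globbing changes stay local.
parse_tuple() (
    IFS=-
    set -f             # disable globbing before the unquoted expansion
    set -- $1          # split the tuple on hyphens
    if [ $# -ne 4 ]; then
        echo "parse_tuple: expected 4 fields, got $#" >&2
        exit 1
    fi
    printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' "$1" "$2" "$3" "$4"
)

parse_tuple x86_64-pc-linux-gnu
```

A caller can read the result back with a plain "while IFS== read -r key value" loop, which is the main attraction of key=value over JSON for shell consumers.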

[...]
I know Po Lu doesn't like them, because they overlap with existing ones. But what about you two, Adam and Jacob? I am trying to compromise between what various things do already, and also correct things like windows-gnu (even if there is no such thing as the GNU operating system (only multiple GNU Hurd-supporting distros), I agree that MinGW is clearly not a complete enough set of GNU software to earn the right to drop the "minimal" part).
The logical problem with your parenthetical is that it ignores 
GNU/Linux, which *is* also a GNU system.
Hmm? I meant keep -gnu only for things which actually use GNU libc. Now I suppose something could use GNU libc but be really different in other ways from a real GNU system, but I am not really sure where to draw the line. There is a blurry grey area of "uses GNU libc but not sure if it counts as GNU", no?
I argue for "duck-typing" here from the user's perspective:  if and only 
if the system in all meaningful ways appears to be the GNU system, there 
should be a *-gnu* somewhere in the configuration tuple.  This is the 
major expectation that using *-*-windows-gnu for MinGW violates:  GNU 
implements POSIX and MinGW does not.  Using *-mingnu still leaves 
considerable room for confusion in my view, which using *-mingw avoids.  
This is also the framework in which *-*-linux-gnu-musl makes sense for a 
system that uses Musl libc but is otherwise a GNU/Linux system.
Effectively, a different libc is a different ABI.  My larger goal here 
is to smooth the way for multi-arch systems, with 
/usr/CPU-VENDOR-KERNEL-OS-ABI or so as the --prefix for binaries built 
for each architecture.  This means that configuration tuples should be 
detailed enough to allow the needed distinctions, but not so detailed as 
to themselves become an artificial incompatibility.  In larger networked 
environments, even KERNEL and OS could vary.
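
As a rough sketch of that layout, assuming a five-element CPU-VENDOR-KERNEL-OS-ABI tuple: the helper name multiarch_prefix is hypothetical, and /usr/TUPLE is only the layout proposed here, not an existing convention.

```shell
# Hypothetical helper: derive a per-architecture prefix from a
# five-element CPU-VENDOR-KERNEL-OS-ABI tuple.
multiarch_prefix() {
    case $1 in
        *-*-*-*-*-*)    # five or more hyphens: too many fields
            echo "multiarch_prefix: too many fields: $1" >&2
            return 1 ;;
        *-*-*-*-*)      # exactly four hyphens: five fields
            printf '/usr/%s\n' "$1" ;;
        *)
            echo "multiarch_prefix: need five fields: $1" >&2
            return 1 ;;
    esac
}

multiarch_prefix x86_64-pc-linux-gnu-musl
```

The result could then be passed as ./configure --host=TUPLE --prefix="$(multiarch_prefix TUPLE)", so that binaries for each tuple land in their own tree.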
I also quibble with CPU-VENDOR-linux-gnu and CPU-VENDOR-linux-musl. Android and GNU are different operating systems that both (can) use the Linux kernel, so I agree with CPU-VENDOR-linux-android for Android. The other two I see as: *-*-linux-gnu --- the GNU/Linux system, using GNU libc unless otherwise specified; *-*-linux-musl --- some unspecified Linux-based system using Musl libc, not necessarily using GNU.
With the proposed five-element form, the ambiguity is resolved:  
*-*-linux-gnu-musl --- a variant GNU/Linux system, using Musl libc.
Similar to the above, I know when something is/isn't using a specific 
libc, but any other distinction seems very blurry to me. See also what 
Connor wrote (perhaps more diplomatically than my "operating systems 
are inherently subjective!" bombast :))
Again, "duck typing"---if the system appears to be the GNU system, the 
tuple should contain *-gnu* somewhere.
If we can accept these, I think I will have no problem getting LLVM to accept windows-mingnu, and perhaps even warn/deprecate windows-gnu.
I still say this should be windows-mingw, but yes "windows-gnu" 
should definitely be deprecated, removed, and reserved in case 
someone actually ports a POSIX GNU environment to Windows.
Yeah whatever windows-something we settle on for MinGW, I promise my 
offer still stands to try to get LLVM to (a) accept it, and (b) 
steer people away from windows-gnu towards it.
Thanks.

After that, I think we are close enough to convene a working group for a JSON/whatever explicit standard. And that would be amazing.
I still oppose JSON because it is way too verbose for this:  
configuration tuples need to be both expressive and simple enough to 
type at a shell prompt as arguments to configure.  Using JSON by 
default would also be a very nasty "flag day" that would break all 
existing programs that use config.sub.  Perhaps config.sub could 
accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably 
better for things like Autoconf which needs to parse the output of 
config.sub either way.
No, because Autoconf uses the shell and JSON is a [*profanity elided*] 
to parse using shell constructs.  A flat list of hyphen-delimited tags 
is almost ideal for the parsing that configure needs to do.  In fact, 
with a few restrictions (met by using canonical ordering) this is what 
configure /already/ parses.
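
To illustrate the kind of text-based matching meant here, a small sketch using only shell case patterns. The function name describe_host and the descriptions it prints are made up for this example; the tuples are the ones discussed in this thread.

```shell
# Classify a canonical tuple with shell patterns.
# Order matters: more specific patterns must come first.
describe_host() {
    case $1 in
        *-*-linux-gnu-musl) echo "GNU/Linux variant with Musl libc" ;;
        *-*-linux-gnu*)     echo "GNU/Linux" ;;
        *-*-linux-android*) echo "Android" ;;
        *-*-linux-musl*)    echo "Linux-based system with Musl libc" ;;
        *-*-gnu*)           echo "GNU with a non-Linux kernel" ;;
        *)                  echo "unrecognized" ;;
    esac
}

describe_host x86_64-pc-linux-gnu
```

Note that a tuple such as x86_64-pc-windows-gnu also falls into the *-*-gnu* branch, because "*" in a case pattern matches across hyphens. That is the sort of confusion a *-gnu* tag on a non-GNU system invites.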
I am even OK if the dash-separating is a sort of "legacy mode" thing that remains ambiguous so long as one can always convert the unambiguous form (e.g. JSON) to it with much less logic than config.sub. (e.g. do all the work in "normalize to JSON", and making "JSON to old format" a very simple follow-up step.)
Note that config.sub is itself a shell script, and handling JSON in 
shell is a giant pain.  The most we could reasonably do is what 
config.sub already does:  determine each component as a separate 
variable and then output that by substituting text into a template.
The hyphen-separated form is unambiguous as it stands, or close enough 
to be resolvable with minimal effort.  With a dictionary of allowed 
element values, it is unambiguous, even if some elements are omitted; 
resolving ambiguous forms to unambiguous forms using such a dictionary 
is what config.sub /does/.
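
A toy version of that dictionary lookup, to show the idea only. The word lists below are tiny illustrative samples, the default-vendor rule is an assumption made for this sketch, and none of the function names exist in config.sub:

```shell
# Toy dictionary-based resolver; NOT config.sub's real tables or logic.
is_kernel() { case $1 in linux|freebsd|netbsd) return 0 ;; *) return 1 ;; esac; }

# Assumed default vendors, chosen for this sketch only.
default_vendor() { case $1 in x86_64|i?86) echo pc ;; *) echo unknown ;; esac; }

# Body runs in a subshell so the IFS and globbing changes stay local.
resolve_tuple() (
    IFS=-
    set -f
    set -- $1                      # split the tuple on hyphens
    case $# in
        4)  ;;                     # already CPU-VENDOR-KERNEL-OS
        3)  # CPU-KERNEL-OS with the vendor omitted, if field 2 is a known kernel
            if is_kernel "$2"; then
                set -- "$1" "$(default_vendor "$1")" "$2" "$3"
            else
                echo "resolve_tuple: form not handled in this sketch" >&2
                exit 1
            fi ;;
        *)  echo "resolve_tuple: form not handled in this sketch" >&2
            exit 1 ;;
    esac
    printf '%s-%s-%s-%s\n' "$1" "$2" "$3" "$4"
)

resolve_tuple x86_64-linux-gnu
```

Because "linux" appears in the kernel dictionary, the second field of x86_64-linux-gnu cannot be a vendor, so the omitted vendor can be filled in without guessing.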
An alternate proposal hinted at above is to redefine configuration tuples as a flat tag list with canonical ordering. For example, a CPU type always comes first, but the rest is just a set of tags further describing the system, generally working from wide categories (like CPU architecture) to narrow categories (like choice of libc). A larger single installation could easily have some variety in the narrower categories; a network cluster running a single system image (which I understand is an eventual goal for HURD) could even have a variety of CPU types.
Yeah I think the "increasingly narrowing" way of thinking about it (almost like a converging sequence from Calculus) is very good.
Thank you; as I mentioned above, the goal is to best support 
heterogeneous multi-arch systems, while recognizing a tension here.  For 
configure, the configuration tuple should not contain information that 
can be determined by testing, but for storing multiple binary sets, ABIs 
do need to be part of the name, even if they can be determined by 
configure tests.
But to Adam's point, I think it is good that we recognize that while there are 5-element tuples today (e.g. in the LLVM test suite), I don't think any of them do OS-LIBC; instead I see things like aarch64-unknown-windows-gnu-elf. Not saying OS-LIBC is inherently bad (though I do have some reservations like Connor's), just that OS-LIBC is novel.
I called the fifth field "LIBCABI" because it can be a libc name or an 
ABI name; in practice the two are usually closely related.  Some 
existing tuples place a libc name in that slot, while others use a more 
generic ABI or file format name, such as "elf" in your example.  For it 
to be a source of confusion, there would need to be a libc that supports 
multiple ABIs, and you would simply use the ABI names in that case.

-- Jacob



