Re: coreutils & building w/C++
From: L A Walsh
Subject: Re: coreutils & building w/C++
Date: Sat, 04 Feb 2017 14:29:40 -0800
User-agent: Thunderbird
Eric Blake wrote:
On 02/03/2017 07:31 PM, L A Walsh wrote:
I was wondering if there has ever been any consideration given to
migrating the coreutils to C++, or at least making it such that
it would build cleanly under either?
No, and there probably is no interest in it either. Finding
a standards-compliant C++ compiler on as many platforms as we
already have a working C compiler is unlikely to happen, but would
be a necessary prerequisite.
-----
Hmmm... Seems like the GNU C++ compiler shares a lot of C's
infrastructure -- even the same man pages. It certainly seems
likely that if gcc runs on a given platform, g++ would as well.
Sorry, but I don't see any reason to rewrite in a different
language. Most of the core contributors are more familiar with
C than with C++,
-----
That's good, since C++ is mostly a superset of C, so no rewriting
should be necessary. Usually I find that most standards-compliant
C programs will compile as C++ with few changes. C++ does disallow
some dubious C constructs, but C++'s runtime includes all of C's
library functions. Given that C++ was designed to be C-compatible,
I certainly don't see any _need_ for rewriting unless one wants to
refactor or improve the code.
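To show the sort of "dubious constructs" I mean, here are two
small examples (just a sketch -- far from exhaustive) that gcc
accepts as C but g++ rejects:
-----
#include <stdlib.h>

int main(void) {
    char *s = "hello";  /* C: allowed; C++11: error (literal is const) */
    void *v = s;
    char *c = v;        /* C: implicit void* conversion; C++: error */
    (void)c;
    return 0;
}
-----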
and so even if C++ were used, it would look a lot more like weird
C than it would like proper C++.
-----
Well, "proper" is usually an academic matter (unless it becomes
"vehemently proper", then religion may be involved). ;-)
Many of the C++ programs I've seen or worked with look like
standard C programs, with the users wanting specific C++ features,
like its standard library or allowing namespaces solely for the
purpose of keeping library or module calls in their own space and no
impinging on the global name space.
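Just as a sketch of what I mean (the names here are made up, not
from any real library), such code is mostly plain C with a
namespace wrapped around it:
-----
// Hypothetical sketch: C-style code kept out of the global namespace.
#include <cstdio>

namespace fileutil {
    // An ordinary C-style function, merely scoped to fileutil::
    int count_lines(std::FILE *fp) {
        int lines = 0, c;
        while ((c = std::getc(fp)) != EOF)
            if (c == '\n')
                ++lines;
        return lines;
    }
}

int main() {
    // The caller names the module explicitly; nothing leaks into
    // the global namespace.
    std::printf("%d\n", fileutil::count_lines(stdin));
    return 0;
}
-----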
I also don't buy the argument that being more object-oriented will
help things; coreutils is not a large corpus of multiple
inter-related pieces, but a bunch of individual utilities that do
one thing.
-----
Really? Looking at the source, I see 316 source files (.c or .h)
totaling some 69,000 lines in a shared library, versus only 30 files
and about 9,000 lines of separate tool front-ends in src/. In fact,
it seems that all of the coreutils can be built as one binary with
all code shared, the individual tools being symlinks to that one
binary. There looks to be over 7x as much common code in the
library as there is code supporting the different tool interfaces.
Looks like Gcc and binutils are better projects for using C++,
and it shows, as both of them have already made progress towards
that front.
-----
I don't know how long it's been going on, but it looks like there
is an ongoing effort to find the commonalities in the coreutils and
merge them, with the common code going into the library.
There may be a score or more of individual tools, but many of them
handle the exact same or similar issues -- like file-tree traversal,
date+time manipulation and formatting, and file access. With changes
being added to coreutils to support various security needs, even
more commonality between the tools is emerging, as they all acquire
the framework to deal with more specific edge cases -- and that's
the kind of thing C++ can be very useful in organizing.
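As one hypothetical sketch (not actual coreutils code) of what such
organizing could look like, a small RAII class can centralize the
open/check/close handling that each tool now repeats by hand:
-----
// Hypothetical sketch: one shared RAII type instead of per-tool
// open/close/error boilerplate.
#include <cstdio>
#include <stdexcept>
#include <string>

class File {
    std::FILE *fp_;
public:
    explicit File(const std::string &path, const char *mode = "r")
        : fp_(std::fopen(path.c_str(), mode)) {
        if (!fp_)
            throw std::runtime_error("cannot open " + path);
    }
    ~File() { if (fp_) std::fclose(fp_); }  // closed on every exit path
    std::FILE *get() const { return fp_; }
    File(const File &) = delete;            // prevent double-close
    File &operator=(const File &) = delete;
};
-----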
I was thinking of how much more flexible coreutils would be if it
were organized more like the Linux kernel, in that different,
orthogonal features could be developed as loadable modules. They
could conceivably be organized along the lines of the kernel's
security modules, where only the modules that are wanted/needed/used
get loaded at runtime -- OR such modules could be hard-linked in at
build time, either limiting loadables to lesser-used features, or
allowing none at all.
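A minimal sketch of that runtime-loading idea, using the POSIX
dlopen() interface (the module name and entry point below are made
up for illustration; build with g++ and link with -ldl):
-----
// Hypothetical sketch: load an optional feature module at runtime.
#include <dlfcn.h>
#include <cstdio>

int main() {
    // Module name is hypothetical; a real scheme would search a path.
    void *mod = dlopen("./mod_security.so", RTLD_NOW);
    if (!mod) {
        std::fprintf(stderr, "feature unavailable: %s\n", dlerror());
        return 1;
    }
    // The entry-point name is likewise made up for illustration.
    auto init = reinterpret_cast<int (*)()>(dlsym(mod, "feature_init"));
    int rc = init ? init() : -1;
    dlclose(mod);
    return rc == 0 ? 0 : 1;
}
-----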
No, no one's ever been interested enough to even bother with it,
because it is probably not worth doing.
-----
One could look at history for the correlation between something
not having been done and whether it was worth doing. For most of
history, the dominant view was that it wasn't worth trying to
circumnavigate the globe, because it wasn't worth building a ship
you knew would fall off the edge of the flat world. Many thought
flying was impossible or couldn't be done, as was breaking the
speed of sound. History is filled with more examples of "undone
things" being useful when completed than not. The benefits of a
particular course of action are usually not visible before the
action is taken.
FWIW, I gave it a spin, and there seem to be several "__THROW"
keywords. I'm not familiar with those in the C standard.
That's because they're not in the C standard. That particular
macro is defined by glibc:
misc/sys/cdefs.h:# define __THROW __attribute__ ((__nothrow__ __LEAF))
as an optimization hint for gcc, and as a no-op for other
compilers.
-----
Wow... it was gcc that gave the errors (though with different
switches than are normally used in the build).
But this is all open source - if you are questioning what the code
means, rather than following the source and finding the answer
yourself, then you're already facing an uphill battle at trying to
rewrite the source.
-----
Especially in areas with non-standard APIs or formats. In this
case, I was looking at the "low-hanging fruit": the constructs gcc
claimed it didn't recognize, which were responsible for multiple
errors.
C++ has standard language features to control error propagation
and allow related optimizations. It's a good example of where C++
would be more useful: it has standard methods and features in areas
where ad-hoc methods have to be used in C.
With C++, those features can be looked up in a C++ language or
library reference.
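For instance, what glibc spells with the __THROW macro above
corresponds roughly to C++'s standard noexcept specifier (a sketch
of the idea, not a drop-in replacement for the glibc machinery):
-----
// Roughly, in glibc's C headers the "never throws" promise hides
// in a macro:
//     extern size_t strlen (const char *__s) __THROW;
// In C++, the same promise is part of the standard language:
#include <cstddef>

std::size_t my_strlen(const char *s) noexcept {  // never throws; the
    std::size_t n = 0;                           // optimizer may rely on it
    while (s[n]) ++n;
    return n;
}
-----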
Perhaps it might, at least, be possible to use the C++ compiler as
a type of diagnostic -- one that could help clarify existing C code
and make it more robust.
For example, in C one can initialize a character array, as is done
in "base32.c", like:

char b32str[32] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

However, depending on how it is used, this can cause problems:
Compiled with:
> gcc -Wall -o b32 b32.c
-----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv) {

    static const char b32str[32] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    const int bufsiz = sizeof(b32str)*3/2;
    char * buff = calloc(bufsiz, 1);
    memset(buff, 'X', bufsiz-1);            /* buff[bufsiz-1] stays NUL */
    strncpy(buff, b32str, sizeof(b32str) ); /* copies 32 bytes -- no NUL */
    printf("Len of b32str, '%s', is %d characters.\n",
           buff, (int) strlen(buff));
    exit (0);
}
-----
When run, it produces:
Len of b32str, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567XXXXXXXXXXXXXXX', is 47
characters.
Under C, it produces no warnings, as C allows the NUL terminator
to be dropped from the initialization when the initializer exactly
fills the array -- "valid", but a potential problem if the array is
then treated as having the same type as its initializer (a
NUL-terminated string).
C++, however, generates errors:
> g++ -Wall -o b32 b32.c
b32.c: In function ‘int main(int, char**)’:
b32.c:7:33: error: initializer-string for array of chars is too long
[-fpermissive]
static const char b32str[32] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
^
b32.c:9:32: error: invalid conversion from ‘void*’ to ‘char*’ [-fpermissive]
char * buff = calloc(bufsiz, 1);
^
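For what it's worth, if the array really is meant to be used as a
string, the usual C-side fix is to leave the array unsized so the
compiler includes the terminator:
-----
/* Unsized: sizeof b32str is now 33, including the NUL terminator. */
static const char b32str[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
-----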
So it seems the C++ compiler might be of some benefit in promoting
safer programming practices (or not, as people often find ways to
work around restrictions!)
;-)