Re: make cannot handle prerequisites that contain a colon
From: Paul D. Smith
Subject: Re: make cannot handle prerequisites that contain a colon
Date: Tue, 19 Oct 2004 15:31:31 -0400
%% Markus Kuhn <address@hidden> writes:
mk> Are colon and space really the only bytes that cannot be handled
mk> in a filename by make? How would you handle a pathological filename
mk> that contains bytes such as 0xa0?
Well, filenames that contain newlines can't be represented either.
Make uses standard char* strings to hold all its data, not unsigned char
and not wchar_t. But it doesn't do much with these strings except chop
them up on whitespace, compare them, and send them to programs it
invokes.
So, offhand I can't see any reason why characters with the high bit set
would be a problem.
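
As a rough illustration of that point (a Python analogy, not make's
actual C code): splitting a byte string on ASCII whitespace leaves
bytes such as 0xa0 or 0xe9 untouched, because they are not whitespace
in the ASCII sense. The file names below are made up for the example.

```python
# Python analogy only; GNU make itself works on C char* strings.
# bytes.split() with no argument splits on runs of ASCII whitespace,
# so bytes with the high bit set stay inside their word.
line = b"caf\xe9.o na\xa0me.c main.c"   # hypothetical file names

words = line.split()

for w in words:
    print(w)
```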
mk> So it would be good to have a brief section in the manual that
mk> explains exactly which characters are allowed in file names
mk> processed by GNU make.
True, but that would require someone testing to see which ones work :-).
In other words, there's nothing in GNU make that actually looks at
target names to make sure they contain only a valid set of characters.
If there _ARE_ restrictions then they exist as a side-effect of
processing by some other area of GNU make.
mk> I guess it would have to move from handling just strings to
mk> handling arrays of fully 8-bit transparent strings, more like what
mk> bash, tcl, or perl do.
Well, it's pretty complicated when you start looking at the details of
how such a thing could work, given the free-form syntax of makefiles.
>> Actually I had one idea that could be implemented without redoing all of
>> make's internals, but it would block off at least one and probably two
>> or more different 8-bit values from appearing in makefiles. In an i18n
>> world I don't know if this is acceptable.
mk> The i18n world is now fairly quickly moving towards using UTF-8,
mk> and UTF-8 strings have the useful property that the bytes 0xfe and
mk> 0xff are never used by the encoding. Other than that, using bytes
mk> in the 0x01-0x1f range may also be acceptable, because none of the
mk> ASCII-compatible character encodings used worldwide uses any of
mk> these to represent a graphical character. (Well, there is VSCII-1
mk> in Vietnam, but hardly anyone really uses that under Unix, as it
mk> causes endless problems and has de facto already been superseded
mk> by UTF-8.)
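
The claim about 0xfe and 0xff can be checked by brute force. The
following is a small sketch in Python (the function name is made up for
this example) that encodes every Unicode code point as UTF-8 and
collects the byte values that never appear:

```python
def unused_utf8_bytes():
    """Return the sorted byte values that never occur in valid UTF-8."""
    used = set()
    for cp in range(0x110000):
        # Surrogate code points cannot be encoded as UTF-8; skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        used.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - used)


unused = unused_utf8_bytes()
print([hex(b) for b in unused])
```

Both 0xfe and 0xff are in the resulting list, so a well-formed UTF-8
makefile would never contain them.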
That's good to know; one of the main problems I've had with trying to
find a solution to the issue is not having any personal understanding of
how the various extended character sets actually work and what would be
possible or not possible with them.
I'm not really sure what would be involved with providing support for
full UTF-8 character sets in make; I'd have to think about it more.
The idea I had involves changing escaped special characters like spaces
into "impossible" byte values in make's internal string representation.
That way all of make's current manipulation would continue to work
as-is: when searching for whitespace to break up words for example it
would not see the "impossible" byte values as whitespace, so it wouldn't
break on that character.
Then, at the very last minute before make invokes a command line, etc., it
would un-translate the string and replace the impossible bytes with the
special characters again.
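
A minimal sketch of that idea, in Python rather than make's C and under
several assumptions: backslash is the escape syntax (the user interface
is still undecided), 0xFE and 0xFF are the "impossible" bytes (they
never occur in UTF-8, as noted in the quoted text above), and only
space and colon are escaped. All names are hypothetical.

```python
# Assumed placeholder bytes; nothing like this exists in GNU make.
TO_INTERNAL = {ord(" "): 0xFE, ord(":"): 0xFF}
TO_EXTERNAL = {v: k for k, v in TO_INTERNAL.items()}
BACKSLASH = ord("\\")


def to_internal(text):
    """Replace backslash-escaped specials with placeholder bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else None
        if text[i] == BACKSLASH and nxt in TO_INTERNAL:
            out.append(TO_INTERNAL[nxt])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


def to_external(word):
    """Undo the translation just before the word leaves make."""
    return bytes(TO_EXTERNAL.get(b, b) for b in word)


# Existing word splitting keeps working: placeholders are not whitespace.
line = rb"my\ file.c C\:/src/main.c other.c"
internal_words = to_internal(line).split()
external_words = [to_external(w) for w in internal_words]
print(external_words)
```

The word splitting in the middle is unchanged; only the two translation
steps at the edges are new, which is the appeal of the approach.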
Of course, there are many details to work out, such as the user
interface for escaping special characters, exactly when the translation
back is done, etc.
--
-------------------------------------------------------------------------------
Paul D. Smith <address@hidden> Find some GNU make tips at:
http://www.gnu.org http://make.paulandlesley.org
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist