sort: mkstemp() runs out letters for temporary files
From: Bob Proulx
Subject: sort: mkstemp() runs out letters for temporary files
Date: Fri, 26 Oct 2001 01:32:18 -0600
I have run into a problem with sort not related to the collating
sequence. :-) It turns out I found two things. This is the first
part. Please fasten your seatbelt and hang on, I have a long message.
Sorry about that but I had a lot to say. Please trim appropriately
for any replies.
While sorting a moderately big file of around 200MB on HP-UX 10.20,
the GNU textutils-2.0.14 sort needed to create temporary files for
later merging. The program died with a rather confusing error message
to the user. Note that '{' follows 'z' in ASCII.
/tmp/sort{12345: No such file or directory
sort.c uses create_temp_file(), which sets up a temp_dir[] and then
appends /sortXXXXXX to it in traditional fashion to yield /tmp/sortXXXXXX
for mkstemp("/tmp/sortXXXXXX"). All fine. More or less.
It seems that traditionally mkstemp() uses the process id and prepends
a letter to yield a temporary filename like /tmp/sorta12345.
This is the algorithm on HP-UX. But having only one letter limits the
number of temporary file names that this algorithm can generate quite
severely. It is limited to 26 files! In fact the man page says the
following, which I have trimmed:
NAME
mktemp(), mkstemp() - make a unique file name
...
Remarks:
These functions are provided solely for backward compatibility and
importability of applications, and are not recommended for new
applications where portability is important. For portable
applications, use tmpfile() instead (see tmpfile(3S)).
...
RETURN VALUE
mktemp() returns its argument except when it runs out of letters, in
which case the result is a pointer to the empty string "".
mkstemp() returns an open file descriptor upon successful completion,
or -1 if no suitable file could be created.
...
WARNINGS
It is possible to run out of letters.
...
STANDARDS CONFORMANCE
mktemp(): SVID2, SVID3, XPG2
[Actually it does not return the empty string, it returns "sort{12345"
which I will report as a bug. But at that point, who cares?]
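To make the limit concrete, here is a minimal sketch of the letter-plus-pid scheme described above. It is an illustration only and not the HP-UX source: legacy_temp_name() is a hypothetical helper, and the five-digit pid field is assumed from the example filename.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of the traditional scheme: replace the
   trailing "XXXXXX" with one letter followed by a five-digit pid.
   attempt 0 gives 'a', attempt 25 gives 'z'.  Returns 0 on success,
   or -1 if the template is malformed or the letters are exhausted. */
int legacy_temp_name(char *tmpl, unsigned long pid, int attempt)
{
    size_t len = strlen(tmpl);
    char suffix[16];

    if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0)
        return -1;
    if (attempt < 0 || attempt >= 26)
        return -1;                      /* ran out of letters */

    /* An implementation that skipped the check above would compute
       'a' + 26 here, which is '{' in ASCII. */
    snprintf(suffix, sizeof suffix, "%c%05lu", 'a' + attempt, pid % 100000UL);
    memcpy(tmpl + len - 6, suffix, 6);
    return 0;
}
```

With pid 12345 the first name is /tmp/sorta12345 and the 26th is /tmp/sortz12345. There is no 27th, which is consistent with the '{' seen in the error message.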
And checking the online docs at single_unix_specification_v2 I see:
"It is possible to run out of letters.
For portability with previous versions of this document, tmpfile() is
preferred over this function."
So it seems that, regardless of the convenience, a standard conforming
implementation of mkstemp() can run out of letters! I can't complain
that the library routine should be fixed since it legally isn't broken
according to the standards. The standard does not say how many
temporary files must be provided, just that a limit may exist. And if
so then the code should operate within those limits, even though the
limit itself is unknown.
I assume that the mkstemp() author was working on an old Version 6 or
Version 7 system which only allowed 20 files open as a maximum and so
providing 26 was more than the total number the system allowed and
gave headroom besides. I can't recall when systems went from 20 to 60
files maximum. Now it is either 1024 or indefinite on most modern
systems. Too bad the standard did not require mkstemp() to provide at
least that many files as well so that it could have kept up with the
times.
Of course nothing prevents implementations from increasing the number
of files that mkstemp() can provide. This is what glibc does.
TMP_MAX is 238328 there.
So what about tmpfile()? It uses a modern algorithm. But that is
hard to use in this case. Because sort needs to be able to merge
sorted files as part of the normal operation of the program it appears
inconvenient to redesign the program to use tmpfile(), which returns
FILE*s to unlinked files instead. There really is no equivalently
convenient functionality for this task. I would stick with a
mkstemp() like function interface.
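As a rough sketch of why the name matters: a file from mkstemp() can be closed and reopened by name later, which is what a merge pass wants, whereas tmpfile() hands back only a FILE* with no name to reopen. The helper named_temp_round_trip() below is hypothetical and is not taken from sort.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to a new mkstemp() file, close it,
   then reopen it by name and read it back, as a later merge pass
   would.  `tmpl` is modified in place to hold the generated name.
   The file is removed before returning.  Returns 0 if the text read
   back matches what was written, -1 otherwise. */
int named_temp_round_trip(char *tmpl, const char *text)
{
    char buf[256];
    size_t len = strlen(text);
    size_t got;
    FILE *fp;
    int fd;

    if (len >= sizeof buf)
        return -1;
    fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    if (write(fd, text, len) != (ssize_t) len) {
        close(fd);
        unlink(tmpl);
        return -1;
    }
    close(fd);

    /* This step has no tmpfile() equivalent: the file has a name. */
    fp = fopen(tmpl, "r");
    if (fp == NULL) {
        unlink(tmpl);
        return -1;
    }
    got = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    unlink(tmpl);

    return (got == len && memcmp(buf, text, len) == 0) ? 0 : -1;
}
```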
Although a wrapper around the tmpnam() routine could do the job. It
would look similar to what is in the fallback code. At some point in
the future I might advocate switching to tmpnam() and will reserve
that opinion here. But for now I just want to use the provided
fallback code almost verbatim.
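For illustration, a tmpnam() wrapper along those lines might look like the sketch below. open_tmpnam_file() is a hypothetical name, and the retry count of 100 is an arbitrary assumption. The O_EXCL open is what closes the window between choosing a name and creating the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical wrapper: ask tmpnam() for a candidate name and create
   the file exclusively, retrying if another process got there first.
   `name_out` must have room for at least L_tmpnam bytes.
   Returns an open file descriptor, or -1 on failure. */
int open_tmpnam_file(char *name_out)
{
    int attempt;

    for (attempt = 0; attempt < 100; ++attempt) {   /* arbitrary limit */
        int fd;

        if (tmpnam(name_out) == NULL)
            return -1;
        fd = open(name_out, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;          /* a real error, not a name collision */
    }
    return -1;
}
```

Some toolchains warn about tmpnam() at link time. With the exclusive open the wrapper does not rely on the name staying unused, but the warning is one more reason to prefer the fallback code for now.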
There is a fallback module in textutils just for this purpose. If a
mkstemp() had not been provided in the library at all then autoconf
would have determined that and the fallback routine would have done a
perfectly fine and portable job of this function. This implementation
appears to be the same as in glibc. Therefore in my patched version
of sort I have forced fallback to the textutils included lib/mkstemp.c
and lib/tempname.c functions in order to correct this problem of
running out of temporary files on hpux. This works fine.
It seems possible to always use the fallback code. But that does not
seem right on systems that provide an improved version in their libc.
Therefore I suggest the following. Provide a configure runtime test
to check if mkstemp() can provide a reasonable number of temporary
files. If so then go ahead and use it. But if not then #undef
HAVE_MKSTEMP in config.h instead of defining it as it does now. Here
is one possible test for configure.
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>   /* for strcpy() */
    int main()
    {
      int i;
      char buf[64];
      for (i = 0; i < 30; ++i)
        {
          strcpy(buf, "/tmp/acmkstempXXXXXX");
          if (mkstemp(buf) < 0)
            exit(1);
        }
      return 0;
    }
Note the number of files should be larger than 26. Should it test to
(60 - 3)? Or to some other limit? 2*26+1? I would guess the
portability problem trigger is 26 and you either have a 26 limited
routine or you don't. So any number past that would suffice.
But unlike Jim I am not a configure wizard and so I am not going to
attempt to suggest a complete solution here. I will leave that as an
exercise for the reader! But please do educate me on the proper way
to do this. Among my problems that I don't know how to solve are, how
do the temporary files created by this test get cleaned up by the
test? And what happens if someone is cross-compiling? [In the case
of cross compiling I would always fall back to the internal
implementation because it should be good to go and the target machine
can't be tested.]
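On the cleanup question, one possible approach is sketched below, under the assumption that the probe may keep every name in memory until it finishes. The files must all exist at the same time, because a letter-limited mkstemp() could simply reuse 'a' if each file were removed right after it was created. count_mkstemp_files() and PROBE_MAX are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PROBE_MAX 64    /* hypothetical upper bound on the probe size */

/* Try to create `wanted` temporary files that coexist, then remove
   all of them.  Returns how many were actually created. */
int count_mkstemp_files(int wanted)
{
    static char names[PROBE_MAX][64];
    int created = 0;
    int i;

    if (wanted > PROBE_MAX)
        wanted = PROBE_MAX;

    for (i = 0; i < wanted; ++i) {
        int fd;

        strcpy(names[i], "/tmp/acmkstempXXXXXX");
        fd = mkstemp(names[i]);
        if (fd < 0)
            break;              /* ran out of names, or another error */
        close(fd);              /* keep the file, release the descriptor */
        ++created;
    }

    /* Clean up only after the loop, whether it completed or not. */
    for (i = 0; i < created; ++i)
        unlink(names[i]);

    return created;
}
```

A configure test program could then be little more than `return count_mkstemp_files(30) == 30 ? 0 : 1;`. For cross-compiling, autoconf's AC_TRY_RUN takes a separate action for that case, which is where the fall back to the internal implementation could go.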
Are there other ways to solve this problem?
Now for a problem I see in the implementation in tempname.c. There I
find the following code:
#include <stdio.h>
#ifndef P_tmpdir
# define P_tmpdir "/tmp"
#endif
#ifndef TMP_MAX
# define TMP_MAX 238328
#endif
The P_tmpdir is correct. The system should be allowed to define the
default temp directory. But the conditional definition of TMP_MAX I
believe is wrong since the included implementation is not limited to
the system's value of TMP_MAX. If the system defines that to be
small, say 26, then we are back to the same problem we were trying to
solve before!
In this case it doesn't. HP-UX defines it to be 17576 which is
probably fine for most conceivable practical tasks. But any given
system could define this to be arbitrarily small. Why artificially
limit the fallback code to the system limit when the whole reason the
fallback code is being used is that the system version is found
lacking? The TMP_MAX in tempname.c should be unconditionally #undef'd
and defined to be the value desired by the included implementation.
The included fallback implementation is not limited to 238328 files
and I don't know how that number was derived. I certainly would not
want that many files in one directory. Please spare me that sight.
But it is as good as any other number larger than a couple of
thousand. The fallback implementation should not use the system value
of TMP_MAX and should always define a reasonable value itself.
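A minimal sketch of the unconditional override follows. It also records one arithmetic observation about the number: 238328 is exactly 62^3, the count of three-character strings over a 62-character alphabet (a-z, A-Z, 0-9). Whether that is how the value was chosen is an assumption on my part, as is the idea that the fallback fills each of the six X positions from such an alphabet, as glibc does. name_space() is a hypothetical helper.

```c
#include <stdio.h>      /* may supply the system's TMP_MAX */

/* Ignore whatever the system says; the included implementation is
   not bound by it. */
#undef TMP_MAX
#define TMP_MAX 238328

/* Number of distinct strings of `positions` characters drawn from a
   62-character alphabet (a-z, A-Z, 0-9). */
unsigned long long name_space(unsigned positions)
{
    unsigned long long n = 1;

    while (positions-- > 0)
        n *= 62;
    return n;
}
```

Under those assumptions name_space(3) equals TMP_MAX, while six positions allow 56800235584 names, so the attempt limit is far below what the template could express.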
Thanks
Bob