RE: make-3.79 on solaris8 broken
From: Howard Chu
Subject: RE: make-3.79 on solaris8 broken
Date: Mon, 19 Nov 2001 13:38:40 -0800

I've seen this kind of problem before in other programs, but usually only on
NFS-mounted filesystems. Generally on local UFS partitions the system calls
are atomic. It would be simpler if we could use sigaction() and set the
SA_RESTART flag for these signals, but the Solaris man pages don't mention
stat() as being one of the restartable system calls. (But I'd bet that it
is...)
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support
> -----Original Message-----
> From: address@hidden [mailto:address@hidden] On Behalf Of
> Kevin Nomura
> Sent: Monday, November 19, 2001 1:07 PM
> To: address@hidden
> Subject: make-3.79 on solaris8 broken
>
>
> Using make-3.79 under solaris 6 and solaris 8, I have been seeing
> two intermittent problems. It seems to get worse with higher values
> of -j. One is "No rule to make target xxx" when there is, in fact,
> a rule to make target xxx. As befits an intermittent problem, the
> make succeeds if rerun with no changes.
>
> The second problem is more insidious: make *quietly* fails to rebuild
> some of its targets that are out of date. The symptom is link errors
> with unsat symbols owing to the incomplete build. Again, rerunning
> make picks these up and succeeds. Since this is a chronic problem for
> us I spent this past weekend debugging it with make -d and have some
> theories to offer.
>
> The first problem seems due to the stat() in remake.c not being protected
> by a retry loop for EINTR. stat() on solaris is documented as failing
> with EINTR. So, I fixed this, actually implementing the "safe_stat()"
> function that has a prototype in make.h but no definition (!?). This
> cleared up the "No rule" errors but not the unsat link problems.
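
A retry loop of the kind described for stat() might look roughly like the sketch below. This is not the patch from the report and not the definition GNU make eventually shipped; the name retry_stat is hypothetical.

```c
#include <errno.h>
#include <sys/stat.h>

/* Call stat() again for as long as it fails with EINTR.
   Returns 0 on success, or -1 with errno set to the first
   error that is not EINTR. */
int retry_stat(const char *path, struct stat *buf)
{
    int r;

    do {
        r = stat(path, buf);
    } while (r == -1 && errno == EINTR);

    return r;
}
```

The point is that a caller can no longer mistake an interrupted call for a missing file: a -1 return now carries a real error such as ENOENT.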
>
> For the second problem with failed links, the -d trace surrounding one of
> the files that should have been remade (but was not) looked like:
>
> Considering target file `../netcache/server/obj/td/wccp2.o'.
> Looking for an implicit rule for `../netcache/server/obj/td/wccp2.o'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.r'.
> Got a SIGCHLD; 1 unreaped children.
> Got a SIGCHLD; 2 unreaped children.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.f'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Got a SIGCHLD; 3 unreaped children.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.cpp'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.c'.
> Trying pattern rule with stem `wccp2'.
> Trying implicit prerequisite `../netcache/server/obj/td/wccp2.cc'.
> Trying pattern rule with stem `wccp2'.
> ...
> No implicit rule found for `../netcache/server/obj/td/wccp2.o'.
> ...
> No commands for `../netcache/server/obj/td/wccp2.o' and no prerequisites actually changed.
> No need to remake target `../netcache/server/obj/td/wccp2.o'.
>
> Seeing that a signal happened right about the time it was checking
> the prerequisite `../netcache/server/wccp2.c' (the source file, which
> does exist), I zeroed in on the readdir() in
> dir.c:dir_contents_file_exists_p(). Now, readdir() is not documented
> in solaris 6 or solaris 8 to fail on EINTR. But I put in a retry loop
> anyway and CAUGHT readdir failing on EINTR, dozens of times in the
> build in fact. So with stat() and readdir() (and opendir() and some
> others for good measure) guarded by retry loops, the problems have
> now subsided.
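
The readdir() case is slightly trickier than stat(), because readdir() returns NULL both at end of directory and on error, so errno has to be cleared before each call to tell the two apart. A rough sketch, with hypothetical names (retry_readdir, count_entries) that are not from dir.c, and assuming the directory stream is still usable after an interrupted call, which the documentation does not promise:

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Read the next entry, retrying while readdir() fails with EINTR.
   Returns NULL with errno == 0 at end of directory, or NULL with
   errno != 0 on a real error. */
struct dirent *retry_readdir(DIR *dir)
{
    struct dirent *ent;

    do {
        errno = 0;              /* distinguish end-of-directory from error */
        ent = readdir(dir);
    } while (ent == NULL && errno == EINTR);

    return ent;
}

/* Count the entries in a directory using the retrying calls.
   Returns the count, or -1 on error with errno set. */
long count_entries(const char *path)
{
    DIR *dir;
    long n = 0;
    int saved;

    do {
        dir = opendir(path);
    } while (dir == NULL && errno == EINTR);
    if (dir == NULL)
        return -1;

    while (retry_readdir(dir) != NULL)
        n++;

    saved = errno;              /* closedir() may overwrite errno */
    closedir(dir);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return n;
}
```

Without the retry, an interrupted readdir() looks exactly like the end of the directory, which would match the symptom in the trace: a source file that exists is silently treated as absent.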
>
> So assuming these are in fact the causes of the problems I saw, I am
> wondering whether solaris is in error for returning EINTR (e.g. is this
> broken with respect to POSIX or some standard that Solaris claims
> adherence to)? Should either or both of these be solved within make,
> at least as a practical issue?
>
> Kevin Nomura
> Network Appliance