make-w32
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH] Windows jobserver updates for GNU make 4.1


From: Troy Runkel
Subject: RE: [PATCH] Windows jobserver updates for GNU make 4.1
Date: Tue, 1 Mar 2016 12:21:38 +0000

Hi Eli,

The changes for this patch were created by another developer where I work.  
I've forwarded your questions to him and included his answers below.  Thanks!

--Troy Runkel

-----Original Message-----
From: Eli Zaretskii [mailto:address@hidden
Sent: Saturday, February 27, 2016 4:01 AM
To: Troy Runkel <address@hidden>
Cc: address@hidden
Subject: Re: [PATCH] Windows jobserver updates for GNU make 4.1

> From: Troy Runkel <address@hidden>
> Date: Wed, 24 Feb 2016 17:27:57 +0000
> Cc: Troy Runkel <address@hidden>
> 
> This patch expands the maximum number of simultaneous jobs on Windows 
> from 63 (limit dictated by Windows MAXIMUM_WAIT_OBJECTS) to 4095.
> 
> It also fixes a situation where GNU make on Windows could deadlock 
> and/or suffer memory corruption issues if –j or –j63 was used. The problem 
> was due to the way that $(shell …) commands are handled.

Thanks.

I wonder how hard would it be to remove the limitation altogether?
If we are lifting the limitation, why not lift it entirely?

        ANSWER:  Simplicity and reliability.  To our knowledge, no one has 
complained about the
        previous limit of 63 until we bumped into it while using IncrediBuild 
to distribute jobs across
        multiple machines.   A larger number could certainly be used or a 
dynamic value, but
        neither seemed appropriate in our engineering judgement.

Another comment is about the busy-wait loop (with 10-msec sleep) that this 
change uses -- it looks like, when there are 4095 jobs running, we will be 
re-checking each process once every 640 msec, is that right? 

        ANSWER: No, the check is constant time.  The 10ms sleep only occurs 
when no processes
        are ready and gmake would have otherwise been blocked (note:  the sleep 
is outside the
        inner for-loop.  It looks like the file was formatted with tabs so 
brace alignment might be
        confusing depending on what diff tool you are using.

That might be too large a delay, I think.  Did you consider a design that uses 
a separate thread for watching up to 64 processes?
  I think such a design might scale up better; in particular, the response time 
for checking the status of each job will not decrease as the number of jobs 
goes up.

        ANSWER: Yes, that was our first approach, but the cost of creating and 
managing the threads vastly
        exceeded the execution time of the current approach.  More 
specifically, when waiting on
        threads, one is typically waiting for them to die and when one dies, 
you have all the others
        to worry about rounding up (i.e., they need to be killed or stopped).  
It becomes hairy
        quite fast, and has all sorts of subtleties like race conditions, etc.

        Also, note that we were unable to measure the overhead of the current 
approach and it has been
        working reliably in a large production environment for 8+ months.       

> +DWORD GmakeWaitForMultipleObjects(
> +  DWORD nCount,
> +  const HANDLE *lpHandles,
> +  BOOL bWaitAll,
> +  DWORD dwMilliseconds
> +)
> +{
> +  assert(nCount <= GMAKE_MAXIMUM_WAIT_OBJECTS);
> +
> +  if (nCount <= MAXIMUM_WAIT_OBJECTS) {
> +    DWORD retVal =  WaitForMultipleObjects(nCount, lpHandles, bWaitAll, 
> dwMilliseconds);
> +    return (retVal == WAIT_TIMEOUT) ? GMAKE_WAIT_TIMEOUT : retVal;

This GMAKE_WAIT_TIMEOUT and GMAKE_WAIT_ABANDONED_0 stuff looks like a kludge to 
me; can we get rid of it?

        ANSWER:  No. These are to compensate for a short-sighted design 
decision of Microsoft's part to
        use values for error conditions that are now in the middle of our 
legitimate return range, thus
        we need to move the error conditions outside this now legal region.  
Keeping the error conditions
        the same and mapping the legitimate values around them, while probably 
possible, would
        likely require more code and greater chance of confusion and errors. 

> +        switch (retVal) {
> +          case WAIT_TIMEOUT:
> +            retVal = GMAKE_WAIT_TIMEOUT;
> +            continue;
> +            break;
> +          case WAIT_FAILED:
> +            fprintf(stderr,"WaitForMultipleOjbects failed waiting with error 
> %d\n", GetLastError());
> +            break;

I guess this fprintf should go away, or be converted to a DB call?

        ANSWER: We have never seen this error in practice (that we are aware 
of), but we wanted to
        cover all our bases.   Converting it to an alternate form is probably 
fine.  Thanks.

Thank you for working on this.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]