Re: wait_reading_process_ouput hangs in certain cases (w/ patches)

emacs-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: wait_reading_process_ouput hangs in certain cases (w/ patches)

From:	Matthias Dahl
Subject:	Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date:	Tue, 7 Aug 2018 10:38:02 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0
Hello Eli...

On 21/07/2018 11:52, Eli Zaretskii wrote:

>> diff --git a/src/gnutls.c b/src/gnutls.c
>> index d22d5d267c..5bf5ee0e5c 100644
>> --- a/src/gnutls.c
>> +++ b/src/gnutls.c
>> @@ -708,16 +708,18 @@ emacs_gnutls_read (struct Lisp_Process *proc, char 
>> *buf, ptrdiff_t nbyte)
>>    rtnval = gnutls_record_recv (state, buf, nbyte);
>>    if (rtnval >= 0)
>>      return rtnval;
>> -  else if (rtnval == GNUTLS_E_UNEXPECTED_PACKET_LENGTH)
>> -    /* The peer closed the connection. */
>> -    return 0;
> 
> Why is this hunk being deleted?  I see that you intend to handle it in
> emacs_gnutls_handle_error, but I'm not sure I understand the net
> result.  The current code simply ignores this situation; what will
> happen with the new code, except for the problem being logged?

The rationale behind this is:
On GnuTLS < 3, GNUTLS_E_UNEXPECTED_PACKET_LENGTH was returned for
premature connection termination. On >= 3, it is a real error on its
own and should be treated that way.

I moved the error handling over to emacs_gnutls_handle_error, so it's
at a central point and not scattered all across. Also makes the code
clearer to read, imho.

>>    else if (emacs_gnutls_handle_error (state, rtnval))
>> -    /* non-fatal error */
>> -    return -1;
>> -  else {
>> -    /* a fatal error occurred */
>> -    return 0;
>> -  }
>> +    {
>> +      /* non-fatal error */
>> +      errno = EAGAIN;
>> +      return -1;
>> +    }
>> +  else
>> +    {
>> +      /* a fatal error occurred */
>> +      errno = EPROTO;
>> +      return 0;
>> +    }
> 
> EPROTO is not universally available, AFAIK.  We have Gnulib's errno.h
> that defines a value for it, but that just gets us through
> compilation.  What do you expect this value to cause when it is
> encountered by some code which needs to interpret it?  We need to make
> sure the results will be sensible.

Sorry, I missed that. Right now, no code checks specifically for EPROTO,
so any other fatal value would do as well. But EPROTO is fitting rather
well in what is going on.

Any code that will get an errno = EPROTO will treat it as a fatal error
and that is exactly what it should do. So there is no problem with that.

The only problem that might arise is, if one of the platforms that do
not currently define EPROTO, define and use it. Then Emacs needs to be
recompiled for it to work properly again on that platform.

I don't know if Emacs uses other non-universally available errno values
as well. If it does, then, imho, it really does not matter here. If it
does not, I will change it to something else, if requested to avoid the
situation altogether. What do you think?

>> diff --git a/src/process.c b/src/process.c
>> index 6dba218c90..5702408985 100644
>> --- a/src/process.c
>> +++ b/src/process.c
>> @@ -5397,60 +5397,31 @@ wait_reading_process_output (intmax_t time_limit, 
>> int nsecs, int read_kbd,
>>  #endif      /* !HAVE_GLIB */
>>  
>>  #ifdef HAVE_GNUTLS
>> -          /* GnuTLS buffers data internally.  In lowat mode it leaves
>> -             some data in the TCP buffers so that select works, but
>> -             with custom pull/push functions we need to check if some
>> -             data is available in the buffers manually.  */
>> -          if (nfds == 0)
>> +          /* GnuTLS buffers data internally. select() will only report
>> +             available data for the underlying kernel sockets API, not
>> +             what has been buffered internally. As such, we need to loop
>> +             through all channels and check for available data manually.  */
>> +          if (nfds >= 0)
>>          {
>> -          fd_set tls_available;
>> -          int set = 0;
>> -
>> -          FD_ZERO (&tls_available);
>> -          if (! wait_proc)
>> -            {
>> -              /* We're not waiting on a specific process, so loop
>> -                 through all the channels and check for data.
>> -                 This is a workaround needed for some versions of
>> -                 the gnutls library -- 2.12.14 has been confirmed
>> -                 to need it.  See
>> -                 http://comments.gmane.org/gmane.emacs.devel/145074 */
>> -              for (channel = 0; channel < FD_SETSIZE; ++channel)
>> -                if (! NILP (chan_process[channel]))
>> -                  {
>> -                    struct Lisp_Process *p =
>> -                      XPROCESS (chan_process[channel]);
>> -                    if (p && p->gnutls_p && p->gnutls_state
>> -                        && ((emacs_gnutls_record_check_pending
>> -                             (p->gnutls_state))
>> -                            > 0))
>> -                      {
>> -                        nfds++;
>> -                        eassert (p->infd == channel);
>> -                        FD_SET (p->infd, &tls_available);
>> -                        set++;
>> -                      }
>> -                  }
>> -            }
>> -          else
>> -            {
>> -              /* Check this specific channel.  */
>> -              if (wait_proc->gnutls_p /* Check for valid process.  */
>> -                  && wait_proc->gnutls_state
>> -                  /* Do we have pending data?  */
>> -                  && ((emacs_gnutls_record_check_pending
>> -                       (wait_proc->gnutls_state))
>> -                      > 0))
>> -                {
>> -                  nfds = 1;
>> -                  eassert (0 <= wait_proc->infd);
>> -                  /* Set to Available.  */
>> -                  FD_SET (wait_proc->infd, &tls_available);
>> -                  set++;
>> -                }
>> -            }
>> -          if (set)
>> -            Available = tls_available;
>> +              for (channel = 0; channel < FD_SETSIZE; ++channel)
>> +                if (! NILP (chan_process[channel]))
>> +                  {
>> +                    struct Lisp_Process *p =
>> +                      XPROCESS (chan_process[channel]);
>> +
>> +                    if (just_wait_proc && p != wait_proc)
>> +                      continue;
>> +
>> +                    if (p && p->gnutls_p && p->gnutls_state
>> +                        && ((emacs_gnutls_record_check_pending
>> +                             (p->gnutls_state))
>> +                            > 0))
>> +                      {
>> +                        nfds++;
>> +                        eassert (p->infd == channel);
>> +                        FD_SET (p->infd, &Available);
>> +                      }
>> +                  }
> 
> This change is hard to read.  The original code already called
> emacs_gnutls_record_check_pending, and there's no calls to pselect in
> the hunk, so I'm unsure what exactly are we changing here, in terms of
> the details.  The overview in the commit log just gives the general
> idea, but its hard for me to connect that to the actual code changes.
> Plus, you lose some of the comments, for some reason, even though the
> same code is still present.
> 
> Bottom line, I'd appreciate more details for this part.

With Emacs 25.2, the required GnuTLS version was set to 2.12.2 (from
the "2.6.6 or later" previous default).

GnuTLS 2.12 changed its default from lowat mode to non-lowat mode (and
removed lowat mode with GnuTLS 3 completely). What this means is, there
is no data left intentionally in the kernel buffers for a select to work
properly. So we usually do not get any of the fds that belong to GnuTLS
set as available when we do a select() call.

The old Emacs code only checked _all_ channels if there was no wait_proc
and if _no_ fds (nfds == 0) were set as available by our previous
select() call. That was wrong and missed a few important cases and could
lead to unnecessary waits or stalls.

Previously, the old code treated a wait_proc as if just_wait_proc was
set as well and only checked wait_proc and ignored the others. That was
a bug as well.

The new code always (if nfds >= 0) checks all (if !just_wait_proc)
channels and sets the available fds in addition to the ones reported by
select().

I hope that clears it up. The new code is basically a lot simpler and it
tries to do the right thing (tm). :-)

>> diff --git a/src/xgselect.c b/src/xgselect.c
>> index fedd3127ef..f68982143e 100644
>> --- a/src/xgselect.c
>> +++ b/src/xgselect.c
>> @@ -143,6 +143,14 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, 
>> fd_set *efds,
>>              ++retval;
>>          }
>>      }
>> +  else if (nfds == 0)
>> +    {
>> +      // pselect() clears the file descriptor sets if no fd is ready (but
>> +      // not if an error occurred), so should we to be compatible. 
>> (Bug#21337)
>> +      if (rfds) FD_ZERO (rfds);
>> +      if (wfds) FD_ZERO (wfds);
>> +      if (efds) FD_ZERO (efds);
>> +    }
> 
> This is OK, but please don't use C++ style comments, it's not our
> style.  Also, please be sure to test this when waiting on pselect is
> interrupted by C-g, we had some problems in that area which had roots
> in xg_select.

I will change that with the next version of the patches after I get your
comments back. Regarding the problems you mentioned: Can you point me to
a bug report and more information about that? If anything breaks with
the new behavior, then that broken code is clearly doing something wrong
and needs fixing. Personally, I have been running Emacs with the patches
since I posted them and I have not run into a single issue.

Thanks for taking the time, Eli. It is very much appreciated!

So long,
Matthias

-- 
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
[Prev in Thread]
Current Thread
[Next in Thread]
Re: wait_reading_process_ouput hangs in certain cases (w/ patches), Matthias Dahl <=
- Re: wait_reading_process_ouput hangs in certain cases (w/ patches), Paul Eggert, 2018/08/07
Prev by Date: Re: bignum branch
Next by Date: package.el fix (was Re: [PATCH] do not auto-remove external packages (fixes #27822))
Previous by thread: Backport "Fix bugs in `auth-source-netrc-parse-one'" to emacs-26
Next by thread: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Index(es):
- Date
- Thread