chicken-users

[Chicken-users] Re: How are exceptions propagated? - details on the race


From: F. Wittenberger
Subject: [Chicken-users] Re: How are exceptions propagated? - details on the race
Date: Wed, 20 Aug 2008 23:59:25 +0200

On Wednesday, 20.08.2008 at 08:29 +0200, felix winkelmann wrote:
> On Tue, Aug 12, 2008 at 4:39 PM, Jörg F. Wittenberger
> <address@hidden> wrote:
> > On Thursday, 07.08.2008 at 23:05 +0200, Jörg F. Wittenberger wrote:
> >> Hi all,
> >>
> >> this is once again a slightly complicated test case.  Again, I understand
> >> all the calls for a simpler version; I just have a hard time finding one.
> >
> > I've been able to track this one down to CHICKEN not handling bad
> > file descriptors in ##sys#unblock-threads-for-i/o.

Since Elf expressed some doubt about the existence of the race (which I
can understand: race conditions are usually hard to reproduce reliably,
so there's a good chance my test case did not exhibit the problem on his
machine), I guess it might help the review if I comment on some details.

It's actually not that hard to understand the problem, provided we start
from the premise that the runtime system ought to be robust against some
misuse.  After all, we have file-close at our disposal, and even without
it, it would be all too easy to end up with a bad fd, at least when
using libraries.

----

So, once a thread is waiting on an fd that has become bad in the
meantime, what happens in the scheduler?

(define (##sys#unblock-threads-for-i/o)
  (dbg "fd-list: " ##sys#fd-list)
  (let* ([to? (pair? ##sys#timeout-list)]
         [rq? (pair? ##sys#ready-queue-head)]
         [n (##sys#fdset-select-timeout ; we use FD_SETSIZE, but really should use max fd
             (or rq? to?)
             (if (and to? (not rq?))    ; no thread was unblocked by timeout, so wait
                 (let* ([tmo1 (caar ##sys#timeout-list)]
                        [now (##sys#fudge 16)])
                   (fxmax 0 (- tmo1 now)) )
                 0) ) ] )               ; otherwise immediate timeout.
    (dbg n " fds ready")

If there's a bad fd, we will see "-1 fds ready": -1 is what select(2)
returns on error.

    (cond [(eq? -1 n)
           (cond
            (error-bad-file
             (set! ##sys#fd-list
                   (let loop ((l ##sys#fd-list))
                     (cond
                      ((null? l) l)
                      ((##sys#handle-bad-fd! (car l))
                       (##sys#fdset-clear (caar l))
                       ;; This is supposed to be a rare case, catch
                       ;; them one by one, not all at once
                       ;; (commented out here).
                       ;; (loop (cdr l))
                       (cdr l))
                      (else (cons (car l) (loop (cdr l)))))))
             (##sys#fdset-restore)
             (##sys#unblock-threads-for-i/o))

Without the case above, we fall through and switch to the primordial thread.

            (else (##sys#force-primordial))) ]

Let's postpone the question of whether the "else" case is handled
gracefully after this change.

(define (##sys#force-primordial)
  (dbg "primordial thread forced due to interrupt")
  ;(display "switching to primordial thread\n" debug-port)
  (##sys#thread-unblock! ##sys#primordial-thread) )

That's actually all it takes.

----

What happens next depends entirely on the state of the primordial
thread; ##sys#force-primordial makes no special provision for it.  In my
case, it was waiting in a thread-join!:

(define thread-join!
  (lambda (thread . timeout)
    (##sys#check-structure thread 'thread 'thread-join!)
    (let* ((limit (and (pair? timeout) (##sys#compute-time-limit (##sys#slot timeout 0))))
           (rest (and (pair? timeout) (##sys#slot timeout 1)))
           (tosupplied (and rest (pair? rest)))
           (toval (and tosupplied (##sys#slot rest 0))) )
      (##sys#call-with-current-continuation
       (lambda (return)
         (let ([ct ##sys#current-thread])
           (when limit (##sys#thread-block-for-timeout! ct limit))
           (##sys#setslot
            ct 1
            (lambda ()

So it's going to continue here:

              (case (##sys#slot thread 3)
                [(dead) (apply return (##sys#slot thread 2))]
                [(terminated)
                 (return 
                  (##sys#signal
                   (##sys#make-structure 
                    'condition '(uncaught-exception)
                    (list '(uncaught-exception . reason) (##sys#slot thread 7)) ) ) ) ]

and since the thread is neither dead nor terminated...

                [else
                 (return
                  (if tosupplied
                      toval
                      (##sys#signal
                       (##sys#make-structure 'condition '(join-timeout-exception) '())) ) ) ] ) ) )

the [else] case above applies: thread-join! returns as if the join had
timed out.  In fact I was lucky: had the primordial been waiting on a
mutex guarding a precious resource, it would have entered the critical
section.  Wherever it was waiting, the primordial thread is simply
unblocked.

----

Now let's come back to the question of whether the "else" case is
handled correctly.  Probably not.  I only have a Linux box here right
now, but man 2 select says:

ERRORS
       EBADF  An invalid file descriptor was given in one of the sets.
              (Perhaps a file descriptor that was already closed, or one
              on which an error has occurred.)

       EINTR  A signal was caught.

       EINVAL nfds is negative or the value contained within timeout is
              invalid.

       ENOMEM unable to allocate memory for internal tables.

I believe none of them should simply activate the primordial thread.
EBADF is handled now.

For EINTR, I have yet to understand how signals are propagated, but I'm
afraid we need some code here too.

EINVAL would indicate a grave programming error in the scheduler.  Maybe
it's better to print a message and die here.  The same goes for ENOMEM,
though that is not CHICKEN's fault.

----

The same considerations should apply to ##sys#schedule, where the
variable "eintr" controls whether ##sys#force-primordial is called.  On
the other hand, signals are handled somehow, so I have probably
overlooked something.  (Felix?)

----

Now to the really interesting question: what should be done once a
defunct fd is found?  Since ##sys#fd-list contains only fds and threads,
the simple solution is this (an improved version of the one in my last
message):

(define (##sys#handle-bad-fd! e)
  (dbg "check bad" e)
  (let ((bad ((foreign-lambda*
               bool ((integer fd))
               "struct stat buf;"
               "int i = ( (fstat(fd, &buf) == -1 && errno == EBADF) ? 1 : 0);"
               "return(i);")
              (car e))))
    (if bad
        (for-each
         (lambda (thread)
           (thread-signal!
            thread
            (##sys#make-structure
             'condition
             '(exn i/o) ;; better? '(exn i/o net)
             (list '(exn . message) "bad file descriptor"
                   '(exn . arguments) (car e)
                   '(exn . location) thread) )))
         (cdr e)))
    bad))

That is, we thread-signal! each waiting thread a condition.  It might be
better still if we could also close the corresponding ports.  But that
quickly gets messy: the fd-list would have to hold both the fd and the
port.  Lots of changes ahead.  I'd rather refrain.

> > The attached patch uses fstat(2) to check the fd-list.
> >
> > Unfortunately I have no idea how well this is going to be supported
> > under Windows.
> 
> Not very well, but perhaps it can at least be supported under
> UNIXish environments.

Is there no good way on Windows to tell a bad fd from a good one?
Anything will do.  If worst comes to worst, we could repeat the
select(2) call with just the fd in question.

best regards

/Jörg



