guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Making custom binary input ports unbuffered


From: Ludovic Courtès
Subject: Making custom binary input ports unbuffered
Date: Wed, 15 Jan 2014 00:00:04 +0100
User-agent: Gnus/5.130007 (Ma Gnus v0.7) Emacs/24.3 (gnu/linux)

Hi!

As discussed on IRC, our R6 custom binary input ports (CBIPs) are
currently buffered.  The buffer is hard-coded and setvbuf doesn’t work
on non-file ports.

Having a buffer can be problematic for several reasons.

  1. The user’s ‘get-position’ will always point past what the port’s
     user sees.

     This could be worked around by subtracting read_pos to what
     ‘get-position’ returns, but that feels wrong: conceptually, the
     port’s position is something under the CBIP implementor’s control.
     (I wonder how fopencookie/fseek deal with this.)

  2. Some applications want no buffering.

     My use case was that I read from a byte stream, and at some point I
     want to compute a hash over a delimited part of that stream.  To do
     that, I intuitively wanted to have a CBIP that wraps some other
     input port, and do the hash computation in that CBIP.  But that
     only works if we can guarantee that the CBIP doesn’t read more than
     what was actually asked.

     Same for the delimited port in (web response).

The patch below makes CBIPs unbuffered (see the tests for the properties
it gives.)  It works thanks to the optimization in ‘scm_c_read’ for
unbuffered binary ports.

This is going to be a performance hit for applications that read things
byte by byte, *or* via textual procedures (‘scm_getc’, ‘get-string’,
etc.)  But the assumption is that people rather use ‘get-bytevector-n’
(or similar) to get a chunk of data.

However!  There are places in (web ...) where CBIPs are used for mixed
binary/textual input.  When that happens, all the accesses end up being
unbuffered, which really sucks.


So, what do we do?  :-)

Thanks,
Ludo’.

>From ef60588d6e76d6ad0ae09197043c6d7371beb1b7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <address@hidden>
Date: Tue, 14 Jan 2014 21:54:51 +0100
Subject: [PATCH] Make custom binary input ports unbuffered.

* libguile/r6rs-ports.c (make_cbip): Leave read_{pos,end,buf_size}
  unchanged and call 'scm_port_non_buffer'.
  (cbip_fill_input): Change to use PORT's associated bytevector as an
  intermediate copy passed to READ_PROC and then copied back into PORT's
  own buffer.  Reallocate a new bytevector when it's smaller than the
  current 'read_buf_size'.  Don't loop back to 'again' label and remove
  it.
* test-suite/tests/r6rs-ports.test ("7.2.7 Input Ports")["custom binary
  input & 'port-position'", "custom binary input & 'read!' calls"]: New
  tests.
---
 libguile/r6rs-ports.c            | 77 ++++++++++++++++++++++++++--------------
 test-suite/tests/r6rs-ports.test | 60 +++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+), 27 deletions(-)

diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
index 790c24c..f2a654e 100644
--- a/libguile/r6rs-ports.c
+++ b/libguile/r6rs-ports.c
@@ -280,13 +280,17 @@ cbp_close (SCM port)
 
 static scm_t_bits custom_binary_input_port_type = 0;
 
-/* Size of the buffer embedded in custom binary input ports.  */
+/* Initial size of the buffer embedded in custom binary input ports.  */
 #define CBIP_BUFFER_SIZE  4096
 
 /* Return the bytevector associated with PORT.  */
 #define SCM_CBIP_BYTEVECTOR(_port)                             \
   SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 4)
 
+/* Set BV as the bytevector associated with PORT.  */
+#define SCM_SET_CBIP_BYTEVECTOR(_port, _bv)                            \
+  SCM_SIMPLE_VECTOR_SET (SCM_PACK (SCM_STREAM (_port)), 4, (_bv))
+
 /* Return the various procedures of PORT.  */
 #define SCM_CBIP_READ_PROC(_port)                              \
   SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 0)
@@ -297,15 +301,11 @@ make_cbip (SCM read_proc, SCM get_position_proc,
           SCM set_position_proc, SCM close_proc)
 {
   SCM port, bv, method_vector;
-  char *c_bv;
-  unsigned c_len;
   scm_t_port *c_port;
   const unsigned long mode_bits = SCM_OPN | SCM_RDNG;
 
-  /* Use a bytevector as the underlying buffer.  */
-  c_len = CBIP_BUFFER_SIZE;
-  bv = scm_c_make_bytevector (c_len);
-  c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
+  /* Pre-allocate a bytevector to be passed to the 'read!' method.  */
+  bv = scm_c_make_bytevector (CBIP_BUFFER_SIZE);
 
   /* Store the various methods and bytevector in a vector.  */
   method_vector = scm_c_make_vector (5, SCM_BOOL_F);
@@ -326,10 +326,13 @@ make_cbip (SCM read_proc, SCM get_position_proc,
   /* Attach it the method vector.  */
   SCM_SETSTREAM (port, SCM_UNPACK (method_vector));
 
-  /* Have the port directly access the buffer (bytevector).  */
-  c_port->read_pos = c_port->read_buf = (unsigned char *) c_bv;
-  c_port->read_end = (unsigned char *) c_bv;
-  c_port->read_buf_size = c_len;
+  /* Make it unbuffered.  This is necessary to guarantee that (1) users can
+     actually implement GET_POSITION_PROC correctly, and that (2) each
+     'get-bytevector-*' call has exactly one corresponding READ_PROC call.
+     The latter is necessary in some applications, typically when wrapping
+     another port where we don't want to consume more than what was
+     actually asked for.  */
+  scm_port_non_buffer (c_port);
 
   /* Mark PORT as open, readable and unbuffered (hmm, how elegant...).  */
   SCM_SET_CELL_TYPE (port, custom_binary_input_port_type | mode_bits);
@@ -346,34 +349,54 @@ cbip_fill_input (SCM port)
   int result;
   scm_t_port *c_port = SCM_PTAB_ENTRY (port);
 
- again:
   if (c_port->read_pos >= c_port->read_end)
     {
       /* Invoke the user's `read!' procedure.  */
-      unsigned c_octets;
+      size_t c_octets, c_requested;
       SCM bv, read_proc, octets;
 
-      /* Use the bytevector associated with PORT as the buffer passed to the
+      read_proc = SCM_CBIP_READ_PROC (port);
+
+      /* Attempt to pass the bytevector associated with PORT to the
         `read!' procedure, thereby avoiding additional allocations.  */
       bv = SCM_CBIP_BYTEVECTOR (port);
-      read_proc = SCM_CBIP_READ_PROC (port);
 
-      /* The assumption here is that C_PORT's internal buffer wasn't changed
-        behind our back.  */
-      assert (c_port->read_buf ==
-             (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv));
-      assert ((unsigned) c_port->read_buf_size
-             == SCM_BYTEVECTOR_LENGTH (bv));
+      /* When called via the 'get-bytevector-*' procedures, and thus via
+        'scm_c_read', we are passed the caller-provided buffer, so we need
+        to check its size.  */
+      c_requested = c_port->read_buf_size;
+
+      if (SCM_BYTEVECTOR_LENGTH (bv) < c_requested)
+       {
+         /* Bad luck: we have to make another allocation.  Save that
+            bytevector for later reuse, in the hope that the application
+            has regular access patterns.  */
+         bv = scm_c_make_bytevector (c_requested);
+         SCM_SET_CBIP_BYTEVECTOR (port, bv);
+       }
+
+      /* READ_PROC must always be called with a strictly positive number of
+        bytes to read; otherwise it is forced to return 0, which is used
+        to indicate EOF.  */
+      if (SCM_LIKELY (c_requested > 0))
+       {
+         octets = scm_call_3 (read_proc, bv, SCM_INUM0,
+                              scm_from_size_t (c_requested));
+         c_octets = scm_to_size_t (octets);
+       }
+      else
+       c_octets = 0;
 
-      octets = scm_call_3 (read_proc, bv, SCM_INUM0,
-                          SCM_I_MAKINUM (CBIP_BUFFER_SIZE));
-      c_octets = scm_to_uint (octets);
+      if (SCM_UNLIKELY (c_octets > c_requested))
+       scm_out_of_range (FUNC_NAME, octets);
 
-      c_port->read_pos = (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv);
+      /* Copy the data back to the original buffer.  */
+      memcpy ((char *) c_port->read_pos, SCM_BYTEVECTOR_CONTENTS (bv),
+             c_octets);
       c_port->read_end = (unsigned char *) c_port->read_pos + c_octets;
 
-      if (c_octets > 0)
-       goto again;
+      if (c_octets != 0 || c_requested == 0)
+       result = (int) *c_port->read_pos;
       else
        result = EOF;
     }
diff --git a/test-suite/tests/r6rs-ports.test b/test-suite/tests/r6rs-ports.test
index eaae29f..41b46b1 100644
--- a/test-suite/tests/r6rs-ports.test
+++ b/test-suite/tests/r6rs-ports.test
@@ -447,6 +447,66 @@ not `set-port-position!'"
                          (u8-list->bytevector
                           (map char->integer (string->list "Port!")))))))
 
+  (pass-if-equal "custom binary input & 'port-position'"
+      '(0 2 5 11)
+    ;; Check that the value returned by 'port-position' is correct, and
+    ;; that each 'port-position' call leads one call to the
+    ;; 'get-position' method.
+    (let* ((str    "Hello Port!")
+           (output (make-bytevector (string-length str)))
+           (source (with-fluids ((%default-port-encoding "UTF-8"))
+                     (open-string-input-port str)))
+           (read!  (lambda (bv start count)
+                     (let ((r (get-bytevector-n! source bv start count)))
+                       (if (eof-object? r)
+                           0
+                           r))))
+           (pos     '())
+           (get-pos (lambda ()
+                      (let ((p (port-position source)))
+                        (set! pos (cons p pos))
+                        p)))
+           (port    (make-custom-binary-input-port "the port" read!
+                                                   get-pos #f #f)))
+
+      (and (= 0 (port-position port))
+           (begin
+             (get-bytevector-n! port output 0 2)
+             (= 2 (port-position port)))
+           (begin
+             (get-bytevector-n! port output 2 3)
+             (= 5 (port-position port)))
+           (let ((bv (string->utf8 (get-string-all port))))
+             (bytevector-copy! bv 0 output 5 (bytevector-length bv))
+             (= (string-length str) (port-position port)))
+           (bytevector=? output (string->utf8 str))
+           (reverse pos))))
+
+  (pass-if-equal "custom binary input & 'read!' calls"
+      `((2 "He") (3 "llo") (42 " Port!"))
+    (let* ((str    "Hello Port!")
+           (source (with-fluids ((%default-port-encoding "UTF-8"))
+                     (open-string-input-port str)))
+           (reads  '())
+           (read!  (lambda (bv start count)
+                     (set! reads (cons count reads))
+                     (let ((r (get-bytevector-n! source bv start count)))
+                       (if (eof-object? r)
+                           0
+                           r))))
+           (port   (make-custom-binary-input-port "the port" read!
+                                                  #f #f #f)))
+
+      (let ((ret (list (get-bytevector-n port 2)
+                       (get-bytevector-n port 3)
+                       (get-bytevector-n port 42))))
+        (zip (reverse reads)
+             (map (lambda (obj)
+                    (if (bytevector? obj)
+                        (utf8->string obj)
+                        obj))
+                  ret)))))
+
   (pass-if "custom binary input port `close-proc' is called"
     (let* ((closed?  #f)
            (read!    (lambda (bv start count) 0))
-- 
1.8.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]