guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCHES] Keyword args for file openers; coding scan off by default


From: Mark H Weaver
Subject: [PATCHES] Keyword args for file openers; coding scan off by default
Date: Sun, 07 Apr 2013 02:52:10 -0400

Hello all,

Here's the last major patch set that I very much hope to get into 2.0.8.
The first patch disables the coding declaration scan in 'open-file' by
default.  I feel quite strongly that this is important for robustness
and security reasons, and I'm pleased to report that Ludovic recently
gave his consent to this for 2.0.8 on IRC.

The last patch adds keyword arguments to the commonly-used file opening
procedures to conveniently set the encoding, binary mode, and also a way
to enable the coding declaration scan if desired.

Comments and suggestions solicited.

      Mark


>From bb621dbbc0df691bcad541267a08c86c36d9062b Mon Sep 17 00:00:00 2001
From: Mark H Weaver <address@hidden>
Date: Wed, 30 Jan 2013 14:45:28 -0500
Subject: [PATCH 1/3] Do not scan for coding declarations in open-file.

* libguile/fports.c (scm_open_file): Do not scan for coding
  declarations.  Replace 'use_encoding' local variable with
  'binary'.  Update documentation string.

* module/ice-9/psyntax.scm (include): Add the same file-encoding
  logic that's used in compile-file and scm_primitive_load.

* module/ice-9/psyntax-pp.scm: Regenerate.

* doc/ref/api-io.texi (File Ports): Update docs.

* test-suite/tests/ports.test: Change "open-file HONORS file coding
  declarations" test to "open-file IGNORES file coding declaration".

* test-suite/tests/coding.test (scan-coding): Use 'file-encoding' to
  scan for the encoding, since 'open-input-file' no longer does so.
---
 doc/ref/api-io.texi          |   11 ++---------
 libguile/fports.c            |   28 +++++-----------------------
 module/ice-9/psyntax-pp.scm  |   10 ++++++----
 module/ice-9/psyntax.scm     |   13 +++++++++----
 test-suite/tests/coding.test |    4 ++--
 test-suite/tests/ports.test  |    5 ++---
 6 files changed, 26 insertions(+), 45 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 9c3e1fc..e7e1bb2 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -885,8 +885,8 @@ Use binary mode, ensuring that each byte in the file will 
be read as one
 Scheme character.
 
 To provide this property, the file will be opened with the 8-bit
-character encoding "ISO-8859-1", ignoring any coding declaration or port
-encoding.  @xref{Ports}, for more information on port encodings.
+character encoding "ISO-8859-1", ignoring the default port encoding.
address@hidden, for more information on port encodings.
 
 Note that while it is possible to read and write binary data as
 characters or strings, it is usually better to treat bytes as octets,
@@ -903,13 +903,6 @@ because of its port encoding ramifications.
 If a file cannot be opened with the access
 requested, @code{open-file} throws an exception.
 
-When the file is opened, this procedure will scan for a coding
-declaration (@pxref{Character Encoding of Source Files}). If a coding
-declaration is found, it will be used to interpret the file.  Otherwise,
-the port's encoding will be used.  To suppress this behavior, open the
-file in binary mode and then set the port encoding explicitly using
address@hidden
-
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the
 current interfaces.
diff --git a/libguile/fports.c b/libguile/fports.c
index f6c3c92..727fe27 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -349,8 +349,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
            "@item b\n"
            "Open the underlying file in binary mode, if supported by the 
system.\n"
            "Also, open the file using the binary-compatible character 
encoding\n"
-           "\"ISO-8859-1\", ignoring the port's encoding and the coding 
declaration\n"
-           "at the top of the input file, if any.\n"
+           "\"ISO-8859-1\", ignoring the default port encoding.\n"
            "@item +\n"
            "Open the port for both input and output.  E.g., @code{r+}: open\n"
            "an existing file for both input and output.\n"
@@ -365,11 +364,6 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
            "Add line-buffering to the port.  The port output buffer will be\n"
            "automatically flushed whenever a newline character is written.\n"
            "@end table\n"
-           "When the file is opened, this procedure will scan for a coding\n"
-           "address@hidden Encoding of Source Files}. If present\n"
-           "will use that encoding for interpreting the file.  Otherwise, 
the\n"
-           "port's encoding will be used.\n"
-           "\n"
            "In theory we could create read/write ports which were buffered\n"
            "in one direction only.  However this isn't included in the\n"
            "current interfaces.  If a file cannot be opened with the access\n"
@@ -377,7 +371,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 #define FUNC_NAME s_scm_open_file
 {
   SCM port;
-  int fdes, flags = 0, use_encoding = 1;
+  int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
@@ -412,7 +406,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
          flags = (flags & ~(O_RDONLY | O_WRONLY)) | O_RDWR;
          break;
        case 'b':
-         use_encoding = 0;
+         binary = 1;
 #if defined (O_BINARY)
          flags |= O_BINARY;
 #endif
@@ -451,20 +445,8 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
   port = scm_i_fdes_to_port (fdes, scm_i_mode_bits (mode),
                              fport_canonicalize_filename (filename));
 
-  if (use_encoding)
-    {
-      /* If this file has a coding declaration, use that as the port
-        encoding.  */
-      if (SCM_INPUT_PORT_P (port))
-       {
-         char *enc = scm_i_scan_for_encoding (port);
-         if (enc != NULL)
-           scm_i_set_port_encoding_x (port, enc);
-       }
-    }
-  else
-    /* If this is a binary file, use the binary-friendly ISO-8859-1
-       encoding.  */
+  if (binary)
+    /* Use the binary-friendly ISO-8859-1 encoding. */
     scm_i_set_port_encoding_x (port, NULL);
 
   scm_dynwind_end ();
diff --git a/module/ice-9/psyntax-pp.scm b/module/ice-9/psyntax-pp.scm
index 8619d78..254f701 100644
--- a/module/ice-9/psyntax-pp.scm
+++ b/module/ice-9/psyntax-pp.scm
@@ -2975,10 +2975,12 @@
            (lambda (fn dir k)
              (let ((p (open-input-file
                         (if (absolute-file-name? fn) fn (in-vicinity dir 
fn)))))
-               (let f ((x (read p)) (result '()))
-                 (if (eof-object? x)
-                   (begin (close-input-port p) (reverse result))
-                   (f (read p) (cons (datum->syntax k x) result))))))))
+               (let ((enc (file-encoding p)))
+                 (set-port-encoding! p (let ((t enc)) (if t t "UTF-8")))
+                 (let f ((x (read p)) (result '()))
+                   (if (eof-object? x)
+                     (begin (close-input-port p) (reverse result))
+                     (f (read p) (cons (datum->syntax k x) result)))))))))
         (let ((src (syntax-source x)))
           (let ((file (if src (assq-ref src 'filename) #f)))
             (let ((dir (if (string? file) (dirname file) #f)))
diff --git a/module/ice-9/psyntax.scm b/module/ice-9/psyntax.scm
index b359fc1..d63861c 100644
--- a/module/ice-9/psyntax.scm
+++ b/module/ice-9/psyntax.scm
@@ -2945,10 +2945,15 @@
   (lambda (x)
     (define read-file
       (lambda (fn dir k)
-        (let ((p (open-input-file
-                  (if (absolute-file-name? fn)
-                      fn
-                      (in-vicinity dir fn)))))
+        (let* ((p (open-input-file
+                   (if (absolute-file-name? fn)
+                       fn
+                       (in-vicinity dir fn))))
+               (enc (file-encoding p)))
+
+          ;; Choose the input encoding deterministically.
+          (set-port-encoding! p (or enc "UTF-8"))
+
           (let f ((x (read p))
                   (result '()))
             (if (eof-object? x)
diff --git a/test-suite/tests/coding.test b/test-suite/tests/coding.test
index 4152af8..0a15d93 100644
--- a/test-suite/tests/coding.test
+++ b/test-suite/tests/coding.test
@@ -1,6 +1,6 @@
 ;;;; coding.test --- test suite for coding declarations. -*- mode: scheme -*-
 ;;;;
-;;;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;;; Copyright (C) 2011, 2013 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -40,7 +40,7 @@
      ;; relies on the opportunistic filling of the input buffer, which
      ;; doesn't happen after a seek.
      (let* ((port (open-input-file name))
-            (res (port-encoding port)))
+            (res (file-encoding port)))
        (close-port port)
        res))))
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 0dbd3b2..5d3c213 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -270,13 +270,12 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file honors file coding declarations
-(pass-if "file: open-file honors coding declarations"
+;; open-file ignores file coding declaration
+(pass-if "file: open-file ignores coding declarations"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
                       (test-string "€100"))
-                 (set-port-encoding! port "ISO-8859-15")
                  (write-line ";; coding: iso-8859-15" port)
                  (write-line test-string port)
                  (close-port port)
-- 
1.7.10.4

>From 2b4fa986ac43b5a8705f2d24e5fc908e07a89b94 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <address@hidden>
Date: Fri, 5 Apr 2013 17:33:03 -0400
Subject: [PATCH 2/3] Remove byte-order mark check from
 'scm_i_scan_for_encoding'.

* libguile/read.c (scm_i_scan_for_encoding): Remove byte-order mark
  check.
---
 libguile/read.c |   10 ----------
 1 file changed, 10 deletions(-)

diff --git a/libguile/read.c b/libguile/read.c
index 64b55c3..e2e2e4a 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1986,7 +1986,6 @@ scm_i_scan_for_encoding (SCM port)
   char header[SCM_ENCODING_SEARCH_SIZE+1];
   size_t bytes_read, encoding_length, i;
   char *encoding = NULL;
-  int utf8_bom = 0;
   char *pos, *encoding_start;
   int in_comment;
 
@@ -2031,10 +2030,6 @@ scm_i_scan_for_encoding (SCM port)
       scm_seek (port, scm_from_int (0), scm_from_int (SEEK_SET));
     }
 
-  if (bytes_read > 3 
-      && header[0] == '\xef' && header[1] == '\xbb' && header[2] == '\xbf')
-    utf8_bom = 1;
-
   /* search past "coding[:=]" */
   pos = header;
   while (1)
@@ -2103,11 +2098,6 @@ scm_i_scan_for_encoding (SCM port)
     /* This wasn't in a comment */
     return NULL;
 
-  if (utf8_bom && c_strcasecmp(encoding, "UTF-8"))
-    scm_misc_error (NULL,
-                   "the port input declares the encoding ~s but is encoded as 
UTF-8",
-                   scm_list_1 (scm_from_locale_string (encoding)));
-
   return encoding;
 }
 
-- 
1.7.10.4

>From 951b9d224d84bfec271b51615bc095013d153694 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <address@hidden>
Date: Sat, 6 Apr 2013 23:19:55 -0400
Subject: [PATCH 3/3] Add keyword arguments to file opening procedures.

* libguile/fports.c (scm_open_file_with_encoding): New API function,
  containing the code previously found in 'scm_open_file', but modified
  to accept the new 'guess_encoding' and 'encoding' arguments.

  (scm_open_file): Now just a simple wrapper that calls
  'scm_open_file_with_encoding'.

  (scm_i_open_file): New implementation of 'open-file' that accepts
  keyword arguments '#:guess-encoding' and '#:encoding', and calls
  'scm_open_file_with_encoding'.

  (scm_init_fports_keywords): New initialization function that gets
  called after keywords are initialized.

* libguile/fports.h (scm_open_file_with_encoding,
  scm_init_fports_keywords): Add prototypes.

* libguile/init.c (scm_i_init_guile): Call 'scm_init_fports_keywords'.

* module/ice-9/boot-9.scm: Add enhanced versions of 'open-input-file',
  'open-output-file', 'call-with-input-file', 'call-with-output-file',
  'with-input-from-file', 'with-output-to-file', and
  'with-error-to-file', that accept keyword arguments '#:binary',
  '#:encoding', and (for input port constructors) '#:guess-encoding'.

* doc/ref/api-io.texi (File Ports): Update documentation.

* test-suite/tests/ports.test ("keyword arguments for file openers"):
  Add tests.
---
 doc/ref/api-io.texi         |   60 ++++++---
 libguile/fports.c           |  180 +++++++++++++++++++-------
 libguile/fports.h           |    3 +
 libguile/init.c             |    1 +
 module/ice-9/boot-9.scm     |  110 ++++++++++++++++
 test-suite/tests/ports.test |  291 ++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 579 insertions(+), 66 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index e7e1bb2..da57328 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -843,7 +843,10 @@ Most systems have limits on how many files can be open, so 
it's
 strongly recommended that file ports be closed explicitly when no
 longer required (@pxref{Ports}).
 
address@hidden {Scheme Procedure} open-file filename mode
address@hidden {Scheme Procedure} open-file filename mode @
+                          [#:guess-encoding=#f] [#:encoding=#f]
address@hidden {C Function} scm_open_file_with_encoding @
+                     (filename, mode, guess_encoding, encoding)
 @deffnx {C Function} scm_open_file (filename, mode)
 Open the file whose name is @var{filename}, and return a port
 representing that file.  The attributes of the port are
@@ -900,8 +903,17 @@ to the underlying @code{open} call.  Still, the flag is 
generally useful
 because of its port encoding ramifications.
 @end table
 
-If a file cannot be opened with the access
-requested, @code{open-file} throws an exception.
+Unless binary mode is requested, the character encoding of the new port
+is determined as follows: First, if @var{guess-encoding} is true,
+heuristics will be used to guess the encoding of the file.  If it is
+false or if the heuristics are unsuccessful, @var{encoding} is used
+unless it is also false.  As a last resort, the default port encoding is
+used.  @xref{Ports}, for more information on port encodings.  It is an
+error to pass a non-false @var{guess-encoding} or @var{encoding} if
+binary mode is requested.
+
+If a file cannot be opened with the access requested, @code{open-file}
+throws an exception.
 
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the
@@ -909,23 +921,40 @@ current interfaces.
 @end deffn
 
 @rnindex open-input-file
address@hidden {Scheme Procedure} open-input-file filename
-Open @var{filename} for input.  Equivalent to
address@hidden {Scheme Procedure} open-input-file filename @
+       [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for input.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} and
address@hidden determine the character encoding as described above
+for @code{open-file}.  Equivalent to
 @lisp
-(open-file @var{filename} "r")
+(open-file @var{filename}
+           (if @var{binary} "rb" "r")
+           #:guess-encoding @var{guess-encoding}
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
 @rnindex open-output-file
address@hidden {Scheme Procedure} open-output-file filename
-Open @var{filename} for output.  Equivalent to
address@hidden {Scheme Procedure} open-output-file filename @
+       [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for output.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} specifies the
+character encoding as described above for @code{open-file}.  Equivalent
+to
 @lisp
-(open-file @var{filename} "w")
+(open-file @var{filename}
+           (if @var{binary} "wb" "w")
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
address@hidden {Scheme Procedure} call-with-input-file filename proc
address@hidden {Scheme Procedure} call-with-output-file filename proc
address@hidden {Scheme Procedure} call-with-input-file filename proc @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
address@hidden {Scheme Procedure} call-with-output-file filename proc @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex call-with-input-file
 @rnindex call-with-output-file
 Open @var{filename} for input or output, and call @code{(@var{proc}
@@ -940,9 +969,12 @@ closed automatically, though it will be garbage collected 
in the usual
 way if not otherwise referenced.
 @end deffn
 
address@hidden {Scheme Procedure} with-input-from-file filename thunk
address@hidden {Scheme Procedure} with-output-to-file filename thunk
address@hidden {Scheme Procedure} with-error-to-file filename thunk
address@hidden {Scheme Procedure} with-input-from-file filename thunk @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
address@hidden {Scheme Procedure} with-output-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
address@hidden {Scheme Procedure} with-error-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex with-input-from-file
 @rnindex with-output-to-file
 Open @var{filename} and call @code{(@var{thunk})} with the new port
diff --git a/libguile/fports.c b/libguile/fports.c
index 727fe27..442b628 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -316,65 +316,36 @@ fport_canonicalize_filename (SCM filename)
     }
 }
 
-
-/* scm_open_file
+/* scm_open_file_with_encoding
  * Return a new port open on a given file.
  *
+ * Use heuristics to guess the encoding is GUESS_ENCODING
+ * is true, else use ENCODING if not false, else use the
+ * default port encoding.
+ *
  * The mode string must match the pattern: [rwa+]** which
  * is interpreted in the usual unix way.
  *
  * Return the new port.
  */
-SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
-           (SCM filename, SCM mode),
-           "Open the file whose name is @var{filename}, and return a port\n"
-           "representing that file.  The attributes of the port are\n"
-           "determined by the @var{mode} string.  The way in which this is\n"
-           "interpreted is similar to C stdio.  The first character must be\n"
-           "one of the following:\n"
-           "@table @samp\n"
-           "@item r\n"
-           "Open an existing file for input.\n"
-           "@item w\n"
-           "Open a file for output, creating it if it doesn't already exist\n"
-           "or removing its contents if it does.\n"
-           "@item a\n"
-           "Open a file for output, creating it if it doesn't already\n"
-           "exist.  All writes to the port will go to the end of the file.\n"
-           "The \"append mode\" can be turned off while the port is in use\n"
-           "@pxref{Ports and File Descriptors, fcntl}\n"
-           "@end table\n"
-           "The following additional characters can be appended:\n"
-           "@table @samp\n"
-           "@item b\n"
-           "Open the underlying file in binary mode, if supported by the 
system.\n"
-           "Also, open the file using the binary-compatible character 
encoding\n"
-           "\"ISO-8859-1\", ignoring the default port encoding.\n"
-           "@item +\n"
-           "Open the port for both input and output.  E.g., @code{r+}: open\n"
-           "an existing file for both input and output.\n"
-           "@item 0\n"
-           "Create an \"unbuffered\" port.  In this case input and output\n"
-           "operations are passed directly to the underlying port\n"
-           "implementation without additional buffering.  This is likely to\n"
-           "slow down I/O operations.  The buffering mode can be changed\n"
-           "while a port is in use @pxref{Ports and File Descriptors,\n"
-           "setvbuf}\n"
-           "@item l\n"
-           "Add line-buffering to the port.  The port output buffer will be\n"
-           "automatically flushed whenever a newline character is written.\n"
-           "@end table\n"
-           "In theory we could create read/write ports which were buffered\n"
-           "in one direction only.  However this isn't included in the\n"
-           "current interfaces.  If a file cannot be opened with the access\n"
-           "requested, @code{open-file} throws an exception.")
-#define FUNC_NAME s_scm_open_file
+SCM
+scm_open_file_with_encoding (SCM filename, SCM mode,
+                             SCM guess_encoding, SCM encoding)
+#define FUNC_NAME "open-file"
 {
   SCM port;
   int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
+  if (SCM_UNLIKELY (!scm_is_bool (guess_encoding)))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, guess_encoding,
+                            "guess-encoding to be boolean");
+
+  if (SCM_UNLIKELY (!(scm_is_false (encoding) || scm_is_string (encoding))))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, encoding,
+                            "encoding to be string or false");
+
   scm_dynwind_begin (0);
 
   file = scm_to_locale_string (filename);
@@ -446,8 +417,43 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
                              fport_canonicalize_filename (filename));
 
   if (binary)
-    /* Use the binary-friendly ISO-8859-1 encoding. */
-    scm_i_set_port_encoding_x (port, NULL);
+    {
+      if (scm_is_true (encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Encoding specified on a binary port",
+                        scm_list_1 (encoding));
+      if (scm_is_true (guess_encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Request to guess encoding on a binary port",
+                        SCM_EOL);
+
+      /* Use the binary-friendly ISO-8859-1 encoding. */
+      scm_i_set_port_encoding_x (port, NULL);
+    }
+  else
+    {
+      char *enc = NULL;
+
+      if (scm_is_true (guess_encoding))
+        {
+          if (SCM_INPUT_PORT_P (port))
+            enc = scm_i_scan_for_encoding (port);
+          else
+            scm_misc_error (FUNC_NAME,
+                            "Request to guess encoding on an output-only port",
+                            SCM_EOL);
+        }
+
+      if (!enc && scm_is_true (encoding))
+        {
+          char *buf = scm_to_latin1_string (encoding);
+          enc = scm_gc_strdup (buf, "encoding");
+          free (buf);
+        }
+
+      if (enc)
+        scm_i_set_port_encoding_x (port, enc);
+    }
 
   scm_dynwind_end ();
 
@@ -455,6 +461,75 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+SCM
+scm_open_file (SCM filename, SCM mode)
+{
+  return scm_open_file_with_encoding (filename, mode, SCM_BOOL_F, SCM_BOOL_F);
+}
+
+/* We can't define these using SCM_KEYWORD, because keywords have not
+   yet been initialized when scm_init_fports is called.  */
+SCM k_guess_encoding = SCM_UNDEFINED;
+SCM k_encoding       = SCM_UNDEFINED;
+
+SCM_DEFINE (scm_i_open_file, "open-file", 2, 0, 1,
+           (SCM filename, SCM mode, SCM keyword_args),
+           "Open the file whose name is @var{filename}, and return a port\n"
+           "representing that file.  The attributes of the port are\n"
+           "determined by the @var{mode} string.  The way in which this is\n"
+           "interpreted is similar to C stdio.  The first character must be\n"
+           "one of the following:\n"
+           "@table @samp\n"
+           "@item r\n"
+           "Open an existing file for input.\n"
+           "@item w\n"
+           "Open a file for output, creating it if it doesn't already exist\n"
+           "or removing its contents if it does.\n"
+           "@item a\n"
+           "Open a file for output, creating it if it doesn't already\n"
+           "exist.  All writes to the port will go to the end of the file.\n"
+           "The \"append mode\" can be turned off while the port is in use\n"
+           "@pxref{Ports and File Descriptors, fcntl}\n"
+           "@end table\n"
+           "The following additional characters can be appended:\n"
+           "@table @samp\n"
+           "@item b\n"
+           "Open the underlying file in binary mode, if supported by the 
system.\n"
+           "Also, open the file using the binary-compatible character 
encoding\n"
+           "\"ISO-8859-1\", ignoring the default port encoding.\n"
+           "@item +\n"
+           "Open the port for both input and output.  E.g., @code{r+}: open\n"
+           "an existing file for both input and output.\n"
+           "@item 0\n"
+           "Create an \"unbuffered\" port.  In this case input and output\n"
+           "operations are passed directly to the underlying port\n"
+           "implementation without additional buffering.  This is likely to\n"
+           "slow down I/O operations.  The buffering mode can be changed\n"
+           "while a port is in use @pxref{Ports and File Descriptors,\n"
+           "setvbuf}\n"
+           "@item l\n"
+           "Add line-buffering to the port.  The port output buffer will be\n"
+           "automatically flushed whenever a newline character is written.\n"
+           "@end table\n"
+           "In theory we could create read/write ports which were buffered\n"
+           "in one direction only.  However this isn't included in the\n"
+           "current interfaces.  If a file cannot be opened with the access\n"
+           "requested, @code{open-file} throws an exception.")
+#define FUNC_NAME s_scm_i_open_file
+{
+  SCM encoding = SCM_BOOL_F;
+  SCM guess_encoding = SCM_BOOL_F;
+
+  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
+                                k_guess_encoding, &guess_encoding,
+                                k_encoding, &encoding,
+                                SCM_UNDEFINED);
+
+  return scm_open_file_with_encoding (filename, mode,
+                                      guess_encoding, encoding);
+}
+#undef FUNC_NAME
+

 /* Building Guile ports from a file descriptor.  */
 
@@ -805,6 +880,15 @@ scm_make_fptob ()
   return tc;
 }
 
+/* We can't initialize the keywords from 'scm_init_fports', because
+   keywords haven't yet been initialized at that point.  */
+void
+scm_init_fports_keywords ()
+{
+  k_guess_encoding = scm_from_latin1_keyword ("guess-encoding");
+  k_encoding       = scm_from_latin1_keyword ("encoding");
+}
+
 void
 scm_init_fports ()
 {
diff --git a/libguile/fports.h b/libguile/fports.h
index cbef0f8..c32ed95 100644
--- a/libguile/fports.h
+++ b/libguile/fports.h
@@ -51,9 +51,12 @@ SCM_API scm_t_bits scm_tc16_fport;
 SCM_API SCM scm_setbuf0 (SCM port);
 SCM_API SCM scm_setvbuf (SCM port, SCM mode, SCM size);
 SCM_API void scm_evict_ports (int fd);
+SCM_API SCM scm_open_file_with_encoding (SCM filename, SCM modes,
+                                         SCM guess_encoding, SCM encoding);
 SCM_API SCM scm_open_file (SCM filename, SCM modes);
 SCM_API SCM scm_fdes_to_port (int fdes, char *mode, SCM name);
 SCM_API SCM scm_file_port_p (SCM obj);
+SCM_INTERNAL void scm_init_fports_keywords (void);
 SCM_INTERNAL void scm_init_fports (void);
 
 /* internal functions */
diff --git a/libguile/init.c b/libguile/init.c
index 57e4902..455a772 100644
--- a/libguile/init.c
+++ b/libguile/init.c
@@ -444,6 +444,7 @@ scm_i_init_guile (void *base)
   scm_init_gettext ();
   scm_init_ioext ();
   scm_init_keywords ();    /* Requires smob_prehistory */
+  scm_init_fports_keywords ();
   scm_init_list ();
   scm_init_random ();      /* Requires smob_prehistory */
   scm_init_macros ();      /* Requires smob_prehistory and random */
diff --git a/module/ice-9/boot-9.scm b/module/ice-9/boot-9.scm
index 8461ee8..4fdfe1e 100644
--- a/module/ice-9/boot-9.scm
+++ b/module/ice-9/boot-9.scm
@@ -753,6 +753,116 @@ information is unavailable."

 
 ;;;
+;;; Enhanced file opening procedures
+;;;
+
+(define* (open-input-file
+          str #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "Takes a string naming an existing file and returns an input port
+capable of delivering characters from the file.  If the file
+cannot be opened, an error is signalled."
+  (open-file str (if binary "rb" "r")
+             #:encoding encoding
+             #:guess-encoding guess-encoding))
+
+(define* (open-output-file str #:key (binary #f) (encoding #f))
+  "Takes a string naming an output file to be created and returns an
+output port capable of writing characters to a new file by that
+name.  If the file cannot be opened, an error is signalled.  If a
+file with the given name already exists, the effect is unspecified."
+  (open-file str (if binary "wb" "w")
+             #:encoding encoding))
+
+(define* (call-with-input-file
+          str proc #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "PROC should be a procedure of one argument, and STR should be a
+string naming a file.  The file must
+already exist. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-input-file str
+                            #:binary binary
+                            #:encoding encoding
+                            #:guess-encoding guess-encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-input-port p)
+        (apply values vals)))))
+
+(define* (call-with-output-file str proc #:key (binary #f) (encoding #f))
+  "PROC should be a procedure of one argument, and STR should be a
+string naming a file.  The behaviour is unspecified if the file
+already exists. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-output-file str #:binary binary #:encoding encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-output-port p)
+        (apply values vals)))))
+
+(define* (with-input-from-file
+          file thunk #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The file must already exist. The file is opened for
+input, an input port connected to it is made
+the default value returned by `current-input-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-input-file file
+   (lambda (p) (with-input-from-port p thunk))
+   #:binary binary
+   #:encoding encoding
+   #:guess-encoding guess-encoding))
+
+(define* (with-output-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-output-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-output-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+(define* (with-error-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-error-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-error-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+
+
+;;;
 ;;; Extensible exception printing.
 ;;;
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 5d3c213..313cd36 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -24,8 +24,10 @@
   #:use-module (ice-9 popen)
   #:use-module (ice-9 rdelim)
   #:use-module (rnrs bytevectors)
-  #:use-module ((rnrs io ports) #:select (open-bytevector-input-port
-                                          open-bytevector-output-port)))
+  #:use-module ((ice-9 binary-ports) #:select (open-bytevector-input-port
+                                               open-bytevector-output-port
+                                               put-bytevector
+                                               get-bytevector-all)))
 
 (define (display-line . args)
   (for-each display args)
@@ -270,8 +272,8 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file ignores file coding declaration
-(pass-if "file: open-file ignores coding declarations"
+;; open-file ignores file coding declaration by default
+(pass-if "file: open-file ignores coding declaration by default"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
@@ -286,6 +288,287 @@
                    (delete-file filename)
                    (string=? line2 test-string)))))
 
+;; open-input-file with guess-encoding honors coding declaration
+(pass-if "file: open-input-file with guess-encoding honors coding declaration"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+               (let* ((filename (test-file))
+                      (port (open-output-file filename))
+                      (test-string "€100"))
+                 (set-port-encoding! port "iso-8859-15")
+                 (write-line ";; coding: iso-8859-15" port)
+                 (write-line test-string port)
+                 (close-port port)
+                 (let* ((in-port (open-input-file filename
+                                                  #:guess-encoding #t))
+                        (line1 (read-line in-port))
+                        (line2 (read-line in-port)))
+                   (close-port in-port)
+                   (delete-file filename)
+                   (string=? line2 test-string)))))
+
+(with-test-prefix "keyword arguments for file openers"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    (let ((filename (test-file)))
+
+      (with-test-prefix "write #:encoding"
+
+        (pass-if-equal "open-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-file filename "w"
+                                   #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "open-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-output-file filename
+                                          #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "call-with-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (call-with-output-file filename
+              (lambda (port)
+                (display "test" port))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-output-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-output-to-file filename
+              (lambda ()
+                (display "test"))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-error-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-error-to-file
+             filename
+             (lambda ()
+               (display "test" (current-error-port)))
+             #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv)))
+
+      (with-test-prefix "write #:binary"
+
+        (pass-if-equal "open-output-file"
+            "ISO-8859-1"
+          (let* ((port (open-output-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-output-file"
+            "ISO-8859-1"
+          (call-with-output-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-output-to-file"
+            "ISO-8859-1"
+          (with-output-to-file filename
+            (lambda () (port-encoding (current-output-port)))
+            #:binary #t))
+
+        (pass-if-equal "with-error-to-file"
+            "ISO-8859-1"
+          (with-error-to-file
+           filename
+           (lambda () (port-encoding (current-error-port)))
+           #:binary #t)))
+
+      (with-test-prefix "read #:encoding"
+
+        (pass-if-equal "open-file read #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-file filename "r" #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-input-file filename #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (call-with-input-file filename
+            read-string
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (with-input-from-file filename
+            read-string
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "read #:binary"
+
+        (pass-if-equal "open-input-file"
+            "ISO-8859-1"
+          (let* ((port (open-input-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-input-file"
+            "ISO-8859-1"
+          (call-with-input-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-input-from-file"
+            "ISO-8859-1"
+          (with-input-from-file filename
+            (lambda () (port-encoding (current-input-port)))
+            #:binary #t)))
+
+      (with-test-prefix "#:guess-encoding with coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            (lambda (port)
+              (read-line port)
+              (read-line port)
+              (read-line port))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            (lambda ()
+              (read-line)
+              (read-line)
+              (read-line))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "#:guess-encoding without coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15")))
+
+      (delete-file filename))))
+
 ;;; ungetting characters and strings.
 (with-input-from-string "walk on the moon\nmoon"
                        (lambda ()
-- 
1.7.10.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]