[Pspp-cvs] pspp doc/portable-file-format.texi src/data/Cha...
From: Ben Pfaff
Subject: [Pspp-cvs] pspp doc/portable-file-format.texi src/data/Cha...
Date: Sun, 29 Jul 2007 05:40:52 +0000
CVSROOT: /cvsroot/pspp
Module name: pspp
Changes by: Ben Pfaff <blp> 07/07/29 05:40:52
Modified files:
doc : portable-file-format.texi
src/data : ChangeLog por-file-reader.c por-file-writer.c
Log message:
Make PSPP able to read all the portable files I could find on the
web.
* por-file-reader.c (struct pfm_reader): New member `line_length'.
(error): Print file offset in hexadecimal.
(warning): New function.
(advance): Treat lines less than 80 bytes long as padded to 80
bytes with spaces.
(pfm_open_reader): Call read_documents if we find an "E" record.
(convert_format): Convert invalid formats to the default format
instead of aborting reading the file.
(read_variables): Rename duplicate variable names instead of
aborting reading the file.
(read_value_label): Allow string variables of different widths to
be assigned value labels in the same record. Replace duplicate
value labels instead of aborting.
(read_documents): New function.
* por-file-writer.c (pfm_open_writer): Call write_documents if the
dictionary has documents.
(write_documents): New function.
CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/pspp/doc/portable-file-format.texi?cvsroot=pspp&r1=1.3&r2=1.4
http://cvs.savannah.gnu.org/viewcvs/pspp/src/data/ChangeLog?cvsroot=pspp&r1=1.145&r2=1.146
http://cvs.savannah.gnu.org/viewcvs/pspp/src/data/por-file-reader.c?cvsroot=pspp&r1=1.20&r2=1.21
http://cvs.savannah.gnu.org/viewcvs/pspp/src/data/por-file-writer.c?cvsroot=pspp&r1=1.15&r2=1.16
Patches:
Index: doc/portable-file-format.texi
===================================================================
RCS file: /cvsroot/pspp/pspp/doc/portable-file-format.texi,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -b -r1.3 -r1.4
--- doc/portable-file-format.texi 6 Jun 2007 05:17:48 -0000 1.3
+++ doc/portable-file-format.texi 29 Jul 2007 05:40:51 -0000 1.4
@@ -22,17 +22,24 @@
* Case Weight Variable Record::
* Variable Records::
* Value Label Records::
+* Portable File Document Record::
* Portable File Data::
@end menu
@node Portable File Characters
@section Portable File Characters
-Portable files are arranged as a series of lines of exactly 80
+Portable files are arranged as a series of lines of 80
characters each. Each line is terminated by a carriage-return,
-line-feed sequence ``new-lines''). New-lines are only used to avoid
+line-feed sequence (``new-lines''). New-lines are only used to avoid
line length limits imposed by some OSes; they are not meaningful.
+Most lines in portable files are exactly 80 characters long. The only
+exception is a line that ends in one or more spaces, in which the
+spaces may optionally be omitted. Thus, a portable file reader must
+act as though a line shorter than 80 characters is padded to that
+length with spaces.
+
The file must be terminated with a @samp{Z} character. In addition, if
the final line in the file does not have exactly 80 characters, then it
is padded on the right with @samp{Z} characters. (The file contents may
@@ -81,6 +88,9 @@
Value labels (optional).
@item
+Documents (optional).
+
+@item
Data.
@end itemize
@@ -369,6 +379,11 @@
@item
Name (string). 1--8 characters long. Must be in all capitals.
+A few portable files that contain duplicate variable names have been
+spotted in the wild. PSPP handles these by renaming the duplicates
+with numeric extensions: @code{@var{var}_1}, @code{@var{var}_2}, and
+so on.
+
@item
Print format. This is a set of three integer fields:
@@ -384,6 +399,11 @@
Number of decimal places. 1--40.
@end itemize
+A few portable files with invalid format types or formats that are not
+of the appropriate width for their variables have been spotted in the
+wild. PSPP assigns a default F or A format to a variable with an
+invalid format.
+
@item
Write format. Same structure as the print format described above.
@end itemize
@@ -420,7 +440,8 @@
@item
List of variables (strings). The variable count specifies the number in
the list. Variables are specified by their names. All variables must
-be of the same type (numeric or string).
+be of the same type (numeric or string), but string variables do not
+necessarily have the same width.
@item
Label count (integer).
@@ -431,6 +452,20 @@
appropriate to the variables, followed by a label (string).
@end itemize
+A few portable files that specify duplicate value labels, that is, two
+different labels for a single value of a single variable, have been
+spotted in the wild. PSPP uses the last value label specified in
+these cases.
+
+@node Portable File Document Record
+@section Document Record
+
+One document record may optionally follow the value label record. The
+document record consists of tag code @samp{E}, followed by the number
+of document lines as an integer, followed by that number of strings,
+each of which represents one document line. Document lines must be 80
+bytes long or shorter.
+
@node Portable File Data
@section Portable File Data
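
The line-padding rule documented in the "Portable File Characters" hunk above can be illustrated with a minimal sketch. It is not the code from por-file-reader.c: it works on an in-memory buffer rather than a FILE stream, and the names por_unwrap_lines and POR_LINE_WIDTH are hypothetical. Carriage returns are dropped, and a new-line that arrives before column 80 is expanded into the spaces that were omitted.

```c
#include <assert.h>
#include <stddef.h>

#define POR_LINE_WIDTH 80       /* Nominal line width from the format doc. */

/* Hypothetical helper, for illustration only.
   Copies the LEN raw bytes in SRC into DST, dropping carriage
   returns and new-lines.  A line that ends before column 80 is
   padded to 80 columns with spaces.  Writes at most DST_SIZE
   bytes and returns the number of bytes written. */
size_t
por_unwrap_lines (const char *src, size_t len, char *dst, size_t dst_size)
{
  size_t out = 0;
  int line_length = 0;
  size_t i;

  assert (src != NULL && dst != NULL);
  for (i = 0; i < len; i++)
    {
      char c = src[i];

      if (c == '\r')
        continue;               /* Ignore carriage returns entirely. */
      if (c == '\n')
        {
          /* Supply the trailing spaces that the file omitted. */
          while (line_length < POR_LINE_WIDTH && out < dst_size)
            {
              dst[out++] = ' ';
              line_length++;
            }
          line_length = 0;
          continue;
        }
      if (out < dst_size)
        dst[out++] = c;
      line_length++;
    }
  return out;
}
```

Given the input "AB\r\nCD\r\n", the sketch produces 160 bytes: "AB" followed by 78 spaces, then "CD" followed by 78 spaces. A final line with no terminating new-line is copied without padding.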
Index: src/data/ChangeLog
===================================================================
RCS file: /cvsroot/pspp/pspp/src/data/ChangeLog,v
retrieving revision 1.145
retrieving revision 1.146
diff -u -b -r1.145 -r1.146
--- src/data/ChangeLog 27 Jul 2007 22:58:02 -0000 1.145
+++ src/data/ChangeLog 29 Jul 2007 05:40:51 -0000 1.146
@@ -1,3 +1,26 @@
+2007-07-28 Ben Pfaff <address@hidden>
+
+ Make PSPP able to read all the portable files I could find on the
+ web. Thanks to John Darrington for review. Bug #17620.
+ * por-file-reader.c (struct pfm_reader): New member `line_length'.
+ (error): Print file offset in hexadecimal.
+ (warning): New function.
+ (advance): Treat lines less than 80 bytes long as padded to 80
+ bytes with spaces.
+ (pfm_open_reader): Call read_documents if we find an "E" record.
+ (convert_format): Convert invalid formats to the default format
+ instead of aborting reading the file.
+ (read_variables): Rename duplicate variable names instead of
+ aborting reading the file.
+ (read_value_label): Allow string variables of different widths to
+ be assigned value labels in the same record. Replace duplicate
+ value labels instead of aborting.
+ (read_documents): New function.
+
+ * por-file-writer.c (pfm_open_writer): Call write_documents if the
+ dictionary has documents.
+ (write_documents): New function.
+
2007-07-25 Ben Pfaff <address@hidden>
Fix bugs related to bug #17213.
Index: src/data/por-file-reader.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/data/por-file-reader.c,v
retrieving revision 1.20
retrieving revision 1.21
diff -u -b -r1.20 -r1.21
--- src/data/por-file-reader.c 7 Jul 2007 06:14:09 -0000 1.20
+++ src/data/por-file-reader.c 29 Jul 2007 05:40:51 -0000 1.21
@@ -65,6 +65,7 @@
struct file_handle *fh; /* File handle. */
FILE *file; /* File stream. */
+ int line_length; /* Number of characters so far on this line. */
char cc; /* Current character. */
char *trans; /* 256-byte character set translation table. */
int var_cnt; /* Number of variables. */
@@ -91,7 +92,7 @@
va_list args;
ds_init_empty (&text);
- ds_put_format (&text, _("portable file %s corrupt at offset %ld: "),
+ ds_put_format (&text, _("portable file %s corrupt at offset 0x%lx: "),
fh_get_file_name (r->fh), ftell (r->file));
va_start (args, msg);
ds_put_vformat (&text, msg, args);
@@ -110,6 +111,31 @@
longjmp (r->bail_out, 1);
}
+/* Displays MSG as a warning for the current position in
+ portable file reader R. */
+static void
+warning (struct pfm_reader *r, const char *msg, ...)
+{
+ struct msg m;
+ struct string text;
+ va_list args;
+
+ ds_init_empty (&text);
+ ds_put_format (&text, _("reading portable file %s at offset 0x%lx: "),
+ fh_get_file_name (r->fh), ftell (r->file));
+ va_start (args, msg);
+ ds_put_vformat (&text, msg, args);
+ va_end (args);
+
+ m.category = MSG_GENERAL;
+ m.severity = MSG_WARNING;
+ m.where.file_name = NULL;
+ m.where.line_number = 0;
+ m.text = ds_cstr (&text);
+
+ msg_emit (&m);
+}
+
/* Closes portable file reader R, after we're done with it. */
static void
por_file_casereader_destroy (struct casereader *reader UNUSED, void *r_)
@@ -124,14 +150,33 @@
{
int c;
- while ((c = getc (r->file)) == '\r' || c == '\n')
+ /* Read the next character from the file.
+ Ignore carriage returns entirely.
+ Mostly ignore new-lines, but if a new-line occurs before the
+ line has reached 80 bytes in length, then treat the
+ "missing" bytes as spaces. */
+ for (;;)
+ {
+ while ((c = getc (r->file)) == '\r')
continue;
+ if (c != '\n')
+ break;
+
+ if (r->line_length < 80)
+ {
+ c = ' ';
+ ungetc ('\n', r->file);
+ break;
+ }
+ r->line_length = 0;
+ }
if (c == EOF)
error (r, _("unexpected end of file"));
if (r->trans != NULL)
c = r->trans[c];
r->cc = c;
+ r->line_length++;
}
/* Skip a single character if present, and return whether it was
@@ -152,7 +197,7 @@
static void read_version_data (struct pfm_reader *, struct pfm_read_info *);
static void read_variables (struct pfm_reader *, struct dictionary *);
static void read_value_label (struct pfm_reader *, struct dictionary *);
-void dump_dictionary (struct dictionary *);
+static void read_documents (struct pfm_reader *, struct dictionary *);
/* Reads the dictionary from file with handle H, and returns it in a
dictionary structure. This dictionary may be modified in order to
@@ -176,6 +221,7 @@
goto error;
r->fh = fh;
r->file = pool_fopen (r->pool, fh_get_file_name (r->fh), "rb");
+ r->line_length = 0;
r->weight_index = -1;
r->trans = NULL;
r->var_cnt = 0;
@@ -201,6 +247,10 @@
while (match (r, 'D'))
read_value_label (r, *dict);
+ /* Read documents. */
+ if (match (r, 'E'))
+ read_documents (r, *dict);
+
/* Check that we've made it to the data. */
if (!match (r, 'F'))
error (r, _("Data record expected."));
@@ -469,14 +519,20 @@
checking that the format is appropriate for variable V. */
static struct fmt_spec
convert_format (struct pfm_reader *r, const int portable_format[3],
- struct variable *v)
+ struct variable *v, bool *report_error)
{
struct fmt_spec format;
bool ok;
if (!fmt_from_io (portable_format[0], &format.type))
- error (r, _("%s: Bad format specifier byte (%d)."),
+ {
+ if (*report_error)
+ warning (r, _("%s: Bad format specifier byte (%d). Variable "
+ "will be assigned a default format."),
var_get_name (v), portable_format[0]);
+ goto assign_default;
+ }
+
format.w = portable_format[1];
format.d = portable_format[2];
@@ -487,14 +543,27 @@
if (!ok)
{
+ if (*report_error)
+ {
char fmt_string[FMT_STRING_LEN_MAX + 1];
- error (r, _("%s variable %s has invalid format specifier %s."),
- var_is_numeric (v) ? _("Numeric") : _("String"),
- var_get_name (v), fmt_to_string (&format, fmt_string));
- format = fmt_default_for_width (var_get_width (v));
+ fmt_to_string (&format, fmt_string);
+ if (var_is_numeric (v))
+ warning (r, _("Numeric variable %s has invalid format "
+ "specifier %s."),
+ var_get_name (v), fmt_string);
+ else
+ warning (r, _("String variable %s with width %d has "
+ "invalid format specifier %s."),
+ var_get_name (v), var_get_width (v), fmt_string);
+ }
+ goto assign_default;
}
return format;
+
+assign_default:
+ *report_error = false;
+ return fmt_default_for_width (var_get_width (v));
}
static union value parse_value (struct pfm_reader *, struct variable *);
@@ -532,6 +601,7 @@
struct variable *v;
struct missing_values miss;
struct fmt_spec print, write;
+ bool report_error = true;
int j;
if (!match (r, '7'))
@@ -547,7 +617,7 @@
fmt[j] = read_int (r);
if (!var_is_valid_name (name, false) || *name == '#' || *name == '$')
- error (r, _("position %d: Invalid variable name `%s'."), i, name);
+ error (r, _("Invalid variable name `%s' in position %d."), name, i);
str_uppercase (name);
if (width < 0 || width > 255)
@@ -555,10 +625,24 @@
v = dict_create_var (dict, name, width);
if (v == NULL)
- error (r, _("Duplicate variable name %s."), name);
+ {
+ int i;
+ for (i = 1; i < 100000; i++)
+ {
+ char try_name[LONG_NAME_LEN + 1];
+ sprintf (try_name, "%.*s_%d", LONG_NAME_LEN - 6, name, i);
+ v = dict_create_var (dict, try_name, width);
+ if (v != NULL)
+ break;
+ }
+ if (v == NULL)
+ error (r, _("Duplicate variable name %s in position %d."), name, i);
+ warning (r, _("Duplicate variable name %s in position %d renamed "
+ "to %s."), name, i, var_get_name (v));
+ }
- print = convert_format (r, &fmt[0], v);
- write = convert_format (r, &fmt[3], v);
+ print = convert_format (r, &fmt[0], v, &report_error);
+ write = convert_format (r, &fmt[3], v, &report_error);
var_set_print_format (v, &print);
var_set_write_format (v, &write);
@@ -645,9 +729,9 @@
if (v[i] == NULL)
error (r, _("Unknown variable %s while parsing value labels."), name);
- if (var_get_width (v[0]) != var_get_width (v[i]))
+ if (var_get_type (v[0]) != var_get_type (v[i]))
error (r, _("Cannot assign value labels to %s and %s, which "
- "have different variable types or widths."),
+ "have different variable types."),
var_get_name (v[0]), var_get_name (v[i]));
}
@@ -661,21 +745,30 @@
val = parse_value (r, v[0]);
read_string (r, label);
- /* Assign the value_label's to each variable. */
+ /* Assign the value label to each variable. */
for (j = 0; j < nv; j++)
{
struct variable *var = v[j];
- if (!var_add_value_label (var, &val, label))
- continue;
-
- if (var_is_numeric (var))
- error (r, _("Duplicate label for value %g for variable %s."),
- val.f, var_get_name (var));
- else
- error (r, _("Duplicate label for value `%.*s' for variable %s."),
- var_get_width (var), val.s, var_get_name (var));
+ if (!var_is_long_string (var))
+ var_replace_value_label (var, &val, label);
+ }
}
+}
+
+/* Reads a set of documents from portable file R into DICT. */
+static void
+read_documents (struct pfm_reader *r, struct dictionary *dict)
+{
+ int line_cnt;
+ int i;
+
+ line_cnt = read_int (r);
+ for (i = 0; i < line_cnt; i++)
+ {
+ char line[256];
+ read_string (r, line);
+ dict_add_document_line (dict, line);
}
}
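
The duplicate-name handling in read_variables above can be tried in isolation with a minimal sketch. The dictionary is replaced by a plain array of names, and name_taken, rename_duplicate, and NAME_MAX_LEN are hypothetical stand-ins for dict_create_var and LONG_NAME_LEN. Only the "%.*s_%d" naming scheme and the 1 to 99999 suffix range are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 64         /* Assumed limit, standing in for LONG_NAME_LEN. */

/* Hypothetical stand-in for a dictionary lookup: returns true if
   NAME is one of the N names in TAKEN. */
static bool
name_taken (const char *name, const char *const taken[], size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (!strcmp (name, taken[i]))
      return true;
  return false;
}

/* Stores in OUT the first name of the form NAME_1, NAME_2, ... that
   is not in TAKEN.  NAME is truncated as needed so that a suffix of
   up to six characters ("_99999") always fits.  Returns true on
   success, false if every candidate is already in use. */
bool
rename_duplicate (const char *name, const char *const taken[], size_t n,
                  char out[NAME_MAX_LEN + 1])
{
  int i;

  assert (name != NULL);
  for (i = 1; i < 100000; i++)
    {
      snprintf (out, NAME_MAX_LEN + 1, "%.*s_%d", NAME_MAX_LEN - 6, name, i);
      if (!name_taken (out, taken, n))
        return true;
    }
  return false;
}
```

With "AGE" and "AGE_1" already taken, a second duplicate "AGE" is renamed to "AGE_2". Using a loop counter with its own name keeps it from shadowing any position index that a caller reports in diagnostics.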
Index: src/data/por-file-writer.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/data/por-file-writer.c,v
retrieving revision 1.15
retrieving revision 1.16
diff -u -b -r1.15 -r1.16
--- src/data/por-file-writer.c 25 Jul 2007 04:09:44 -0000 1.15
+++ src/data/por-file-writer.c 29 Jul 2007 05:40:52 -0000 1.16
@@ -83,6 +83,8 @@
static void write_variables (struct pfm_writer *, struct dictionary *);
static void write_value_labels (struct pfm_writer *,
const struct dictionary *);
+static void write_documents (struct pfm_writer *,
+ const struct dictionary *);
static void format_trig_double (long double, int base_10_precision, char[]);
static char *format_trig_int (int, bool force_sign, char[]);
@@ -159,6 +161,8 @@
write_version_data (w);
write_variables (w, dict);
write_value_labels (w, dict);
+ if (dict_get_document_line_cnt (dict) > 0)
+ write_documents (w, dict);
buf_write (w, "F", 1);
if (ferror (w->file))
goto error;
@@ -414,7 +418,25 @@
}
}
-/* Writes case C to the portable file represented by H. */
+/* Write documents in DICT to portable file W. */
+static void
+write_documents (struct pfm_writer *w, const struct dictionary *dict)
+{
+ size_t line_cnt = dict_get_document_line_cnt (dict);
+ struct string line = DS_EMPTY_INITIALIZER;
+ int i;
+
+ buf_write (w, "E", 1);
+ write_int (w, line_cnt);
+ for (i = 0; i < line_cnt; i++)
+ {
+ dict_get_document_line (dict, i, &line);
+ write_string (w, ds_cstr (&line));
+ }
+ ds_destroy (&line);
+}
+
+/* Writes case C to the portable file represented by WRITER. */
static void
por_file_casewriter_write (struct casewriter *writer, void *w_,
struct ccase *c)
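
The layout of the document record that write_documents emits can be sketched on an in-memory buffer. This assumes the field encodings used by other portable file records, which this patch does not show: non-negative integers written in base 30 (digits 0-9, A-T) and terminated by '/', and strings written as an integer length followed by that many characters. The helpers put_int and format_document_record are hypothetical, and the sketch ignores the 80-column line wrapping and character set translation that the real writer performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends N to BUF at offset POS as a base-30 integer field
   terminated by '/'.  Returns the new offset.  (Assumed encoding;
   see the lead-in.) */
static size_t
put_int (char *buf, size_t pos, unsigned int n)
{
  static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRST";
  char tmp[16];
  size_t len = 0;

  /* Collect digits least-significant first, then emit in reverse. */
  do
    {
      tmp[len++] = digits[n % 30];
      n /= 30;
    }
  while (n > 0);
  while (len > 0)
    buf[pos++] = tmp[--len];
  buf[pos++] = '/';
  return pos;
}

/* Formats a document record for the LINE_CNT strings in LINES into
   BUF: tag 'E', the line count, then each line as a string field.
   Null-terminates BUF and returns the record length.  The caller
   must supply a large enough BUF; no bounds checking is done. */
size_t
format_document_record (const char *const lines[], size_t line_cnt, char *buf)
{
  size_t pos = 0;
  size_t i;

  buf[pos++] = 'E';
  pos = put_int (buf, pos, (unsigned int) line_cnt);
  for (i = 0; i < line_cnt; i++)
    {
      size_t len = strlen (lines[i]);

      assert (len <= 80);       /* Document lines are at most 80 bytes. */
      pos = put_int (buf, pos, (unsigned int) len);
      memcpy (buf + pos, lines[i], len);
      pos += len;
    }
  buf[pos] = '\0';
  return pos;
}
```

For the two lines "first" and "second line", the sketch yields "E2/5/firstB/second line", where B is the base-30 digit for 11.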