
[lmi] rate_table_tool: merge a whole directory


From: Greg Chicares
Subject: [lmi] rate_table_tool: merge a whole directory
Date: Wed, 23 Nov 2016 01:29:23 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.3.0

Proprietary rate tables are distributed along with our free software.
These tables can be stored in two interconvertible formats:
 - binary: not human-readable, which is good for distribution to
   outsiders who should be discouraged from extracting proprietary
   data, but inconvenient for internal maintenance; and
 - human-readable text, which is better for maintenance but not for
   external distribution.
Our plan is to store tables as text in a proprietary git database,
and use 'rate_table_tool' to convert them to binary for distribution.

For this purpose, it seems like a good idea to make 'rate_table_tool'
accept '--merge=/some/directory'. That's easy to do with boost's
directory_iterator, and I will commit this change soon. Here, I want
to explain the rationale.

We already have an 'extract-all' command that is convenient and fast:

$time wine /opt/lmi/bin/rate_table_tool --accept --file=proprietary --extract-all
Extracted 594 tables.
wine /opt/lmi/bin/rate_table_tool --accept --file=proprietary --extract-all
2.66s user 0.22s system 96% cpu 2.979 total

The inverse operation already works with HEAD, but twelve minutes is
preposterous (it might run faster on msw, where 'wine' needn't be
started hundreds of times, but I don't use msw):

/tmp[0]$rm eraseme.dat eraseme.ndx
/tmp[0]$time (for z in *.txt; do wine /opt/lmi/bin/rate_table_tool --accept --file=proprietary --merge=$z; done)
( for z in *.txt; do; wine /opt/lmi/bin/rate_table_tool --accept  --merge=$z;
693.36s user 27.10s system 98% cpu 12:12.52 total

With the planned code change, this command produces the same database
and is orders of magnitude faster:

/tmp[0]$rm eraseme2.dat eraseme2.ndx
/tmp[0]$time wine /opt/lmi/bin/rate_table_tool.exe --accept --file=eraseme2 --merge=.
wine /opt/lmi/bin/rate_table_tool.exe --accept --file=eraseme2 --merge=.
1.74s user 0.10s system 96% cpu 1.914 total

I also wrote an experimental implementation that accomplished the
same purpose in this fashion:

/tmp[0]$time wine /opt/lmi/bin/rate_table_tool.exe --accept --file=proprietary --merge='1.txt 2.txt'

but discarded it because it seemed less clear, and was more cumbersome
with hundreds of tables.

The only interesting parts of the patch are:

+/// If 'path_to_merge' names a file, then merge that file. If it names
+/// a directory, then merge all '*.txt' files in that directory.
+
 void merge
...
+    if(fs::is_directory(path_to_merge))
+        {
+        fs::directory_iterator i(path_to_merge);
+        fs::directory_iterator const eod;
+        for(; i != eod; ++i)
+            {
+            if(".txt" != fs::extension(*i)) continue;
+            table const& t = table::read_from_text(*i);
+            table_file->add_or_replace_table(t);
+            }
+        }

The ".txt" extension is hard-coded in the program already, so relying
on it here doesn't decrease generality. In practice, we'll run this in
a temporary directory created on the fly (e.g., /tmp/tables/), so we
needn't worry about stray '.txt' files that don't represent tables.

The filesystem TS proposed for standardization already includes
directory_iterator, so I don't think using it here makes it harder to
replace boost with a standard library eventually.


