Re: [bug-recutils] Seekable parsers


From: Jose E. Marchesi
Subject: Re: [bug-recutils] Seekable parsers
Date: Thu, 24 May 2012 22:03:58 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)

Hi.
    
    (0) passing a file to the parser and seeking it from other code
    
    (1) making a new parser for each record and passing it a memory buffer
        starting at an appropriate point
    
    (2) adding parser interfaces to seek it to another place in the file or
        memory buffer.
    
    I think the (0) solution is not elegant enough and could lead to bugs in
    nonobvious cases.  (1) might be slower than (2).  I think (2) won't be
    difficult to implement or test, so I'll implement it unless you
    recommend a different solution.

(2) is definitely the way to go.
    
    If we use mmap to read the recfile, parsers would need an additional
    interface to not read past the end of the file, e.g. like this one:
    
    /* Create a parser associated with a given buffer that will be used as
       the source for the tokens.  The buffer is of specified size and
       doesn't have to be null-terminated.  If not enough memory, return
       NULL.  */
    
    rec_parser_t rec_parser_new_mem (const char *buffer, size_t size,
                                     const char *source);

That is ok.  But since the special case where BUFFER is null-terminated
can be easily handled by passing strlen (buffer) as the SIZE argument, I
would not introduce a new function.  Just rename rec_parser_new_str to
rec_parser_new_mem.
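
To illustrate, here is a minimal sketch of a size-bounded reader.  The
names mem_parser, mem_parser_new, mem_parser_getc and mem_parser_destroy
are hypothetical stand-ins, not the librec API; the point is only that a
single constructor taking a SIZE covers both the mmap'ed and the
null-terminated case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parser state; not the librec struct.  */
struct mem_parser
{
  const char *buffer;  /* Not owned; need not be null-terminated.  */
  size_t size;         /* Number of readable bytes in BUFFER.  */
  size_t position;     /* Offset of the next byte to read.  */
  const char *source;  /* Name for error messages; may be NULL.  */
};

/* Create a parser over SIZE bytes of BUFFER.  Return NULL if there is
   not enough memory.  */
struct mem_parser *
mem_parser_new (const char *buffer, size_t size, const char *source)
{
  struct mem_parser *parser;

  assert (buffer != NULL || size == 0);

  parser = malloc (sizeof *parser);
  if (parser == NULL)
    return NULL;

  parser->buffer = buffer;
  parser->size = size;
  parser->position = 0;
  parser->source = source;
  return parser;
}

/* Return the next byte as an unsigned char value, or -1 once SIZE
   bytes have been consumed.  Never reads past the end of the buffer.  */
int
mem_parser_getc (struct mem_parser *parser)
{
  assert (parser != NULL);

  if (parser->position >= parser->size)
    return -1;
  return (unsigned char) parser->buffer[parser->position++];
}

/* Free the parser.  The buffer belongs to the caller.  */
void
mem_parser_destroy (struct mem_parser *parser)
{
  free (parser);
}
```

A caller holding an ordinary C string would then write
mem_parser_new (str, strlen (str), "name"), so no separate string
constructor is needed.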

But then, is using mmap the best option here?  An alternative would be
to expand the fopen-based parser backend in order to use fseek/ftell.
The FILE* functions have fewer portability issues than mmap, and the
parser is character-oriented anyway.
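
A rough sketch of what I mean, using only standard stdio calls.  The
helper names offset_after_line and read_line_at are made up for this
example and are not part of librec; a real backend would keep reading
characters through the parser instead of fgets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Consume characters up to and including the next newline (or EOF)
   and return the resulting file offset as reported by ftell, or -1L
   on error.  The returned value can later be passed to fseek.  */
long
offset_after_line (FILE *fp)
{
  int c;

  assert (fp != NULL);

  while ((c = getc (fp)) != EOF && c != '\n')
    ;
  return ftell (fp);
}

/* Seek FP to OFFSET (a value previously obtained from ftell) and read
   one line into OUT, without the trailing newline.  Return 0 on
   success and -1 if the seek or the read fails.  */
int
read_line_at (FILE *fp, long offset, char *out, size_t outsize)
{
  assert (fp != NULL && out != NULL && outsize > 0);

  if (fseek (fp, offset, SEEK_SET) != 0)
    return -1;
  if (fgets (out, (int) outsize, fp) == NULL)
    return -1;

  out[strcspn (out, "\n")] = '\0';
  return 0;
}
```

The offsets can be collected in a first pass over the file and then
used to jump straight to a record, which is all the seekable parser
needs from the backend.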

    Another problem is keeping line numbers correct when using any of the
    above three solutions.  This can be solved by adding another function to
    set the line number, or (for (2)) to set it with the position in file.
    
    This interface could be used for changing parser position in (2):
    
    /* Change the position in file of the parser.  The line number is only
       used to store it in the parsed records. */
    
    void rec_parser_seek (rec_parser_t parser, size_t line_number,
                          size_t position);

Yes, it is good to force the user to specify the new line number.
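
For a memory-backed parser the operation could look roughly like the
sketch below.  The struct and the seek_parser_* names are hypothetical,
and returning a bool for an out-of-range position is my assumption; the
interface proposed above returns void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser state; not the librec struct.  */
struct seek_parser
{
  const char *buffer;  /* Data being parsed.  */
  size_t size;         /* Number of readable bytes in BUFFER.  */
  size_t position;     /* Offset of the next byte to read.  */
  size_t line;         /* Line number of the next byte to read.  */
};

/* Move the parser to POSITION and record LINE_NUMBER as the current
   line.  The caller must supply the line number because the parser
   cannot know it without rescanning the input.  Return false and
   leave the parser untouched if POSITION is past the end.  */
bool
seek_parser_seek (struct seek_parser *parser, size_t line_number,
                  size_t position)
{
  assert (parser != NULL);

  if (position > parser->size)
    return false;

  parser->position = position;
  parser->line = line_number;
  return true;
}

/* Return the next byte, or -1 at the end of the buffer.  The line
   counter advances after each newline, so it stays correct after a
   seek as long as the caller passed the right LINE_NUMBER.  */
int
seek_parser_getc (struct seek_parser *parser)
{
  int c;

  assert (parser != NULL);

  if (parser->position >= parser->size)
    return -1;

  c = (unsigned char) parser->buffer[parser->position++];
  if (c == '\n')
    parser->line++;
  return c;
}
```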

-- 
Jose E. Marchesi         http://www.jemarch.net
GNU Project              http://www.gnu.org


