Re: patch to avoid huge buffer in scan-skel.l


From: Akim Demaille
Subject: Re: patch to avoid huge buffer in scan-skel.l
Date: Thu, 09 Jan 2003 10:08:24 +0100
User-agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu)

 >> From: Akim Demaille <address@hidden>
 >> Date: Sun, 05 Jan 2003 14:23:47 +0100

 >> | address@hidden       ECHO;
 >> | address@hidden       ECHO;
 >> 
 >> Did you actually observe problems?

 Paul> No.

 Paul> However, I'd rather not have Bison create unbounded buffers
 Paul> without good reason.  

Why?  I mean, we use Flex, and we know it handles this properly, so I
saw no reason not to use the feature.  Besides, the unboundedness here
would only come from an arbitrarily long line, which is generally not
the typical case.
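
For concreteness, here is a rough sketch of the two styles under
discussion (a minimal, self-contained example; the patterns and the
main driver are illustrative only, not the actual scan-skel.l rules):

     %option noyywrap
     %%
       /* Chunked style: one action per run of ordinary characters.
          Flex grows its internal buffer as needed, so an arbitrarily
          long line means a large, but correctly managed, buffer.  */
     [^\n]+          ECHO;
     \n              ECHO;
     %%
     /* The character-per-character alternative would be a single rule,
        ".|\n   ECHO;": the buffer stays bounded, but the per-action
        overhead is paid on every input character.  */
     int main (void)
     {
       yylex ();
       return 0;
     }

Roughly speaking, with the chunked rules a pathological one-megabyte
line just makes Flex enlarge its buffer once; with the
single-character rule the buffer stays small, but the action (and the
setup of yytext etc.) runs a million times.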

 Paul> I didn't observe much of a time penalty, not that I measured
 Paul> it precisely.

Fetching input character by character is well known to be costly in
Lex scanners.  The following is taken from the Flex documentation
(though I agree that M4 dominates the cost nowadays):

Performance considerations
**************************
[...]
   Another area where the user can increase a scanner's performance
(and one that's easier to implement) arises from the fact that the
longer the tokens matched, the faster the scanner will run.  This is
because with long tokens the processing of most input characters takes
place in the (short) inner scanning loop, and does not often have to go
through the additional work of setting up the scanning environment
(e.g., `yytext') for the action.  Recall the scanner for C comments:

     %x comment
     %%
             int line_num = 1;
     
     "/*"         BEGIN(comment);
     
     
     <comment>[^*\n]*
     <comment>"*"+[^*/\n]*
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This could be sped up by writing it as:

     %x comment
     %%
             int line_num = 1;
     
     "/*"         BEGIN(comment);
     
     <comment>[^*\n]*
     <comment>[^*\n]*\n      ++line_num;
     <comment>"*"+[^*/\n]*
     <comment>"*"+[^*/\n]*\n ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);
   
   Now instead of each newline requiring the processing of another
action, recognizing the newlines is "distributed" over the other rules
to keep the matched text as long as possible.  Note that _adding_ rules
does _not_ slow down the scanner!  The speed of the scanner is
independent of the number of rules or (modulo the considerations given
at the beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.


 ----------------------------------------

One final observation: matching character by character instead of in
bigger chunks makes the scanner much harder to debug (try bison
--trace=skeleton).



