help-flex

Extracting subexpressions, and performance considerations


From: Simon Richter
Subject: Extracting subexpressions, and performance considerations
Date: Tue, 25 Jun 2019 12:36:59 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

I'm trying to build a logfile parser. The humble beginning is:

%option noyywrap
SPACE           \x20
LPAREN          \x28
RPAREN          \x29
COLON           \x3a
LBRACKET        \x5b
RBRACKET        \x5d
PATH            [_A-Za-z0-9:\\.-]+
INTEGER         ([1-9][0-9]*|0)
SEVERITY        (note|warning|error)
MESSAGE         [^\n]*
MSVC_TAG        [A-Z]+[1-9][0-9]*
MSVC_PROJECT    {LBRACKET}{PATH}{RBRACKET}
%%
\n+                     /* ignore */
{PATH}{LPAREN}{INTEGER}{RPAREN}{COLON}{SPACE}*{SEVERITY}{SPACE}*{MSVC_TAG}{COLON}{SPACE}*{MESSAGE}{SPACE}{MSVC_PROJECT}\n    ECHO;
.                       /* ignore */
%%

This runs very slowly, only a few hundred kB/s on an E5 at 2.1 GHz. The
slowdown seems related to the negated character class in MESSAGE:
restricting that character set speeds things up considerably. Is this a
known limitation, or have I stumbled on a bug?

I'd also like to split the line into its fields afterwards, but only if it
matched as a whole. The manual seems to suggest using a separate exclusive
state and yyless(0) to reparse it. Is there a better way to extract
subexpressions?
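One possible alternative to a second pass, sketched here under assumptions and not taken from the flex manual: because the whole-line rule has already verified the layout, the action could split yytext with ordinary C string functions instead of rescanning it. The struct msvc_diag type and the split_msvc_line function are hypothetical names invented for this sketch. It relies on PATH never containing '(', '[' or a space, as the definitions above state, and on yytext being NUL-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; field names are made up for this sketch.
 * Every pointer refers into the scanned text; nothing is copied. */
struct msvc_diag {
    const char *path;     size_t path_len;
    long        line;
    const char *severity; size_t severity_len;
    const char *tag;      size_t tag_len;
    const char *message;  size_t message_len;
    const char *project;  size_t project_len;
};

/* Split a NUL-terminated line (e.g. yytext) that already matched the
 * whole-line rule.  Returns 0 on success, -1 if the layout is unexpected. */
static int split_msvc_line(const char *text, struct msvc_diag *d)
{
    const char *end = text + strlen(text);
    const char *p, *colon, *bracket;
    char *num_end;

    if (end > text && end[-1] == '\n')      /* ignore the trailing newline */
        --end;

    /* PATH: everything before the first '(' (PATH cannot contain '('). */
    p = strchr(text, '(');
    if (p == NULL || p == text || p >= end)
        return -1;
    d->path = text;
    d->path_len = (size_t)(p - text);

    /* INTEGER between the parentheses, followed by "):". */
    d->line = strtol(p + 1, &num_end, 10);
    if (num_end == p + 1 || num_end[0] != ')' || num_end[1] != ':')
        return -1;
    p = num_end + 2;

    /* SEVERITY: optional spaces, then lowercase letters (note|warning|error). */
    while (*p == ' ')
        ++p;
    d->severity = p;
    while (*p >= 'a' && *p <= 'z')
        ++p;
    d->severity_len = (size_t)(p - d->severity);

    /* MSVC_TAG: optional spaces, then everything up to the next ':'. */
    while (*p == ' ')
        ++p;
    colon = strchr(p, ':');
    if (colon == NULL || colon == p || colon >= end)
        return -1;
    d->tag = p;
    d->tag_len = (size_t)(colon - p);

    /* MSVC_PROJECT: the last '[' on the line, preceded by one space and
     * closed by the final ']' (PATH cannot contain '[' or spaces). */
    bracket = strrchr(colon, '[');
    if (bracket == NULL || end - bracket < 3 || bracket[-1] != ' ' || end[-1] != ']')
        return -1;
    d->project = bracket + 1;
    d->project_len = (size_t)(end - 1 - d->project);

    /* MESSAGE: after the tag's ':' and any spaces, up to the space before '['. */
    p = colon + 1;
    while (*p == ' ' && p < bracket - 1)
        ++p;
    d->message = p;
    d->message_len = (size_t)(bracket - 1 - p);
    return 0;
}
```

Inside the rule's action this would be called as split_msvc_line(yytext, &d), and a field could be printed with printf("%.*s\n", (int)d.message_len, d.message). The trade-off against an exclusive state plus yyless(0) is that the field boundaries are now described twice, once in the pattern and once in C, so the two have to be kept in sync by hand.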

   Simon


