Extracting subexpressions, and performance considerations
From: Simon Richter
Subject: Extracting subexpressions, and performance considerations
Date: Tue, 25 Jun 2019 12:36:59 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
I'm trying to build a logfile parser. The humble beginning is:
%option noyywrap
SPACE \x20
LPAREN \x28
RPAREN \x29
COLON \x3a
LBRACKET \x5b
RBRACKET \x5d
PATH [_A-Za-z0-9:\\.-]+
INTEGER ([1-9][0-9]*|0)
SEVERITY (note|warning|error)
MESSAGE [^\n]*
MSVC_TAG [A-Z]+[1-9][0-9]*
MSVC_PROJECT {LBRACKET}{PATH}{RBRACKET}
%%
\n* /* ignore */
{PATH}{LPAREN}{INTEGER}{RPAREN}{COLON}{SPACE}*{SEVERITY}{SPACE}*{MSVC_TAG}{COLON}{SPACE}*{MESSAGE}{SPACE}{MSVC_PROJECT}\n ECHO;
. /* ignore */
%%
This runs very slowly, only a few hundred kB/s on an E5 at 2.1 GHz, which
seems related to the negated character class for MESSAGE: restricting the
character set there speeds things up considerably. Is that a known
limitation, or have I stumbled on a bug here?
I'd also like to split the line into its fields afterwards, but only if it
matched as a whole. The manual seems to suggest using a separate exclusive
start condition and yyless(0) to rescan the line. Is there a better way to
extract subexpressions?
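For reference, here is a minimal sketch of the exclusive-state approach as I understand it from the manual. It is simplified to the first three fields of the rule above, and the start condition names (F_PATH, F_LINE, F_SEV, F_MSG) are placeholders of my own, so treat it as an illustration rather than a tested solution:

```lex
%option noyywrap
%x F_PATH F_LINE F_SEV F_MSG

PATH      [_A-Za-z0-9:\\.-]+
INTEGER   ([1-9][0-9]*|0)
SEVERITY  (note|warning|error)

%%

    /* Whole line matched: push it all back, then rescan it field by field. */
{PATH}\({INTEGER}\):\x20*{SEVERITY}[^\n]*\n   { yyless(0); BEGIN(F_PATH); }
\n+                 /* ignore */
.                   /* ignore */

<F_PATH>{PATH}      { printf("path=%s\n", yytext); BEGIN(F_LINE); }
<F_LINE>\(          /* skip the opening parenthesis */
<F_LINE>{INTEGER}   { printf("line=%s\n", yytext); BEGIN(F_SEV); }
<F_SEV>[):\x20]     /* skip punctuation between the fields */
<F_SEV>{SEVERITY}   { printf("severity=%s\n", yytext); BEGIN(F_MSG); }
<F_MSG>[^\n]+       { printf("rest=%s\n", yytext); }
<F_MSG>\n           { BEGIN(INITIAL); }

%%

int main(void)
{
    return yylex();
}
```

Each field gets its own start condition so that a later part of the line (for example the word "error" inside the message text) cannot be mistaken for an earlier field. The cost is that every matching line is scanned twice.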
Simon