[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Changes to grep/manual/html_node/Performance.html,v
From: |
Jim Meyering |
Subject: |
Changes to grep/manual/html_node/Performance.html,v |
Date: |
Wed, 22 Mar 2023 22:55:26 -0400 (EDT) |
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 23/03/22 22:55:22
Index: html_node/Performance.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Performance.html,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -b -r1.5 -r1.6
--- html_node/Performance.html 3 Sep 2022 19:33:14 -0000 1.5
+++ html_node/Performance.html 23 Mar 2023 02:55:21 -0000 1.6
@@ -1,11 +1,11 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<!DOCTYPE html>
<html>
-<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<!-- Created by GNU Texinfo 7.0dev, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual is for grep, a pattern matching engine.
-Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Copyright © 1999-2002, 2005, 2008-2023 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
"GNU Free Documentation License". -->
-<title>Performance (GNU Grep 3.8)</title>
+<title>Performance (GNU Grep 3.10)</title>
-<meta name="description" content="Performance (GNU Grep 3.8)">
-<meta name="keywords" content="Performance (GNU Grep 3.8)">
+<meta name="description" content="Performance (GNU Grep 3.10)">
+<meta name="keywords" content="Performance (GNU Grep 3.10)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
@@ -31,21 +31,9 @@
<link href="Usage.html" rel="prev" title="Usage">
<style type="text/css">
<!--
-a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
-a.summary-letter {text-decoration: none}
-blockquote.indentedblock {margin-right: 0em}
-div.display {margin-left: 3.2em}
-div.example {margin-left: 3.2em}
-kbd {font-style: oblique}
-pre.display {font-family: inherit}
-pre.format {font-family: inherit}
-pre.menu-comment {font-family: serif}
-pre.menu-preformatted {font-family: serif}
-span.nolinebreak {white-space: nowrap}
-span.roman {font-family: initial; font-weight: normal}
-span.sansserif {font-family: sans-serif; font-weight: normal}
-span:hover a.copiable-anchor {visibility: visible}
-ul.no-bullet {list-style: none}
+a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
+span:hover a.copiable-link {visibility: visible}
+ul.mark-bullet {list-style-type: disc}
-->
</style>
<link rel="stylesheet" type="text/css"
href="https://www.gnu.org/software/gnulib/manual.css">
@@ -54,126 +42,126 @@
</head>
<body lang="en">
-<div class="chapter" id="Performance">
-<div class="header">
+<div class="chapter-level-extent" id="Performance">
+<div class="nav-panel">
<p>
Next: <a href="Reporting-Bugs.html" accesskey="n" rel="next">Reporting
bugs</a>, Previous: <a href="Usage.html" accesskey="p" rel="prev">Usage</a>,
Up: <a href="index.html" accesskey="u" rel="up">grep</a> [<a
href="index.html#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="Index.html" title="Index"
rel="index">Index</a>]</p>
</div>
<hr>
-<span id="Performance-1"></span><h2 class="chapter">5 Performance</h2>
+<h2 class="chapter" id="Performance-1"><span>5 Performance<a
class="copiable-link" href="#Performance-1"> ¶</a></span></h2>
-<span id="index-performance"></span>
-<p>Typically <code>grep</code> is an efficient way to search text. However,
+<a class="index-entry-id" id="index-performance"></a>
+<p>Typically <code class="command">grep</code> is an efficient way to search
text. However,
it can be quite slow in some cases, and it can search large files
where even minor performance tweaking can help significantly.
-Although the algorithm used by <code>grep</code> is an implementation
+Although the algorithm used by <code class="command">grep</code> is an
implementation
detail that can change from release to release, understanding its
basic strengths and weaknesses can help you improve its performance.
</p>
-<p>The <code>grep</code> command operates partly via a set of automata that
+<p>The <code class="command">grep</code> command operates partly via a set of
automata that
are designed for efficiency, and partly via a slower matcher that
takes over when the fast matchers run into unusual features like
back-references. When feasible, the Boyer–Moore fast string
searching algorithm is used to match a single fixed pattern, and the
Aho–Corasick algorithm is used to match multiple fixed patterns.
</p>
-<span id="index-locales"></span>
-<p>Generally speaking <code>grep</code> operates more efficiently in
+<a class="index-entry-id" id="index-locales"></a>
+<p>Generally speaking <code class="command">grep</code> operates more
efficiently in
single-byte locales, since it can avoid the special processing needed
for multi-byte characters. If your patterns will work just as well
-that way, setting <code>LC_ALL</code> to a single-byte locale can help
-performance considerably. Setting ‘<samp>LC_ALL='C'</samp>’ can be
-particularly efficient, as <code>grep</code> is tuned for that locale.
-</p>
-<span id="index-case-insensitive-search-1"></span>
-<p>Outside the ‘<samp>C</samp>’ locale, case-insensitive search,
and search for
-bracket expressions like ‘<samp>[a-z]</samp>’ and
‘<samp>[[=a=]b]</samp>’, can be
+that way, setting <code class="env">LC_ALL</code> to a single-byte locale can
help
+performance considerably. Setting ‘<samp
class="samp">LC_ALL='C'</samp>’ can be
+particularly efficient, as <code class="command">grep</code> is tuned for that
locale.
+</p>
+<a class="index-entry-id" id="index-case-insensitive-search-1"></a>
+<p>Outside the ‘<samp class="samp">C</samp>’ locale,
case-insensitive search, and search for
+bracket expressions like ‘<samp class="samp">[a-z]</samp>’ and
‘<samp class="samp">[[=a=]b]</samp>’, can be
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.
</p>
-<span id="index-interval-expressions-1"></span>
+<a class="index-entry-id" id="index-interval-expressions-1"></a>
<p>Interval expressions may be implemented internally via repetition.
-For example, ‘<samp>^(a|bc){2,4}$</samp>’ might be implemented as
-‘<samp>^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>’. A large repetition
count may
+For example, ‘<samp class="samp">^(a|bc){2,4}$</samp>’ might be
implemented as
+‘<samp class="samp">^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>’. A
large repetition count may
exhaust memory or greatly slow matching. Even small counts can cause
-problems if cascaded; for example, ‘<samp>grep -E
+problems if cascaded; for example, ‘<samp class="samp">grep -E
".*{10,}{10,}{10,}{10,}{10,}"</samp>’ is likely to overflow a
stack. Fortunately, regular expressions like these are typically
artificial, and cascaded repetitions do not conform to POSIX so cannot
be used in portable programs anyway.
</p>
-<span id="index-back_002dreferences"></span>
-<p>A back-reference such as ‘<samp>\1</samp>’ can hurt performance
significantly
+<a class="index-entry-id" id="index-back_002dreferences"></a>
+<p>A back-reference such as ‘<samp class="samp">\1</samp>’ can
hurt performance significantly
in some cases, since back-references cannot in general be implemented
via a finite state automaton, and instead trigger a backtracking
algorithm that can be quite inefficient. For example, although the
-pattern ‘<samp>^(.*)\1{14}(.*)\2{13}$</samp>’ matches only lines
whose
-lengths can be written as a sum <em class='math'>15x + 14y</em> for nonnegative
-integers <em class='math'>x</em> and <em class='math'>y</em>, the pattern
matcher does not perform
+pattern ‘<samp class="samp">^(.*)\1{14}(.*)\2{13}$</samp>’ matches
only lines whose
+lengths can be written as a sum <em class="math">15x + 14y</em> for nonnegative
+integers <em class="math">x</em> and <em class="math">y</em>, the pattern
matcher does not perform
linear Diophantine analysis and instead backtracks through all
possible matching strings, using an algorithm that is exponential in
the worst case.
</p>
-<span id="index-holes-in-files"></span>
+<a class="index-entry-id" id="index-holes-in-files"></a>
<p>On some operating systems that support files with holes—large
regions of zeros that are not physically present on secondary
-storage—<code>grep</code> can skip over the holes efficiently without
+storage—<code class="command">grep</code> can skip over the holes
efficiently without
needing to read the zeros. This optimization is not available if the
-<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a
href="File-and-Directory-Selection.html">File and Directory Selection</a>),
unless the <samp>-z</samp> (<samp>--null-data</samp>)
-option is also used (see <a href="Other-Options.html">Other Options</a>).
+<samp class="option">-a</samp> (<samp
class="option">--binary-files=text</samp>) option is used (see <a class="pxref"
href="File-and-Directory-Selection.html">File and Directory Selection</a>),
unless the <samp class="option">-z</samp> (<samp
class="option">--null-data</samp>)
+option is also used (see <a class="pxref" href="Other-Options.html">Other
Options</a>).
</p>
-<span id="index-pipelines-and-reading"></span>
-<p>For efficiency <code>grep</code> does not always read all its input.
-For example, the shell command ‘<samp>sed '/^...$/d' | grep -q
X</samp>’ can
-cause <code>grep</code> to exit immediately after reading a line
-containing ‘<samp>X</samp>’, without bothering to read the rest of
its input data.
-This in turn can cause <code>sed</code> to exit with a nonzero status because
-<code>sed</code> cannot write to its output pipe after <code>grep</code> exits.
+<a class="index-entry-id" id="index-pipelines-and-reading"></a>
+<p>For efficiency <code class="command">grep</code> does not always read all
its input.
+For example, the shell command ‘<samp class="samp">sed '/^...$/d' | grep
-q X</samp>’ can
+cause <code class="command">grep</code> to exit immediately after reading a
line
+containing ‘<samp class="samp">X</samp>’, without bothering to
read the rest of its input data.
+This in turn can cause <code class="command">sed</code> to exit with a nonzero
status because
+<code class="command">sed</code> cannot write to its output pipe after <code
class="command">grep</code> exits.
</p>
-<p>For more about the algorithms used by <code>grep</code> and about
+<p>For more about the algorithms used by <code class="command">grep</code> and
about
related string matching algorithms, see:
</p>
-<ul>
-<li> Aho AV. Algorithms for finding patterns in strings.
-In: van Leeuwen J. <em>Handbook of Theoretical Computer Science</em>, vol. A.
+<ul class="itemize mark-bullet">
+<li>Aho AV. Algorithms for finding patterns in strings.
+In: van Leeuwen J. <em class="emph">Handbook of Theoretical Computer
Science</em>, vol. A.
New York: Elsevier; 1990. p. 255–300.
This surveys classic string matching algorithms, some of which are
-used by <code>grep</code>.
+used by <code class="command">grep</code>.
-</li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to
bibliographic search.
-<em>CACM</em>. 1975;18(6):333–40.
-<a
href="https://doi.org/10.1145/360825.360855">https://doi.org/10.1145/360825.360855</a>.
+</li><li>Aho AV, Corasick MJ. Efficient string matching: an aid to
bibliographic search.
+<em class="emph">CACM</em>. 1975;18(6):333–40.
+<a class="url"
href="https://doi.org/10.1145/360825.360855">https://doi.org/10.1145/360825.360855</a>.
This introduces the Aho–Corasick algorithm.
-</li><li> Boyer RS, Moore JS. A fast string searching algorithm.
-<em>CACM</em>. 1977;20(10):762–72.
-<a
href="https://doi.org/10.1145/359842.359859">https://doi.org/10.1145/359842.359859</a>.
+</li><li>Boyer RS, Moore JS. A fast string searching algorithm.
+<em class="emph">CACM</em>. 1977;20(10):762–72.
+<a class="url"
href="https://doi.org/10.1145/359842.359859">https://doi.org/10.1145/359842.359859</a>.
This introduces the Boyer–Moore algorithm.
-</li><li> Faro S, Lecroq T. The exact online string matching problem: a review
+</li><li>Faro S, Lecroq T. The exact online string matching problem: a review
of the most recent results.
-<em>ACM Comput Surv</em>. 2013;45(2):13.
-<a
href="https://doi.org/10.1145/2431211.2431212">https://doi.org/10.1145/2431211.2431212</a>.
+<em class="emph">ACM Comput Surv</em>. 2013;45(2):13.
+<a class="url"
href="https://doi.org/10.1145/2431211.2431212">https://doi.org/10.1145/2431211.2431212</a>.
This surveys string matching algorithms that might help improve the
-performance of <code>grep</code> in the future.
+performance of <code class="command">grep</code> in the future.
-</li><li> Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
+</li><li>Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
Exact string matching algorithms: survey issues, and future research
directions.
-<em>IEEE Access</em>. 2019;7:69614–37.
-<a
href="https://doi.org/10.1109/ACCESS.2019.2914071">https://doi.org/10.1109/ACCESS.2019.2914071</a>.
+<em class="emph">IEEE Access</em>. 2019;7:69614–37.
+<a class="url"
href="https://doi.org/10.1109/ACCESS.2019.2914071">https://doi.org/10.1109/ACCESS.2019.2914071</a>.
This survey is more recent than Faro & Lecroq,
and focuses on taxonomy instead of performance.
-</li><li> Hume A, Sunday D. Fast string search.
-<em>Software Pract Exper</em>. 1991;21(11):1221–48.
-<a
href="https://doi.org/10.1002/spe.4380211105">https://doi.org/10.1002/spe.4380211105</a>.
+</li><li>Hume A, Sunday D. Fast string search.
+<em class="emph">Software Pract Exper</em>. 1991;21(11):1221–48.
+<a class="url"
href="https://doi.org/10.1002/spe.4380211105">https://doi.org/10.1002/spe.4380211105</a>.
This excellent albeit now-dated survey aided the initial development
-of <code>grep</code>.
+of <code class="command">grep</code>.
</li></ul>
</div>
<hr>
-<div class="header">
+<div class="nav-panel">
<p>
Next: <a href="Reporting-Bugs.html">Reporting bugs</a>, Previous: <a
href="Usage.html">Usage</a>, Up: <a href="index.html">grep</a> [<a
href="index.html#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="Index.html" title="Index"
rel="index">Index</a>]</p>
</div>
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Changes to grep/manual/html_node/Performance.html,v,
Jim Meyering <=