grep-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Changes to grep/manual/html_node/Performance.html,v


From: Jim Meyering
Subject: Changes to grep/manual/html_node/Performance.html,v
Date: Wed, 22 Mar 2023 22:55:26 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 23/03/22 22:55:22

Index: html_node/Performance.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Performance.html,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -b -r1.5 -r1.6
--- html_node/Performance.html  3 Sep 2022 19:33:14 -0000       1.5
+++ html_node/Performance.html  23 Mar 2023 02:55:21 -0000      1.6
@@ -1,11 +1,11 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/html4/loose.dtd";>
+<!DOCTYPE html>
 <html>
-<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<!-- Created by GNU Texinfo 7.0dev, https://www.gnu.org/software/texinfo/ -->
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <!-- This manual is for grep, a pattern matching engine.
 
-Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Copyright © 1999-2002, 2005, 2008-2023 Free Software Foundation,
 Inc.
 
 Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
 Texts.  A copy of the license is included in the section entitled
 "GNU Free Documentation License". -->
-<title>Performance (GNU Grep 3.8)</title>
+<title>Performance (GNU Grep 3.10)</title>
 
-<meta name="description" content="Performance (GNU Grep 3.8)">
-<meta name="keywords" content="Performance (GNU Grep 3.8)">
+<meta name="description" content="Performance (GNU Grep 3.10)">
+<meta name="keywords" content="Performance (GNU Grep 3.10)">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
@@ -31,21 +31,9 @@
 <link href="Usage.html" rel="prev" title="Usage">
 <style type="text/css">
 <!--
-a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
-a.summary-letter {text-decoration: none}
-blockquote.indentedblock {margin-right: 0em}
-div.display {margin-left: 3.2em}
-div.example {margin-left: 3.2em}
-kbd {font-style: oblique}
-pre.display {font-family: inherit}
-pre.format {font-family: inherit}
-pre.menu-comment {font-family: serif}
-pre.menu-preformatted {font-family: serif}
-span.nolinebreak {white-space: nowrap}
-span.roman {font-family: initial; font-weight: normal}
-span.sansserif {font-family: sans-serif; font-weight: normal}
-span:hover a.copiable-anchor {visibility: visible}
-ul.no-bullet {list-style: none}
+a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
+span:hover a.copiable-link {visibility: visible}
+ul.mark-bullet {list-style-type: disc}
 -->
 </style>
 <link rel="stylesheet" type="text/css" 
href="https://www.gnu.org/software/gnulib/manual.css";>
@@ -54,126 +42,126 @@
 </head>
 
 <body lang="en">
-<div class="chapter" id="Performance">
-<div class="header">
+<div class="chapter-level-extent" id="Performance">
+<div class="nav-panel">
 <p>
 Next: <a href="Reporting-Bugs.html" accesskey="n" rel="next">Reporting 
bugs</a>, Previous: <a href="Usage.html" accesskey="p" rel="prev">Usage</a>, 
Up: <a href="index.html" accesskey="u" rel="up">grep</a> &nbsp; [<a 
href="index.html#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="Index.html" title="Index" 
rel="index">Index</a>]</p>
 </div>
 <hr>
-<span id="Performance-1"></span><h2 class="chapter">5 Performance</h2>
+<h2 class="chapter" id="Performance-1"><span>5 Performance<a 
class="copiable-link" href="#Performance-1"> &para;</a></span></h2>
 
-<span id="index-performance"></span>
-<p>Typically <code>grep</code> is an efficient way to search text.  However,
+<a class="index-entry-id" id="index-performance"></a>
+<p>Typically <code class="command">grep</code> is an efficient way to search 
text.  However,
 it can be quite slow in some cases, and it can search large files
 where even minor performance tweaking can help significantly.
-Although the algorithm used by <code>grep</code> is an implementation
+Although the algorithm used by <code class="command">grep</code> is an 
implementation
 detail that can change from release to release, understanding its
 basic strengths and weaknesses can help you improve its performance.
 </p>
-<p>The <code>grep</code> command operates partly via a set of automata that
+<p>The <code class="command">grep</code> command operates partly via a set of 
automata that
 are designed for efficiency, and partly via a slower matcher that
 takes over when the fast matchers run into unusual features like
 back-references.  When feasible, the Boyer&ndash;Moore fast string
 searching algorithm is used to match a single fixed pattern, and the
 Aho&ndash;Corasick algorithm is used to match multiple fixed patterns.
 </p>
-<span id="index-locales"></span>
-<p>Generally speaking <code>grep</code> operates more efficiently in
+<a class="index-entry-id" id="index-locales"></a>
+<p>Generally speaking <code class="command">grep</code> operates more 
efficiently in
 single-byte locales, since it can avoid the special processing needed
 for multi-byte characters.  If your patterns will work just as well
-that way, setting <code>LC_ALL</code> to a single-byte locale can help
-performance considerably.  Setting &lsquo;<samp>LC_ALL='C'</samp>&rsquo; can be
-particularly efficient, as <code>grep</code> is tuned for that locale.
-</p>
-<span id="index-case-insensitive-search-1"></span>
-<p>Outside the &lsquo;<samp>C</samp>&rsquo; locale, case-insensitive search, 
and search for
-bracket expressions like &lsquo;<samp>[a-z]</samp>&rsquo; and 
&lsquo;<samp>[[=a=]b]</samp>&rsquo;, can be
+that way, setting <code class="env">LC_ALL</code> to a single-byte locale can 
help
+performance considerably.  Setting &lsquo;<samp 
class="samp">LC_ALL='C'</samp>&rsquo; can be
+particularly efficient, as <code class="command">grep</code> is tuned for that 
locale.
+</p>
+<a class="index-entry-id" id="index-case-insensitive-search-1"></a>
+<p>Outside the &lsquo;<samp class="samp">C</samp>&rsquo; locale, 
case-insensitive search, and search for
+bracket expressions like &lsquo;<samp class="samp">[a-z]</samp>&rsquo; and 
&lsquo;<samp class="samp">[[=a=]b]</samp>&rsquo;, can be
 surprisingly inefficient due to difficulties in fast portable access to
 concepts like multi-character collating elements.
 </p>
-<span id="index-interval-expressions-1"></span>
+<a class="index-entry-id" id="index-interval-expressions-1"></a>
 <p>Interval expressions may be implemented internally via repetition.
-For example, &lsquo;<samp>^(a|bc){2,4}$</samp>&rsquo; might be implemented as
-&lsquo;<samp>^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>&rsquo;.  A large repetition 
count may
+For example, &lsquo;<samp class="samp">^(a|bc){2,4}$</samp>&rsquo; might be 
implemented as
+&lsquo;<samp class="samp">^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>&rsquo;.  A 
large repetition count may
 exhaust memory or greatly slow matching.  Even small counts can cause
-problems if cascaded; for example, &lsquo;<samp>grep -E
+problems if cascaded; for example, &lsquo;<samp class="samp">grep -E
 &quot;.*{10,}{10,}{10,}{10,}{10,}&quot;</samp>&rsquo; is likely to overflow a
 stack.  Fortunately, regular expressions like these are typically
 artificial, and cascaded repetitions do not conform to POSIX so cannot
 be used in portable programs anyway.
 </p>
-<span id="index-back_002dreferences"></span>
-<p>A back-reference such as &lsquo;<samp>\1</samp>&rsquo; can hurt performance 
significantly
+<a class="index-entry-id" id="index-back_002dreferences"></a>
+<p>A back-reference such as &lsquo;<samp class="samp">\1</samp>&rsquo; can 
hurt performance significantly
 in some cases, since back-references cannot in general be implemented
 via a finite state automaton, and instead trigger a backtracking
 algorithm that can be quite inefficient.  For example, although the
-pattern &lsquo;<samp>^(.*)\1{14}(.*)\2{13}$</samp>&rsquo; matches only lines 
whose
-lengths can be written as a sum <em class='math'>15x + 14y</em> for nonnegative
-integers <em class='math'>x</em> and <em class='math'>y</em>, the pattern 
matcher does not perform
+pattern &lsquo;<samp class="samp">^(.*)\1{14}(.*)\2{13}$</samp>&rsquo; matches 
only lines whose
+lengths can be written as a sum <em class="math">15x + 14y</em> for nonnegative
+integers <em class="math">x</em> and <em class="math">y</em>, the pattern 
matcher does not perform
 linear Diophantine analysis and instead backtracks through all
 possible matching strings, using an algorithm that is exponential in
 the worst case.
 </p>
-<span id="index-holes-in-files"></span>
+<a class="index-entry-id" id="index-holes-in-files"></a>
 <p>On some operating systems that support files with holes&mdash;large
 regions of zeros that are not physically present on secondary
-storage&mdash;<code>grep</code> can skip over the holes efficiently without
+storage&mdash;<code class="command">grep</code> can skip over the holes 
efficiently without
 needing to read the zeros.  This optimization is not available if the
-<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a 
href="File-and-Directory-Selection.html">File and Directory Selection</a>), 
unless the <samp>-z</samp> (<samp>--null-data</samp>)
-option is also used (see <a href="Other-Options.html">Other Options</a>).
+<samp class="option">-a</samp> (<samp 
class="option">--binary-files=text</samp>) option is used (see <a class="pxref" 
href="File-and-Directory-Selection.html">File and Directory Selection</a>), 
unless the <samp class="option">-z</samp> (<samp 
class="option">--null-data</samp>)
+option is also used (see <a class="pxref" href="Other-Options.html">Other 
Options</a>).
 </p>
-<span id="index-pipelines-and-reading"></span>
-<p>For efficiency <code>grep</code> does not always read all its input.
-For example, the shell command &lsquo;<samp>sed '/^...$/d' | grep -q 
X</samp>&rsquo; can
-cause <code>grep</code> to exit immediately after reading a line
-containing &lsquo;<samp>X</samp>&rsquo;, without bothering to read the rest of 
its input data.
-This in turn can cause <code>sed</code> to exit with a nonzero status because
-<code>sed</code> cannot write to its output pipe after <code>grep</code> exits.
+<a class="index-entry-id" id="index-pipelines-and-reading"></a>
+<p>For efficiency <code class="command">grep</code> does not always read all 
its input.
+For example, the shell command &lsquo;<samp class="samp">sed '/^...$/d' | grep 
-q X</samp>&rsquo; can
+cause <code class="command">grep</code> to exit immediately after reading a 
line
+containing &lsquo;<samp class="samp">X</samp>&rsquo;, without bothering to 
read the rest of its input data.
+This in turn can cause <code class="command">sed</code> to exit with a nonzero 
status because
+<code class="command">sed</code> cannot write to its output pipe after <code 
class="command">grep</code> exits.
 </p>
-<p>For more about the algorithms used by <code>grep</code> and about
+<p>For more about the algorithms used by <code class="command">grep</code> and 
about
 related string matching algorithms, see:
 </p>
-<ul>
-<li> Aho AV. Algorithms for finding patterns in strings.
-In: van Leeuwen J. <em>Handbook of Theoretical Computer Science</em>, vol. A.
+<ul class="itemize mark-bullet">
+<li>Aho AV. Algorithms for finding patterns in strings.
+In: van Leeuwen J. <em class="emph">Handbook of Theoretical Computer 
Science</em>, vol. A.
 New York: Elsevier; 1990. p. 255&ndash;300.
 This surveys classic string matching algorithms, some of which are
-used by <code>grep</code>.
+used by <code class="command">grep</code>.
 
-</li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to 
bibliographic search.
-<em>CACM</em>. 1975;18(6):333&ndash;40.
-<a 
href="https://doi.org/10.1145/360825.360855";>https://doi.org/10.1145/360825.360855</a>.
+</li><li>Aho AV, Corasick MJ. Efficient string matching: an aid to 
bibliographic search.
+<em class="emph">CACM</em>. 1975;18(6):333&ndash;40.
+<a class="url" 
href="https://doi.org/10.1145/360825.360855";>https://doi.org/10.1145/360825.360855</a>.
 This introduces the Aho&ndash;Corasick algorithm.
 
-</li><li> Boyer RS, Moore JS. A fast string searching algorithm.
-<em>CACM</em>. 1977;20(10):762&ndash;72.
-<a 
href="https://doi.org/10.1145/359842.359859";>https://doi.org/10.1145/359842.359859</a>.
+</li><li>Boyer RS, Moore JS. A fast string searching algorithm.
+<em class="emph">CACM</em>. 1977;20(10):762&ndash;72.
+<a class="url" 
href="https://doi.org/10.1145/359842.359859";>https://doi.org/10.1145/359842.359859</a>.
 This introduces the Boyer&ndash;Moore algorithm.
 
-</li><li> Faro S, Lecroq T. The exact online string matching problem: a review
+</li><li>Faro S, Lecroq T. The exact online string matching problem: a review
 of the most recent results.
-<em>ACM Comput Surv</em>. 2013;45(2):13.
-<a 
href="https://doi.org/10.1145/2431211.2431212";>https://doi.org/10.1145/2431211.2431212</a>.
+<em class="emph">ACM Comput Surv</em>. 2013;45(2):13.
+<a class="url" 
href="https://doi.org/10.1145/2431211.2431212";>https://doi.org/10.1145/2431211.2431212</a>.
 This surveys string matching algorithms that might help improve the
-performance of <code>grep</code> in the future.
+performance of <code class="command">grep</code> in the future.
 
-</li><li> Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
+</li><li>Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
 Exact string matching algorithms: survey issues, and future research 
directions.
-<em>IEEE Access</em>. 2019;7:69614&ndash;37.
-<a 
href="https://doi.org/10.1109/ACCESS.2019.2914071";>https://doi.org/10.1109/ACCESS.2019.2914071</a>.
+<em class="emph">IEEE Access</em>. 2019;7:69614&ndash;37.
+<a class="url" 
href="https://doi.org/10.1109/ACCESS.2019.2914071";>https://doi.org/10.1109/ACCESS.2019.2914071</a>.
 This survey is more recent than Faro &amp; Lecroq,
 and focuses on taxonomy instead of performance.
 
-</li><li> Hume A, Sunday D. Fast string search.
-<em>Software Pract Exper</em>. 1991;21(11):1221&ndash;48.
-<a 
href="https://doi.org/10.1002/spe.4380211105";>https://doi.org/10.1002/spe.4380211105</a>.
+</li><li>Hume A, Sunday D. Fast string search.
+<em class="emph">Software Pract Exper</em>. 1991;21(11):1221&ndash;48.
+<a class="url" 
href="https://doi.org/10.1002/spe.4380211105";>https://doi.org/10.1002/spe.4380211105</a>.
 This excellent albeit now-dated survey aided the initial development
-of <code>grep</code>.
+of <code class="command">grep</code>.
 </li></ul>
 
 </div>
 <hr>
-<div class="header">
+<div class="nav-panel">
 <p>
 Next: <a href="Reporting-Bugs.html">Reporting bugs</a>, Previous: <a 
href="Usage.html">Usage</a>, Up: <a href="index.html">grep</a> &nbsp; [<a 
href="index.html#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="Index.html" title="Index" 
rel="index">Index</a>]</p>
 </div>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]