emacs-elpa-diffs

[elpa] externals/doc-toc b45b78102c 55/84: Update README, add extract-on


From: ELPA Syncer
Subject: [elpa] externals/doc-toc b45b78102c 55/84: Update README, add extract-only documentation
Date: Mon, 26 Sep 2022 13:58:38 -0400 (EDT)

branch: externals/doc-toc
commit b45b78102c285b0b0f2d38b74a16ada2b9c9bb23
Author: Daniel Nicolai <dalanicolai@gmail.com>
Commit: Daniel Nicolai <dalanicolai@gmail.com>

    Update README, add extract-only documentation
---
 README.org  | 9 ++++++++-
 toc-mode.el | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/README.org b/README.org
index cb31fac9ea..6ee13df2b4 100644
--- a/README.org
+++ b/README.org
@@ -52,7 +52,14 @@ data). Also the languages used for tesseract OCR can be customized via the
 
 A buffer with the, somewhat cleaned up, extracted text will open in TOC-cleanup
 mode. Prefix command with the universal argument (=C-u=) to omit clean and get the
-raw text.
+raw text. If the extracted text is of too low quality you either can hack/extend
+the [[help:toc-extract-pages-ocr][toc-extract-pages-ocr]] definition, or alternatively you can try to extract
+the text with the [[https://pypi.org/project/document-contents-extractor/][python document-contents-extractor script]], which is more
+configurable (you are also welcome to hack on and improve that script).
+For this the [[https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html][tesseract]] documentation might be useful.
+
+If you merely want to extract text without further processing then you can
+use the command [[help:toc-extract-only][toc-extract-only]].
 
 ** 2. TOC-Cleanup
 In this mode you can further cleanup the contents to create a list where
diff --git a/toc-mode.el b/toc-mode.el
index 4d6f2f19c1..d2075a210a 100644
--- a/toc-mode.el
+++ b/toc-mode.el
@@ -63,6 +63,9 @@
 ;; `https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html' might be
 ;; useful.
 
+;; If you merely want to extract text without further processing then you can
+;; use the command `toc-extract-only'.
+
 ;; 2. TOC-Cleanup In this mode you can further cleanup the contents to create a
 ;; list where each line has the structure:
 