From: ELPA Syncer
Subject: [elpa] externals/doc-toc b45b78102c 55/84: Update README, add extract-only documentation
Date: Mon, 26 Sep 2022 13:58:38 -0400 (EDT)
branch: externals/doc-toc
commit b45b78102c285b0b0f2d38b74a16ada2b9c9bb23
Author: Daniel Nicolai <dalanicolai@gmail.com>
Commit: Daniel Nicolai <dalanicolai@gmail.com>
Update README, add extract-only documentation
---
README.org | 9 ++++++++-
toc-mode.el | 3 +++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/README.org b/README.org
index cb31fac9ea..6ee13df2b4 100644
--- a/README.org
+++ b/README.org
@@ -52,7 +52,14 @@ data). Also the languages used for tesseract OCR can be customized via the
A buffer with the, somewhat cleaned up, extracted text will open in TOC-cleanup
mode. Prefix command with the universal argument (=C-u=) to omit clean and get the
-raw text.
+raw text. If the extracted text is of too low quality you either can hack/extend
+the [[help:toc-extract-pages-ocr][toc-extract-pages-ocr]] definition, or alternatively you can try to extract
+the text with the [[https://pypi.org/project/document-contents-extractor/][python document-contents-extractor script]], which is more
+configurable (you are also welcome to hack on and improve that script).
+For this the [[https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html][tesseract]] documentation might be useful.
+
+If you merely want to extract text without further processing then you can
+use the command [[help:toc-extract-only][toc-extract-only]].
** 2. TOC-Cleanup
In this mode you can further cleanup the contents to create a list where
diff --git a/toc-mode.el b/toc-mode.el
index 4d6f2f19c1..d2075a210a 100644
--- a/toc-mode.el
+++ b/toc-mode.el
@@ -63,6 +63,9 @@
;; `https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html' might be
;; useful.
+;; If you merely want to extract text without further processing then you can
+;; use the command `toc-extract-only'.
+
;; 2. TOC-Cleanup In this mode you can further cleanup the contents to create a
;; list where each line has the structure: