[elpa] externals/doc-toc 7e50cc20d0 53/84: Implement extract-only
From: ELPA Syncer
Subject: [elpa] externals/doc-toc 7e50cc20d0 53/84: Implement extract-only
Date: Mon, 26 Sep 2022 13:58:38 -0400 (EDT)
branch: externals/doc-toc
commit 7e50cc20d0a395ef29991c7a6023ade5336c00af
Author: Daniel Nicolai <dalanicolai@gmail.com>
Commit: Daniel Nicolai <dalanicolai@gmail.com>
Implement extract-only
---
toc-mode.el | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/toc-mode.el b/toc-mode.el
index d2dbfa279a..ccc61bd6fb 100644
--- a/toc-mode.el
+++ b/toc-mode.el
@@ -393,6 +393,40 @@ unprocessed text."
(setq-local doc-buffer source-buffer)
(insert text)))
+;;;###autoload
+(defun toc-extract-only ()
+  "Just extract text via OCR without further processing.
+Prompt for startpage and endpage and print OCR output to new buffer."
+  (interactive)
+  (let ((mode (derived-mode-p 'pdf-view-mode 'djvu-read-mode)))
+    (if mode
+        (let* ((page (string-to-number
+                      (read-string "Enter start-pagenumber for extraction: ")))
+               (endpage (string-to-number
+                         (read-string "Enter end-pagenumber for extraction: ")))
+               (ext (url-file-extension (buffer-file-name (current-buffer))))
+               (buffer (file-name-sans-extension (buffer-name)))
+               (args (list "stdout" "--psm" "6")))
+          (when toc-ocr-languages
+            (setq args (append args (list "-l" toc-ocr-languages))))
+          (while (<= page (+ endpage))
+            (let ((file (cond ((string= ".pdf" ext)
+                               (make-temp-file "pageimage"
+                                               nil
+                                               (number-to-string page)
+                                               (pdf-cache-get-image page 600)))
+                              ((string= ".djvu" ext)
+                               (djvu-goto-page page)
+                               (make-temp-file "pageimage"
+                                               nil
+                                               (number-to-string page)
+                                               (image-property djvu-doc-image
+                                                               :data))))))
+              (apply 'call-process
+                     (append (list "tesseract" nil (list buffer nil) nil file)
+                             args))
+              (setq page (1+ page))))
+          (switch-to-buffer buffer)))))
+
(defun toc--create-tablist-buffer ()
"Create tablist buffer, from cleaned up Table of Contents buffer, for easy
page number adjustment."
(interactive)
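
The way this commit builds the tesseract command line (image file first, then "stdout" as the output base, then "--psm" and an optional "-l" language string) can be sketched on its own. The helper below is hypothetical and not part of toc-mode.el. Its name and its optional PSM parameter are assumptions made for illustration only.

```elisp
;; Hypothetical helper for illustration; it is not part of toc-mode.el.
(defun my-toc-tesseract-args (image-file &optional languages psm)
  "Return the tesseract command-line arguments for IMAGE-FILE.
LANGUAGES is a language string such as \"eng\", or nil to omit \"-l\".
PSM is the page segmentation mode and defaults to 6, as in the commit."
  (append (list image-file "stdout" "--psm" (number-to-string (or psm 6)))
          ;; Add the language option only when one is configured.
          (when languages (list "-l" languages))))

;; Possible use, assuming tesseract is on `exec-path' and FILE names a
;; page image; OCR text is inserted into the buffer "ocr-output":
;; (apply #'call-process "tesseract" nil (list "ocr-output" nil) nil
;;        (my-toc-tesseract-args file "eng"))
```

Keeping the argument list in a pure function makes it possible to check the command line without running OCR.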