[elpa] externals/doc-toc 3c7efe78d2 61/84: Implement pdf.tocgen function
From: ELPA Syncer
Subject: [elpa] externals/doc-toc 3c7efe78d2 61/84: Implement pdf.tocgen functionality
Date: Mon, 26 Sep 2022 13:58:38 -0400 (EDT)
branch: externals/doc-toc
commit 3c7efe78d220118878662b69033b24da63f1895c
Author: Daniel Nicolai <dalanicolai@gmail.com>
Commit: Daniel Nicolai <dalanicolai@gmail.com>
Implement pdf.tocgen functionality
---
README.org | 23 ++++++++++++++------
toc-mode.el | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 76 insertions(+), 17 deletions(-)
diff --git a/README.org b/README.org
index 39962063e0..280544cbff 100644
--- a/README.org
+++ b/README.org
@@ -23,18 +23,29 @@ For Spacemacs use the [[https://github.com/dalanicolai/toc-layer][toc-layer for
For regular Emacs users, well... you probably know how to install packages.
** Requirements
-Currently the package requires the ~pdftotext~ (part of poppler-utils), ~pdfoutline~
-(part of [[https://launchpad.net/ubuntu/bionic/+package/fntsample][fntsample]]) and ~djvused~ (part of [[http://djvu.sourceforge.net/][http://djvu.sourceforge.net/]]) command
-line utilities to be available. Extraction with OCR requires the ~tesseract~ command
+To use the pdf.tocgen functionality that software has to be installed (see
+[[https://krasjet.com/voice/pdf.tocgen/]]). For the other remaining functionality
+the package requires ~pdftotext~ (part of poppler-utils), ~pdfoutline~ (part of
+[[https://launchpad.net/ubuntu/bionic/+package/fntsample][fntsample]]) and ~djvused~ (part of [[http://djvu.sourceforge.net/][http://djvu.sourceforge.net/]]) command line
+utilities to be available. Extraction with OCR requires the ~tesseract~ command
line utility to be available.
* Usage
-For 'software-generated' (i.e. PDF's not created from scans) PDF-files I
-recommend to use [[https://krasjet.com/voice/pdf.tocgen/][pdf.tocgen]] instead of =toc-mode= (although the pdf.tocgen
-functionality might get integrated into =toc-mode= soon).
+For 'software-generated' (i.e. PDF's not created from scans) PDF-files it is
+recommend to use =toc-extract-with-pdf-tocgen=. To use this function you first
+have to provide the font properties for the different headline levels. For that
+select the word in a headline of a certain level and then type =M-x
+toc-gen-set-level=. This function will ask which level you are setting, the
+highest level should be level 1. After you have set the various levels (1,2,
+etc.) then it is time to run =M-x toc-extract-with-pdf-tocgen=. If a TOC is
+extracted succesfully, then in the pdftocgen-mode buffer simply press C-c C-c to
+add the contents to the PDF. The contents will be added to a copy of the
+original PDF with the filename output.pdf and this copy will be opened in a new
+buffer. If the pdf-tocgen option does not work well then continue with the steps
+below.
In each step below, check out available shortcuts using =C-h m=. Additionally you
can find available functions by typing the M-x mode-name (e.g. =M-x toc-cleanup=),
diff --git a/toc-mode.el b/toc-mode.el
index 6aee05f762..c99c38287a 100644
--- a/toc-mode.el
+++ b/toc-mode.el
@@ -27,13 +27,17 @@
;; toc-mode.el is a package to create and add a Table of Contents to pdf and
;; djvu documents. It implements features to extract a Table of Contents from
;; the textlayer of a document or via OCR if that last option is necessary or
-;; prefered. Subsequently this package implements various features to assist in
-;; tidy up the extracted Table of Contents, adjust the pagenumbers and finally
-;; parsing the Table of Contents into syntax that is understood by the
-;; `pdfoutline' and `djvused' commands that are used to add the table of
-;; contents to pdf- and djvu-files respectively.
-
-;; Requirements: Currently the package requires the `pdftotext' (part of
+;; prefered. For 'software generated' PDFs it provides the option to use
+;; pdf.tocgen (see URL `https://krasjet.com/voice/pdf.tocgen/'). Subsequently
+;; this package implements various features to assist in tidy up the extracted
+;; Table of Contents, adjust the pagenumbers and finally parsing the Table of
+;; Contents into syntax that is understood by the `pdfoutline' and `djvused'
+;; commands that are used to add the table of contents to pdf- and djvu-files
+;; respectively.
+
+;; Requirements: To use the pdf.tocgen functionality that software has to be
+;; installed (see URL `https://krasjet.com/voice/pdf.tocgen/'). For the other
+;; remaining functionality the package requires the `pdftotext' (part of
;; poppler-utils), `pdfoutline' (part of fntsample) and `djvused' (part of
;; http://djvu.sourceforge.net/) command line utilities to be available.
;; Extraction with OCR requires the tesseract command line utility to be
@@ -41,10 +45,18 @@
;; Usage:
-;; For 'software-generated' (i.e. PDF's not created from scans) PDF-files I
-;; recommend to use pdf.tocgen (see URL `https://krasjet.com/voice/pdf.tocgen/')
-;; instead of `toc-mode' (although the pdf.tocgen functionality might get
-;; integrated into =toc-mode= soon).
+;; For 'software-generated' (i.e. PDF's not created from scans) PDF-files it is
+;; recommend to use `toc-extract-with-pdf-tocgen'. To use this function you
+;; first have to provide the font properties for the different headline levels.
+;; For that select the word in a headline of a certain level and then type M-x
+;; `toc-gen-set-level'. This function will ask which level you are setting, the
+;; highest level should be level 1. After you have set the various levels (1,2,
+;; etc.) then it is time to run M-x `toc-extract-with-pdf-tocgen'. If a TOC is
+;; extracted succesfully, then in the pdftocgen-mode buffer simply press C-c C-c
+;; to add the contents to the PDF. The contents will be added to a copy of the
+;; original PDF with the filename output.pdf and this copy will be opened in a
+;; new buffer. If the pdf-tocgen option does not work well then continue with
+;; the steps below.
;; In each step below, check out available shortcuts using C-h m. Additionally
;; you can find available functions by typing the M-x mode-name (e.g. M-x
@@ -238,6 +250,42 @@ String (i.e. surround with double quotes)."
:type 'file
:group 'toc)
+;;;; pdf.tocgen
+(defun toc-gen-set-level (level)
+ (interactive "nWhich level you are setting (number): ")
+ (shell-command (format "pdfxmeta --auto %s --page %s '%s' \"%s\" >> recipe.toml"
+ level
+ (pdf-view-current-page)
+ (url-filename (url-generic-parse-url buffer-file-name))
+ (car (pdf-view-active-region-text)))))
+
+(defun toc-extract-with-pdf-tocgen ()
+ (interactive)
+ (let ((filename buffer-file-name)
+ (toc (shell-command-to-string
+ (format "pdftocgen '%s' < recipe.toml" buffer-file-name))))
+ (switch-to-buffer "toc")
+ (toc-pdftocgen-mode) ;; required before setting local variable
+ (when (fboundp 'flyspell-mode)
+ flyspell-mode)
+ (setq-local pdf-filename filename)
+ (insert toc)))
+
+(defun toc--pdftocgen-add-to-pdf ()
+ (interactive)
+ (write-file default-directory)
+ (shell-command (format "pdftocio -o output.pdf '%s' < toc" pdf-filename))
+ (find-file "output.pdf"))
+
+(defvar toc-pdftocgen-mode-map
+ (let ((map (make-sparse-keymap)))
+ (define-key map "\C-c\C-c" #'toc--pdftocgen-add-to-pdf)
+ map))
+
+(define-derived-mode toc-pdftocgen-mode
+ fundamental-mode "TOC-cleanup"
+ "Major mode for cleaning up Table Of Contents
+\\{toc-pdftocgen-mode-map}")
;;;; toc-extract and cleanup
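
The patch above builds its shell commands with hand-written quotes ('%s' and \"%s\"), which would likely break on file names or selected text containing quote characters. It also appears to evaluate flyspell-mode as a variable rather than call it as a function. The following is a minimal sketch of one possible alternative, not part of the commit: every function name prefixed with my-toc- is hypothetical, and only the pdfxmeta, pdftocgen and pdftocio invocations are taken from the patch itself.

```elisp
;; Hypothetical helpers (not part of the commit).  They build the same
;; shell commands as the patch, but quote each argument with
;; `shell-quote-argument' instead of hand-written quote characters.

(defun my-toc-pdfxmeta-command (level page file text recipe)
  "Return a shell command that appends a heading filter to RECIPE.
LEVEL and PAGE are integers; FILE, TEXT and RECIPE are strings."
  (format "pdfxmeta --auto %d --page %d %s %s >> %s"
          level page
          (shell-quote-argument file)
          (shell-quote-argument text)
          (shell-quote-argument recipe)))

(defun my-toc-pdftocgen-command (file recipe)
  "Return a shell command that runs pdftocgen on FILE using RECIPE."
  (format "pdftocgen %s < %s"
          (shell-quote-argument file)
          (shell-quote-argument recipe)))

(defun my-toc-pdftocio-command (file toc output)
  "Return a shell command that writes FILE with the outline TOC to OUTPUT."
  (format "pdftocio -o %s %s < %s"
          (shell-quote-argument output)
          (shell-quote-argument file)
          (shell-quote-argument toc)))

(defun my-toc-maybe-enable-flyspell ()
  "Turn on `flyspell-mode' in the current buffer when it is available."
  (when (fboundp 'flyspell-mode)
    ;; Call the mode function; a bare `flyspell-mode' symbol would only
    ;; read the variable of the same name.
    (flyspell-mode 1)))
```

Under these assumptions, the body of toc-gen-set-level could pass (my-toc-pdfxmeta-command level (pdf-view-current-page) buffer-file-name (car (pdf-view-active-region-text)) "recipe.toml") to shell-command, and the other two commands could be wrapped the same way.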