I am starting to write a script that generates a book index from two inputs:
1. The complete book in PDF or DOC format.
2. A list of words/phrases to look for in the book, as an ASCII text file or an Excel sheet.
Using the word list from input 2, the program needs to search the whole document from input 1 and write a text file that lists each word or phrase followed by the page numbers on which it appears, separated by commas. The page numbers could perhaps be read from the page footers, or obtained by counting the pages in a loop.
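To make the idea concrete, here is a rough Python sketch of what I have in mind. It is not a finished solution. The function names (read_pdf_pages, build_index, format_index) and the first_page parameter are my own placeholders. It assumes the PDF text can be extracted page by page with the third-party pypdf package, and that counting pages in a loop is good enough; it does not try to read the printed page number from the footer, and DOC and Excel input are not handled.

```python
import re
from collections import defaultdict


def read_pdf_pages(path):
    """Return one text string per PDF page (assumes the pypdf package is installed)."""
    from pypdf import PdfReader  # assumption: third-party library, pip install pypdf
    return [page.extract_text() or "" for page in PdfReader(path).pages]


def build_index(pages, terms, first_page=1):
    """Map each term to the page numbers on which it appears.

    Pages are numbered by counting from first_page, not by reading the footer.
    Terms that never occur are left out of the result.
    """
    index = defaultdict(list)
    for number, text in enumerate(pages, start=first_page):
        # Collapse whitespace so a phrase broken across lines still matches.
        normalized = " ".join(text.split())
        for term in terms:
            # Whole-word, case-insensitive match ("cat" must not hit "category").
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, normalized, flags=re.IGNORECASE):
                index[term].append(number)
    return dict(index)


def format_index(index):
    """Return one line per term: the term, then its comma-separated page numbers."""
    lines = []
    for term in sorted(index, key=str.lower):
        page_list = ", ".join(str(n) for n in index[term])
        lines.append(f"{term}: {page_list}")
    return "\n".join(lines)
```

With a word list stored one term per line, the output text could then be produced with something like format_index(build_index(read_pdf_pages("book.pdf"), terms)), where "book.pdf" is only an example file name. If the book's printed numbering starts after the front matter, first_page can be set to a negative or shifted value to compensate, but that is only a crude workaround.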
I would be grateful if someone could share such a script, if one is available. Related scripts or ideas for implementing this would also be very helpful.
Thanks and regards