Encoding help
From: B. T. Raven
Subject: Encoding help
Date: Mon, 01 Jun 2009 11:51:13 -0500
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)
I have a file created by saving a PDF as text, and I want to convert the
whole thing to UTF-8 encoding. If I force the coding system to utf-8 when
saving in Emacs 23.0, I get the following in a *Warning* buffer:
These default coding systems were tried to encode text
in the buffer `span.txt':
(utf-8-dos (122 . 4194285) (165 . 4194257) (204 . 4194285) (253
. 4194257) (292 . 4194285) (372 . 4194289) (410 . 4194285) (418
. 4194285) (653 . 4194217) (689 . 4194285) (731 . 4194285))
(iso-latin-1-dos (122 . 4194285) (165 . 4194257) (204 . 4194285)
(253 . 4194257) (292 . 4194285) (372 . 4194289) (410 . 4194285) (418
. 4194285) (653 . 4194217) (689 . 4194285) (731 . 4194285))
However, each of them encountered characters it couldn't encode:
[Below are many dozens of \xxx octal escape sequences]
utf-8-dos cannot encode these: ...
iso-latin-1-dos cannot encode these: ...
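For reference, the pairs in the warning appear to be (buffer position .
character code). Codes between 0x3FFF80 and 0x3FFFFF are how Emacs 23
represents raw 8-bit bytes it could not decode, so subtracting 0x3FFF00
should recover the original byte. Below is a minimal Python sketch of that
arithmetic. The helper names are made up for illustration, and reading
each byte as Latin-1 is only a guess about how the PDF text was exported.

```python
# Emacs 23 stores an undecodable byte 0x80..0xFF as the character
# code 0x3FFF00 + byte, i.e. somewhere in 0x3FFF80..0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00

# The distinct character codes that appear in the warning above.
CODES_FROM_WARNING = [4194285, 4194257, 4194289, 4194217]


def raw_byte(code):
    """Return the original byte behind an Emacs raw-byte character code."""
    if not 0x3FFF80 <= code <= 0x3FFFFF:
        raise ValueError("not a raw-byte character code: %d" % code)
    return code - RAW_BYTE_BASE


def latin1_guess(code):
    """ASSUMPTION: interpret the byte as ISO-8859-1 (Latin-1)."""
    return bytes([raw_byte(code)]).decode("latin-1")


for code in CODES_FROM_WARNING:
    print("%d -> byte 0x%02X -> %r" % (code, raw_byte(code), latin1_guess(code)))
```

Under that assumption, 4194285 is byte 0xED (í) and 4194289 is byte 0xF1
(ñ), which would fit Spanish text. Latin-1 has no vowels with macrons, so
any bytes standing for those would need a different mapping.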
The original pdf shows many standard diacritics for Romance languages
along with a few vowels with macrons. There is no option in Adobe Reader
for saving as encoded text. If my only option is to search and replace
these escape sequences with Unicode characters, how can I get a list of
all these bad characters? (They all show in red in Emacs 23 anyway.) Have
any of you written routines to replace things like these using a list of
dotted pairs or something similar?
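To make the question concrete, here is a rough sketch of the kind of
routine I have in mind, written in Python rather than Emacs Lisp. It
assumes the file is mostly valid UTF-8 with stray single bytes mixed in.
The function names and the replacement table are hypothetical, and the
table entries are Latin-1 guesses, not something taken from the file.

```python
from collections import Counter


def _decode(data):
    # surrogateescape turns each undecodable byte 0xNN into U+DCNN,
    # so the bad bytes survive decoding and can be located afterwards.
    return data.decode("utf-8", errors="surrogateescape")


def find_bad_bytes(data):
    """Count every byte that is not part of valid UTF-8."""
    return Counter(
        ord(ch) - 0xDC00 for ch in _decode(data) if 0xDC80 <= ord(ch) <= 0xDCFF
    )


def repair(data, table):
    """Replace bad bytes using `table` (byte -> replacement string).

    Bytes missing from the table become U+FFFD so they stay visible.
    """
    out = []
    for ch in _decode(data):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(table.get(cp - 0xDC00, "\ufffd"))
        else:
            out.append(ch)
    return "".join(out)


# HYPOTHETICAL table, playing the role of a list of dotted pairs.
TABLE = {0xED: "\u00ed", 0xD1: "\u00d1", 0xF1: "\u00f1", 0xA9: "\u00a9"}

sample = b"a\xf1o, aqu\xed"
print(sorted(hex(b) for b in find_bad_bytes(sample)))
print(repair(sample, TABLE))
```

For the real file, the bytes would come from open("span.txt", "rb").read()
and the repaired string would be written back with encoding="utf-8". The
counts from find_bad_bytes give the list of bad characters to put in the
table.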
Thanks,
Ed