Re: fixing url-unhex-string for unicode/multi-byte charsets
From: Eli Zaretskii
Subject: Re: fixing url-unhex-string for unicode/multi-byte charsets
Date: Fri, 06 Nov 2020 15:34:01 +0200
> Date: Fri, 6 Nov 2020 07:28:46 -0500
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: emacs-devel@gnu.org
>
> > A stand-alone test case, which doesn't require an actual trash, would
> > be appreciated, so I could see which part doesn't work, and how to
> > fix it.
>
> That would be the two file names that I previously posted. You say that
> they succeeded for you, but they didn't for me. The result I got was
> good for the first case (English two words), and garbage for the second
> case (Hebrew two words).
I tried that before posting the suggestion. FTR, the below works for
me on the current emacs-27 branch and on master, both on MS-Windows
(where I used a literal 'utf-8 instead of file-name-coding-system)
and on GNU/Linux:
(dolist (str '("hello%20world"
               "%d7%a9%d7%9c%d7%95%d7%9d%20%d7%a2%d7%95%d7%9c%d7%9d"))
  (insert (decode-coding-string (url-unhex-string str)
                                (or file-name-coding-system
                                    default-file-name-coding-system))
          "\n"))
The result of evaluating this is two lines inserted into the current
buffer:
hello world
שלום עולם
If this doesn't work for you, or if you tried something slightly
different, I'd like to hear the details, perhaps there's some
subtlety I'm missing.
> > Alternatively, maybe you could explain why you needed to insert the
> > text into a temporary buffer and then extract it from there? AFAIK,
> > we have the same primitives that work on decoding strings as we have
> > for decoding buffer text.
>
> I don't need to. That's the implementation used in emacs-w3m. I also
> pointed out that eww does it differently. I think the need in emacs-w3m
> is to mix ASCII characters with selected binary output, which can't be
> done with, say, replace-regexp-in-string. So what they do is use a
> temporary buffer, set `buffer-multibyte' to nil, and build the result
> in the temporary buffer instead of using replace-regexp-in-string.
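For reference, the temp-buffer technique described above can be sketched roughly as follows. This is not the actual emacs-w3m code, just an illustration of the idea: percent-escapes are expanded into raw bytes in a unibyte buffer, and the accumulated bytes are decoded once at the end (`my-unhex-via-buffer' is a made-up name):

```elisp
(defun my-unhex-via-buffer (str coding)
  "Expand %XX escapes in STR into bytes, then decode them with CODING.
A sketch of the temp-buffer approach; not the actual emacs-w3m code."
  (with-temp-buffer
    ;; Work on raw bytes, not characters.
    (set-buffer-multibyte nil)
    (let ((i 0)
          (len (length str)))
      (while (< i len)
        (let ((ch (aref str i)))
          (if (and (eq ch ?%) (< (+ i 2) len))
              (progn
                ;; Insert the byte named by the two hex digits.
                (insert (string-to-number (substring str (1+ i) (+ i 3)) 16))
                (setq i (+ i 3)))
            (insert ch)
            (setq i (1+ i))))))
    ;; Decode the accumulated bytes in one go.
    (decode-coding-string (buffer-string) coding)))

;; (my-unhex-via-buffer "%d7%a9%d7%9c%d7%95%d7%9d" 'utf-8) ⇒ "שלום"
```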
As a rule of thumb, any Lisp code that needs to do something with a
string and does that by inserting it into a temporary buffer and
working on that instead, should raise the "missing primitive" alarm.
In this case, I see no missing primitives for decoding a string, so
using a temp buffer looks like an unnecessary complication to me.