Re: Another issue with thingatpt
From: Piet van Oostrum
Subject: Re: Another issue with thingatpt
Date: Fri, 29 Dec 2006 22:23:55 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.92 (darwin)

>>>>> Bob Rogers <address@hidden> (BR) wrote:

>BR> From: Werner LEMBERG <address@hidden>
>BR> Date: Wed, 27 Dec 2006 11:50:42 +0100 (CET)

>BR> Here's another problematic URL:

>BR> http://mousai.kanji.zinbun.kyoto-u.ac.jp/ids-find?components=&U+20207;

>BR> thingatpt ignores the final `;'.

>BR> Werner

>BR> According to RFC3986 (aka STD066), this is wrong; ";" is legitimate
>BR> anywhere in a path or query part, including the end. So are "." and
>BR> ",", but thing-at-point-url-path-regexp also refuses to match these
>BR> characters at the end of the string. Doing (ffap-string-at-point 'url)
>BR> drops these characters plus ":", "!", and (questionably) "?".

>BR> It may not be possible to find a tradeoff between RFC compliance and
>BR> parsing dwimmery that would satisfy everybody. Since stripping off
>BR> trailing punctuation is useful behavior (ISTR it's worked this way for a
>BR> while now), I would recommend against changing it now. However, a case
>BR> could be made for making thing-at-point and ffap-string-at-point
>BR> consistent. Perhaps "!:;.," would be best? This is just the union of
>BR> the two sets but without the dubious inclusion of "?".

I think the way to reconcile these would be to make it customizable: for
example, a string variable that contains the punctuation characters to be
included at the end, or a regexp.
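
To make the suggestion concrete, here is a minimal sketch in Python rather than Emacs Lisp. It assumes one reading of the idea, namely that the variable lists the trailing characters to be treated as not part of the URL. The names TRAILING_PUNCTUATION and trim_trailing_punctuation are hypothetical and exist in neither thingatpt nor ffap.

```python
# Hypothetical names; this illustrates the idea only and is not
# the code used by thingatpt or ffap.
TRAILING_PUNCTUATION = "!:;.,"  # the set proposed in the quoted message, without "?"


def trim_trailing_punctuation(candidate, punctuation=TRAILING_PUNCTUATION):
    """Return candidate without any trailing run of characters from punctuation."""
    # str.rstrip treats its argument as a set of characters, and an
    # empty string strips nothing, so "" disables the trimming.
    return candidate.rstrip(punctuation)


url = "http://mousai.kanji.zinbun.kyoto-u.ac.jp/ids-find?components=&U+20207;"
print(trim_trailing_punctuation(url))      # default set drops the final ";"
print(trim_trailing_punctuation(url, ""))  # empty set keeps the URL as written
```

With a setting like this, someone who wants strict RFC 3986 behavior could empty the set, while the default would keep the current stripping of sentence punctuation.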
By the way, thing-at-point-url-path-regexp also disallows ":" inside a URL.
Colons would have to be allowed in order to accept IPv6 addresses.
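
The effect of leaving ":" out of the character class can be sketched as follows, again in Python. Both patterns are simplified, hypothetical stand-ins and not the real thing-at-point-url-path-regexp; the address uses the 2001:db8::/32 documentation prefix.

```python
import re

# Simplified, hypothetical character classes; NOT the actual
# thing-at-point-url-path-regexp. They differ only in whether ":" is allowed.
WITHOUT_COLON = re.compile(r"http://[\[\]A-Za-z0-9./_-]+")
WITH_COLON = re.compile(r"http://[\[\]A-Za-z0-9./_:-]+")

text = "see http://[2001:db8::1]/index.html for details"

print(WITHOUT_COLON.search(text).group(0))  # match ends before the first ":"
print(WITH_COLON.search(text).group(0))     # match covers the whole URL
```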
--
Piet van Oostrum <address@hidden>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: address@hidden