emacs-orgmode
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised


From: Maxim Nikulin
Subject: Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
Date: Mon, 15 Mar 2021 18:54:18 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1

On 13/03/2021 12:24, Ihor Radchenko wrote:
Here is the right one

I think, some unit tests would be great to avoid future regressions. Since it is heuristics, some cases will be always broken, so tests could clarify intentions.

I have not tested the regexp on real cases, so you have full rights to be skeptical concerning my comments. I prefer to have explicit URLs in my files.

+         (let ((non-space-bracket "[^][ \t\n()<>]+"))

I think, "+" is redundant here, it does not allow empty inner parenthesis and leads to nested construction "(x+)+" that could be simplified to "x+"

+            ;; Heiristics for an URL link.  Source:

Typo: Heuristics

+            ;; 
https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+            (rx-to-string
+             `(seq (regexp "\\<")

Isn't it an equivalent to "word-start"?

+                   (regexp ,types-re)
+                   ":"
+                   (1+ (or (regex ,non-space-bracket)
+                           (seq "("
+                                (* (or (regex ,non-space-bracket)
+                                       (seq "("
+                                            (regex ,non-space-bracket)

Maybe "0+" to allow "(())"?

+                                            ")")))
+                                ")")))
+                   (or (seq "("
+                            (* (or (regex ,non-space-bracket)
+                                   (seq "("
+                                        (regex ,non-space-bracket)
+                                        ")")))
+                            ")")
+                       (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))

Is the group useful for any purpose? The construction is already inside an "or".

I have tried to compare current, your, and a little modified your regexps on several synthetic examples:

#+begin_src elisp
  (let*
      ((types-re (regexp-opt (org-link-types) t))
       (org-link-plain-re-ir
        (let ((non-space-bracket "[^][ \t\n()<>]+"))
          ;; Heiristics for an URL link.  Source:
          ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
          (rx-to-string
           `(seq (regexp "\\<")
                 (regexp ,types-re)
                 ":"
                 (1+ (or (regex ,non-space-bracket)
                         (seq "("
                              (* (or (regex ,non-space-bracket)
                                     (seq "("
                                          (regex ,non-space-bracket)
                                          ")")))
                              ")")))
                 (or (seq "("
                          (* (or (regex ,non-space-bracket)
                                 (seq "("
                                      (regex ,non-space-bracket)
                                      ")")))
                          ")")
                     (regexp "\\([^[:punct:] \t\n]\\|/\\)"))))))
       (org-link-plain-re-nm
        (let* ((non-space-bracket "[^][ \t\n()<>]")
               (parenthesis
                `(seq "("
                      (0+ (or (regex ,non-space-bracket)
                              (seq "("
                                   (0+ (regex ,non-space-bracket))
                                   ")")))
                      ")")))
          ;; Heuristics for an URL link inspired by
          ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
          (rx-to-string
           `(seq word-start
                 (regexp ,types-re)
                 ":"
                 (1+ (or (regex ,non-space-bracket)
                         ,parenthesis))
                 (or (regexp "[^[:punct:] \t\n]")
                     ?/
                     ,parenthesis))))))
       (cons
        (list "src" "orig" "ir" "nm")
        (mapcar (lambda (s)
                  (cons
                   s
                   (mapcar (lambda (r)
                             (if (string-match r s) (match-string 0 s) ""))
                           (list org-link-plain-re org-link-plain-re-ir 
org-link-plain-re-nm))))
                (list
                 "The file:a link"
                 "The file:aa link"
                 "The file:a(b)c link"
                 "The file:a() link"
                 "The file:aa((a)) link"
                 "The file:aa(()) link"
                 "The file:///a link"
                 "The file:///a/, link"
                 "The http:// link"
                 "The (some file:ab) link"
                 "The file:aa) link"
                 "The file:aa( link"
))))
#+end_src

#+RESULTS:
| src                     | orig      | ir           | nm           |
| The file:a link         |           |              |              |
| The file:aa link        | file:aa   | file:aa      | file:aa      |
| The file:a(b)c link     | file:a(b) | file:a(b)c   | file:a(b)c   |
| The file:a() link       |           | file:a()     | file:a()     |
| The file:aa((a)) link   | file:aa   | file:aa((a)) | file:aa((a)) |
| The file:aa(()) link    | file:aa   | file:aa      | file:aa(())  |
| The file:/a link        | file:/a   | file:/a      | file:/a      |
| The file:/a/, link      | file:/a/  | file:/a/     | file:/a/     |
| The http:// link        | http://   | http://      | http://      |
| The (some file:ab) link | file:ab   | file:ab      | file:ab      |
| The file:aa) link       | file:aa   | file:aa      | file:aa      |
| The file:aa( link       | file:aa   | file:aa      | file:aa      |




reply via email to

[Prev in Thread] Current Thread [Next in Thread]