Re: [Demexp-dev] What to check in an HTTP URL?


From: David MENTRE
Subject: Re: [Demexp-dev] What to check in an HTTP URL?
Date: Fri, 23 Sep 2005 16:45:19 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.4 (gnu/linux)

Hello Félix and FX,

François-Xavier Ponscarme <address@hidden> writes:

>   http://www.foad.org/~abigail/Perl/url2.html

Well, it is rather complicated to do an extensive check. I looked
quickly at the BigBrother code, but it uses Pcre with camlp4 extensions,
so I don't want to dig into that code for now.

Right now, I do the following checks:

 - the link field size is limited to 256 bytes;

 - the link should match the following (OCaml Str[1]) regexp:
    ^http://[-A-Za-z0-9_.]+\\(:[0-9]+\\)?[-A-Za-z0-9+&:;@_.%=?/]*$

This regexp limits us to a pretty basic character set and to HTTP. I
prefer to be too restrictive now and to loosen the check afterwards if
needed.
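
For illustration, here is a minimal sketch of how the two checks above
could be combined with the Str library. The names max_link_size,
link_regexp and is_valid_link are hypothetical and not taken from the
demexp code. The extra comparison against Str.match_end is an assumption
added on top of the checks described: in Str, '$' also matches just
before a newline, so without it a link followed by "\n" and arbitrary
text would pass.

```ocaml
(* Sketch only: needs the Str library, e.g.
   ocamlfind ocamlopt -package str -linkpkg check.ml *)

(* Maximum accepted size of the link field, in bytes. *)
let max_link_size = 256

(* The regexp from the message, as an OCaml string literal:
   "\\(" and "\\)" reach Str as the grouping operators \( and \). *)
let link_regexp =
  Str.regexp
    "^http://[-A-Za-z0-9_.]+\\(:[0-9]+\\)?[-A-Za-z0-9+&:;@_.%=?/]*$"

(* Hypothetical helper: true when the link passes both checks. *)
let is_valid_link link =
  String.length link <= max_link_size
  && Str.string_match link_regexp link 0
  (* Added precaution: '$' can match before a '\n', so also require
     that the match covers the whole string. *)
  && Str.match_end () = String.length link
```

With this sketch, "http://example.org:8080/a?b=c" would be accepted,
while an https link or a link containing '<' would be rejected.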

Let me know if you see potential issues in this check.

Yours,
d.

Footnotes: 
[1]  As in Emacs, the grouping '(' and ')' are doubly escaped.
-- 
pub  1024D/A3AD7A2A 2004-10-03 David MENTRE <address@hidden>
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A




