[Txr-users] using txr for web scraping
From: Kai Carver
Subject: [Txr-users] using txr for web scraping
Date: Mon, 19 Oct 2009 14:49:36 +0200
Hi,
txr looks interesting!
I wonder whether it could be adapted for web scraping, if we
substitute "URL" for "text files".
@(next SOURCE)
It might be as simple as allowing SOURCE to be a URL in next directives?
Ah, I suppose you could already do web scraping with shell commands?
@(next "!wget -O - http://www.google.com/")
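To make the "pipe a command into the matcher" idea concrete, here is a minimal Python sketch (not txr) of the same approach: run a fetch command and hand its output, line by line, to whatever does the matching. The helper name lines_from_command is hypothetical, and the wget call is shown only as a comment because it needs wget and network access.

```python
import subprocess


def lines_from_command(argv):
    """Run a command and return its standard output as a list of lines."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Hypothetical usage (assumes wget is installed and the network is reachable):
# lines = lines_from_command(["wget", "-q", "-O", "-", "http://www.google.com/"])
```

The returned lines could then be fed to a line-oriented matcher, which is roughly what a "!command" source would do for txr.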
By the way, the documentation seems quite good (I haven't read all of it yet).
http://www.nongnu.org/txr/txr-manpage.html
Small suggestions for improvement of the documentation:
- an example or two for Positive Match would be welcome.
If I understand correctly, the examples could be:
pattern: "text @{FOO /[0-9]+/}"
data: "text 123 some more text"
result: FOO="123"
pattern: "phone: @{area 3} @{local 8}"
data: "phone: 617 867-5309 ext. 123"
result: area="617", local="867-5309"
- the (until) example (just before "The Flatten Directive") appears twice.
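To check my reading of the two Positive Match examples, here is a small Python sketch (not txr) of the bindings I would expect. It assumes a regex-bound variable takes the text matched by the regex (written as [0-9]+ so that all three digits are captured) and that a numeric width such as 3 or 8 means "exactly that many characters"; both are my assumptions about the semantics, not something taken from the man page.

```python
import re

# Emulates a variable bound by a regex: FOO takes the run of digits.
m1 = re.match(r"text (?P<FOO>[0-9]+)", "text 123 some more text")
foo = m1.group("FOO")

# Emulates fixed-width variables: area is 3 characters, local is 8.
m2 = re.match(
    r"phone: (?P<area>.{3}) (?P<local>.{8})",
    "phone: 617 867-5309 ext. 123",
)
area = m2.group("area")
local = m2.group("local")

print(foo, area, local)  # prints: 123 617 867-5309
```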
One more question: is there "multi-line mode"? That is, can you match
a variable across several lines? Something like:
pattern: "bla bla. @{FOO /[^.]+/m}"
data: "bla bla. This is a sentence
on two lines. Bla bla"
result: FOO="This is a sentence
on two lines"
Or would I need to use a collect and concatenate to do that?
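For comparison, this is the behaviour I have in mind, sketched in Python (not txr). In Python's re module a negated character class such as [^.] also matches newlines, so the capture runs across the line break without any extra flag. Whether txr can do the same is exactly my question.

```python
import re

data = "bla bla. This is a sentence\non two lines. Bla bla"

# [^.]+ consumes everything up to the next period, newline included.
m = re.search(r"bla bla\. (?P<FOO>[^.]+)", data)
foo = m.group("FOO")

print(repr(foo))  # prints: 'This is a sentence\non two lines'
```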
Kai Carver
Paris, France