[Txr-users] using txr for web scraping

txr-users

From:	Kai Carver
Subject:	[Txr-users] using txr for web scraping
Date:	Mon, 19 Oct 2009 14:49:36 +0200

Hi,

txr looks interesting!


I wonder whether it could be adapted for web scraping, with URLs taking
the place of text files as input:

  @(next SOURCE)

It might be as simple as allowing SOURCE to be a URL in @(next) directives?

Ah, I suppose you could already do web scraping by reading the output of a shell command?

  @(next "!wget -O - http://www.google.com/")
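The shell-command idea above assumes that @(next) treats a string beginning with "!" as a command whose output becomes the input. As a rough Python analogue of that convention, here is a hypothetical `read_source` helper: a leading `!` means "run the rest through the shell and read its standard output", and anything else is opened as a file. It is a sketch of the idea under that assumption, not how TXR implements @(next).

```python
import subprocess
from typing import List


def read_source(source: str) -> List[str]:
    """Return the lines of SOURCE (hypothetical helper, not part of TXR).

    A leading "!" means: run the rest as a shell command and read its
    standard output. Anything else is treated as a path to a text file.
    """
    if source.startswith("!"):
        # shell=True so options like "wget -O -" work exactly as typed;
        # check=True raises CalledProcessError if the command fails.
        result = subprocess.run(source[1:], shell=True, capture_output=True,
                                text=True, check=True)
        text = result.stdout
    else:
        with open(source, encoding="utf-8") as f:
            text = f.read()
    # Split into lines, the unit that TXR-style line patterns work on.
    return text.splitlines()
```

Calling `read_source("!wget -O - http://www.google.com/")` would then mirror the directive above; it needs `wget` installed and network access.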


By the way, the documentation seems quite good (I haven't read all of it yet).

  http://www.nongnu.org/txr/txr-manpage.html

Some small suggestions for improving the documentation:

- an example or two in the Positive Match section would be welcome.

If I understand correctly, the examples could be:

  pattern:      "text @{FOO /[0-9]+/}"
  data:         "text 123 some more text"
  result:       FOO="123"

  pattern:      "phone: @{area 3} @{local 8}"
  data:         "phone: 617 867-5309 ext. 123"
  result:       area="617", local="867-5309"

- the (until) example just before "The Flatten Directive" appears twice, identically.
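The two Positive Match examples above can be mimicked with Python's `re` module to double-check the expected bindings: a regex variable becomes a named group using that regex, and a fixed-width variable such as @{area 3} becomes a named group of exactly that many arbitrary characters, `.{3}`. This is only an analogy under those assumptions, and the helper `positive_match` is hypothetical. For instance, `re.match` anchors only at the start of the string and silently ignores trailing text, which may differ from TXR's own line-matching rules.

```python
import re
from typing import Dict, Optional


def positive_match(pattern: str, data: str) -> Optional[Dict[str, str]]:
    """Match DATA against a regex with named groups; return the bindings."""
    m = re.match(pattern, data)
    return m.groupdict() if m else None


# Roughly "text @{FOO /[0-9]+/}": FOO takes the run of digits.
ex1 = positive_match(r"text (?P<FOO>[0-9]+)", "text 123 some more text")

# Roughly "phone: @{area 3} @{local 8}": each variable takes exactly N chars.
ex2 = positive_match(r"phone: (?P<area>.{3}) (?P<local>.{8})",
                     "phone: 617 867-5309 ext. 123")

print(ex1)  # {'FOO': '123'}
print(ex2)  # {'area': '617', 'local': '867-5309'}
```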


One more question: is there a "multi-line mode"? That is, can a variable
match across several lines? Something like:

  pattern:      "bla bla. @{FOO /[^.]+/m}"
  data:         "bla bla. This is a sentence
on two lines. Bla bla"
  result:       FOO="This is a sentence
on two lines"

Or would I need to use a collect and concatenate to do that?
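The "collect and concatenate" workaround can be sketched in Python: gather the lines, join them with newlines, then run a single regex over the joined text. The helper name `match_across_lines` is hypothetical and this only illustrates the idea; it is not a TXR feature. Note that a negated character class such as `[^.]` already matches newline characters in Python, so no multi-line flag is needed for this particular pattern.

```python
import re
from typing import List, Optional


def match_across_lines(lines: List[str], pattern: str) -> Optional[str]:
    """Join LINES with newlines and return PATTERN's FOO group, if any."""
    text = "\n".join(lines)  # the "collect and concatenate" step
    m = re.search(pattern, text)
    return m.group("FOO") if m else None


lines = ["bla bla. This is a sentence", "on two lines. Bla bla"]
# [^.]+ stops at the first period, crossing the line break on the way.
foo = match_across_lines(lines, r"bla bla\. (?P<FOO>[^.]+)")
print(repr(foo))  # 'This is a sentence\non two lines'
```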


Kai Carver
Paris, France
