
[Txr-users] using txr for web scraping


From: Kai Carver
Subject: [Txr-users] using txr for web scraping
Date: Mon, 19 Oct 2009 14:49:36 +0200

Hi,

txr looks interesting!


I wonder whether it could be adapted for web scraping, if we
substitute "URL" for "text files".

  @(next SOURCE)

It might be as simple as allowing SOURCE to be a URL in next directives?

Ah, I suppose you could already do web scraping with shell commands?

  @(next "!wget -O - http://www.google.com/")


By the way the documentation seems quite good (haven't quite read it all yet).

  http://www.nongnu.org/txr/txr-manpage.html

Small suggestions for improvement of the documentation:

- an example or two for Positive Match would be welcome.

If I understand correctly, the examples could be:

  pattern:      "text @{FOO /[0-9]+/}"
  data:         "text 123 some more text"
  result:       FOO="123"

  pattern:      "phone: @{area 3} @{local 8}"
  data:         "phone: 617 867-5309 ext. 123"
  result:       area="617", local="867-5309"

- the (until) example (before "The Flatten Directive") appears twice, identically.
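
To make the semantics I have in mind for the two Positive Match examples concrete, here is a rough sketch in Python rather than txr. The regexes are my own translation, not anything from the manual: a named group stands in for a txr variable, "[0-9]+" is used so that the whole run of digits is captured, and I am assuming that @{area 3} means "exactly three characters". The bindings() helper is a made-up name for illustration.

```python
import re

# Hypothetical translations of the two txr patterns into Python regexes.
# A named group plays the role of a txr variable.
REGEX_PATTERN = re.compile(r"text (?P<FOO>[0-9]+)")
# Assumption: @{area 3} and @{local 8} mean exactly 3 and 8 characters.
FIXED_WIDTH_PATTERN = re.compile(r"phone: (?P<area>.{3}) (?P<local>.{8})")


def bindings(pattern, line):
    """Return the variable bindings for a line, or None if it does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


print(bindings(REGEX_PATTERN, "text 123 some more text"))
print(bindings(FIXED_WIDTH_PATTERN, "phone: 617 867-5309 ext. 123"))
```

If I have understood the manual correctly, txr would produce the same FOO, area and local bindings for these inputs.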


One more question: is there a "multi-line mode"? That is, can you match
a variable across several lines? Something like:

  pattern:      "bla bla. @{FOO /[^.]+/m}"
  data:         "bla bla. This is a sentence
on two lines. Bla bla"
  result:       FOO="This is a sentence
on two lines"

Or would I need to use a collect and concatenate to do that?
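
For comparison, this is the behaviour I am after, sketched in Python rather than txr. It is only an analogy: in Python's re module a negated class such as [^.] also matches newlines, so the capture can run across a line break when the whole input is treated as one string. I do not know whether txr has an equivalent.

```python
import re

# The whole input is one string, with an embedded newline.
data = "bla bla. This is a sentence\non two lines. Bla bla"

# [^.]+ consumes everything up to the next full stop, newline included.
SENTENCE = re.compile(r"bla bla\. (?P<FOO>[^.]+)")

m = SENTENCE.match(data)
foo = m.group("FOO") if m else None
print(repr(foo))
```

Here foo ends up holding both lines of the sentence, without the trailing full stop.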


Kai Carver
Paris, France



