Re: [Librefm-discuss] Re: lastscrape.py
From: Gordon Haverland
Subject: Re: [Librefm-discuss] Re: lastscrape.py
Date: Tue, 2 Feb 2010 12:28:15 -0700
User-agent: KMail/1.12.4 (Linux/2.6.26; KDE/4.3.4; i686; ; )
On February 1, 2010, Matt Lee wrote:
> On 02/01/2010 11:29 PM, Seth Woodworth wrote:
> > I would suggest, when possible, using the Html5lib parser and
> > using the traverser from BeautifulSoup. The author himself
> > suggests[1] this in any case of BS-3.1.0 or 3.0.8 behaving
> > poorly.
> >
> > I have been doing work with python, BeautifulSoup and
> > Html5Lib lately, and I've been collecting and slowly
> > improving python scripts (like this) to liberate data from
> > websites like Reddit or the Ubuntu forums. I would love to
> > get involved with the lastscrape.py script.
>
> http://bugs.libre.fm/wiki/LastToLibre is the new way to do
> this.
>
> Last.fm has an API now, for people like us ;)
Well, it took about 6 hours to download, with probably half a
dozen restarts needed. I decided to overlap the pages (if it
failed at page N, I restarted at page N-1). In those 6 hours, the
total number of pages went up by 1 (to 6905 pages). So, I guess I
am going to have to clean this up a little. (Not today.)
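For anyone wanting to try the same overlap-and-restart approach, here is a rough sketch in Python. It is not what lastscrape.py actually does: fetch_page is a hypothetical callable that returns the rows for one page and raises IOError on a failed download, and the (artist, track, timestamp) row layout is also an assumption. Because the page count can grow during a long run, the same row may turn up on more than one page, so the cleanup step drops repeats by row rather than by page number.

```python
def scrape_with_overlap(fetch_page, first_page, last_page, max_restarts=10):
    """Fetch first_page..last_page; after a failure at page N, resume at N-1.

    fetch_page is a hypothetical callable (an assumption, not part of
    lastscrape.py) that returns a list of rows or raises IOError.
    """
    pages = {}
    page = first_page
    restarts = 0
    while page <= last_page:
        try:
            pages[page] = fetch_page(page)
            page += 1
        except IOError:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Step back one page so the restart overlaps the last good page.
            page = max(first_page, page - 1)
    return pages


def dedupe(rows):
    """Drop repeated rows (assumed hashable tuples), keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```

Flattening the returned pages in page order and passing the result through dedupe would give a single list that could be written out as one file.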
Do you require me to upload this to libre.fm in pieces, or can it
be just one big file?
Gord