Yahoo cached pages(about 2400 pages) in a mess

Posted: 19 Jun 2009 18:03
by outcrop

Re: Yahoo cached pages(about 2400 pages) in a mess

Posted: 19 Jun 2009 19:48
by ce3c
Using Warrick and a Google crawler: http://ce3c.be/raza/
* pantheraproject.tgz (warrick, indexed files)
* rzcache.sql.gz (crawler, sql db)

Around 3000 wiki pages were grabbed in total, probably with some duplicates:
the crawler fetched both the http and https versions, which was unnecessary.
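
One way to weed out the http/https doubles before scraping is to compare URLs with the scheme stripped off. This is only a rough sketch: the function names and the example URLs are made up for illustration, and it assumes the crawl results are available as a plain list of URL strings (not how either archive above is actually laid out).

```python
from urllib.parse import urlsplit


def scheme_free_key(url):
    """Return host + path (+ query) so http:// and https:// copies compare equal."""
    parts = urlsplit(url)
    key = parts.netloc.lower() + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


def drop_scheme_doubles(urls):
    """Keep the first URL seen for each scheme-free key, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        key = scheme_free_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    # Hypothetical URLs, not taken from the real crawl.
    crawled = [
        "http://wiki.example.org/Main_Page",
        "https://wiki.example.org/Main_Page",
        "http://wiki.example.org/FAQ",
    ]
    for url in drop_scheme_doubles(crawled):
        print(url)
```

This only catches doubles that differ by scheme; pages reachable under different paths or query strings would still need a content comparison.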

Time to scrape content?

Re: Yahoo cached pages(about 2400 pages) in a mess

Posted: 19 Jun 2009 20:08
by outcrop

Re: Yahoo cached pages(about 2400 pages) in a mess

Posted: 25 Jul 2009 23:18
by kathw
Bump

Re: Yahoo cached pages(about 2400 pages) in a mess

Posted: 26 Jul 2009 22:28
by ocexyz