Yahoo cached pages(about 2400 pages) in a mess

Postby outcrop » 19 Jun 2009 18:03

outcrop
 
Posts: 15
Joined: 16 Jun 2009 09:50

Re: Yahoo cached pages(about 2400 pages) in a mess

Postby ce3c » 19 Jun 2009 19:48

Using Warrick and a Google crawler: http://ce3c.be/raza/
* pantheraproject.tgz (warrick, indexed files)
* rzcache.sql.gz (crawler, sql db)

Around 3000 wiki pages were grabbed in total, probably with some duplicates:
it fetched both the http and https versions, which was needless.
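
A rough sketch of how the http/https duplicates could be collapsed before scraping, assuming the grabbed pages are available as a plain list of URLs. The function names and the example.org URLs in the usage note are made up for illustration; nothing here reflects the actual layout of the Warrick dump or the SQL db.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map the http and https variants of a page onto one key."""
    parts = urlsplit(url)
    # Force a single scheme, lowercase the host, and drop any #fragment.
    return urlunsplit(("http", parts.netloc.lower(), parts.path, parts.query, ""))


def dedupe(urls):
    """Keep the first URL seen for each canonical key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

For example, `dedupe(["http://example.org/wiki/A", "https://example.org/wiki/A"])` keeps only the first entry. This only catches scheme and host-case duplicates; pages that differ in content between the two caches would still need a manual look.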

Time to scrape content?
ce3c
 
Posts: 17
Joined: 13 Jun 2009 13:44

Re: Yahoo cached pages(about 2400 pages) in a mess

Postby outcrop » 19 Jun 2009 20:08

outcrop
 
Posts: 15
Joined: 16 Jun 2009 09:50

Re: Yahoo cached pages(about 2400 pages) in a mess

Postby kathw » 25 Jul 2009 23:18

Bump
kathw
 
Posts: 96
Joined: 13 Jun 2009 13:57

Re: Yahoo cached pages(about 2400 pages) in a mess

Postby ocexyz » 26 Jul 2009 22:28

ocexyz
 
Posts: 624
Joined: 15 Jun 2009 13:09

