
A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 07:53
by outcrop

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 10:51
by ocexyz
Thank you Outcrop! Great work!

Instructions

PostPosted: 19 Jun 2009 12:15
by outcrop
1) Download PHP:
http://www.php.net/get/php-5.2.10-Win32 ... m/a/mirror

2) Unzip it to a directory like c:\
then make a new directory named "cache".

3) Copy this spider code and save it as yahoospider.php,
and download htmlsql.class.php from http://www.jonasjohn.de/lab/htmlsql.htm

Put them in the same directory, like c:\php\spider

4) Run it.
Click Start menu -> Run -> cmd.exe, then go to the PHP directory and type:
php.exe c:\php\spider\yahoospider.php
There will be errors or other messages in the console; just ignore them.
It will save all the fetched files to the cache directory by name.


5) New query
When the spider has finished, you can edit the query string:
$querystr = "wiki+site:pantheraproject.net";

to another search like:
$querystr = "developers+inurl:wiki+site:pantheraproject.net";

Just change $querystr to any format you would type into the search engine.

Then save the spider and go back to step 4.
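
If you change the query often, something like this could build the string for you. It is only a sketch: build_query() is a made-up helper name and is not part of the spider; it just joins terms with "+" the same way the two examples above are written.

```php
<?php
// Hypothetical helper (not in the spider): join search terms and
// operators with "+", matching the shape of the $querystr examples.
function build_query(array $terms, $site = null, $inurl = null)
{
    $parts = $terms;
    if ($inurl !== null) {
        $parts[] = 'inurl:' . $inurl;   // restrict to URLs containing this word
    }
    if ($site !== null) {
        $parts[] = 'site:' . $site;     // restrict to one site
    }
    return implode('+', $parts);
}

$querystr = build_query(array('developers'), 'pantheraproject.net', 'wiki');
// $querystr is now "developers+inurl:wiki+site:pantheraproject.net"
```

Calling build_query(array('wiki'), 'pantheraproject.net') gives the first example string.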

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 14:04
by borat1
Thanks, Outcrop!
Sadly, I got this error:
Parse error: syntax error, unexpected T_STRING in C:\PhP\spider\yahoospider.php on line 130

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 16:01
by outcrop

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 16:09
by old_death
I get the following error:
Fatal error: Call to undefined function curl_init() in C:\php\spider\yahoospider.php on line 142

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 16:44
by borat1
Almost the same error as OD.
Fatal error: Call to undefined function curl_init() in C:\PhP\spider\yahoocachespider.php on line 157

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 16:53
by ocexyz
Try to ignore it; this could be an effect of whatever is hanging on pantproj.net right now.

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 17:28
by outcrop

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 19 Jun 2009 17:33
by outcrop

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 20 Jun 2009 07:14
by borat1
Since I have a fixed IP, is there any way to use this with Tor?

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 20 Jun 2009 21:57
by old_death

;]

PostPosted: 21 Jun 2009 00:48
by aaron_walkhouse
Pick one.

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 21 Jun 2009 06:16
by borat1
My Computer -> Properties -> Advanced -> Environment Variables
Edit system variable -> Path -> add c:\php;

create cache dir -> c:\php\cache

copy php.ini-recommended -> php.ini

Edit php.ini :

Line 528, change to :
include_path = ".;c:\php\includes"

Line 542, change to :
extension_dir = "c:\php\ext"

Line 630, change to :
auto_detect_line_endings = ON

Line 751 + 752 change to :
;SMTP = localhost
;smtp_port = 25

Line 661 - 705 to :
extension=php_bz2.dll
extension=php_curl.dll
;extension=php_dba.dll
;extension=php_dbase.dll
;extension=php_exif.dll
extension=php_fdf.dll
extension=php_gd2.dll
extension=php_gettext.dll
extension=php_gmp.dll
;extension=php_ifx.dll
;extension=php_imap.dll
;extension=php_interbase.dll
extension=php_ldap.dll
extension=php_mbstring.dll
;extension=php_mcrypt.dll
;extension=php_mhash.dll
;extension=php_mime_magic.dll
extension=php_ming.dll
extension=php_msql.dll
extension=php_mssql.dll
extension=php_mysql.dll
extension=php_mysqli.dll
;extension=php_oci8.dll
extension=php_openssl.dll
extension=php_pdo.dll
extension=php_pdo_firebird.dll
extension=php_pdo_mssql.dll
extension=php_pdo_mysql.dll
;extension=php_pdo_oci.dll
;extension=php_pdo_oci8.dll
extension=php_pdo_odbc.dll
extension=php_pdo_pgsql.dll
extension=php_pdo_sqlite.dll
extension=php_pgsql.dll
extension=php_pspell.dll
extension=php_shmop.dll
;extension=php_snmp.dll
extension=php_soap.dll
extension=php_sockets.dll
extension=php_sqlite.dll
;extension=php_sybase_ct.dll
extension=php_tidy.dll
extension=php_xmlrpc.dll
extension=php_xsl.dll
extension=php_zip.dll

(Maybe I enabled more extensions than really needed, but it seems to work all right...)
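
To check whether the php.ini changes were actually picked up, a small script like this might help. It is a sketch only: missing_extensions() is a made-up helper, and I am assuming curl is the only extension the spider really needs, since curl_init() is the call that failed earlier in this thread.

```php
<?php
// Hypothetical helper: return the names from $required that are not loaded.
function missing_extensions(array $required)
{
    $missing = array();
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Assumption: the spider needs at least curl, because it calls curl_init().
$missing = missing_extensions(array('curl'));
if ($missing) {
    echo "Not loaded: " . implode(', ', $missing) . "\n";
    // Shows which php.ini was read (false if none was found).
    echo "php.ini in use: " . var_export(php_ini_loaded_file(), true) . "\n";
}
```

If it prints nothing, curl is loaded. If it reports a different php.ini than the one you edited, that would explain why the changes had no effect.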

Please bear in mind:
I am a noob and was a bit drunk when I got home at 6 in the morning after spending all night with
friends, when I had a "bright" idea about why it did not work here on my PC...
So feel free to add any comments or, even better, an improved php.ini people can use as a template!!

Almost forgot :
When blacklisted it seems to take at least 30 minutes before you can resume again.
So enjoy your breaks. :D
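
The waiting could be automated with something like the sketch below. Everything in it is an assumption on my side: fetch_with_pause() is a made-up helper, the spider does not work this way as far as I know, and treating a false return value as "blacklisted" is just a guess at how a failed fetch would show up.

```php
<?php
// Hypothetical helper: call $fetch($url); if it returns false, wait and retry.
// $fetch is any callable returning the page body, or false on failure.
// $sleep can be replaced for testing; by default it is PHP's sleep().
function fetch_with_pause($fetch, $url, $pauseSeconds = 1800, $maxAttempts = 3, $sleep = 'sleep')
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = call_user_func($fetch, $url);
        if ($body !== false) {
            return $body;
        }
        if ($attempt < $maxAttempts) {
            // 1800 seconds = the 30 minutes mentioned above.
            call_user_func($sleep, $pauseSeconds);
        }
    }
    return false; // still failing after all attempts
}
```

The 30 minute default only reflects what I saw here; it may well need to be longer.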

Current status :
A hangover and more than 2000 files in my cache and counting...

Re: ;]

PostPosted: 21 Jun 2009 13:52
by old_death

Re: A simple PHP Yahoo cached page spider for recovering the web

PostPosted: 21 Jun 2009 14:11
by borat1
"can not open ./cache/Translate+-+Shareaza+Wiki.html"
Looks like it cannot find your cache dir. Where did you put it?
And is it spelled correctly?
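
One thing to keep in mind: a relative path like ./cache is resolved against the directory you run php.exe from, not against the folder the script is in. A check like this could be run first. It is a sketch, and ensure_cache_dir() is a made-up helper, not something from the spider.

```php
<?php
// Hypothetical helper: create the directory if it is missing,
// then report whether it can be written to.
function ensure_cache_dir($dir)
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return false; // could not create it
    }
    return is_writable($dir);
}
```

For example, ensure_cache_dir(getcwd() . DIRECTORY_SEPARATOR . 'cache') checks the cache folder in the directory you started PHP from.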