Parsing GWC links
Posted: 06 Nov 2009 19:40
There are several things that should be done when a new GWC link is added to the list:
1. Trailing slashes should be removed unless the URL ends in a port number. If it does, a trailing slash should be added (for compatibility with ports).
http://www.abc.com/cache.php/ >> http://www.abc.com/cache.php
but:
http://abc.com:7893 >> http://abc.com:7893/
2. Leading zeros in port numbers should be removed.
http://abc.com:007893/ >> http://abc.com:7893/
3. Port 80 should always be removed.
http://cache.abc.com:80/ >> http://cache.abc.com
4. All URLs containing ".nyud.net" should be deleted. nyud.net is an internet cache, so any IPs obtained through it are days out of date. Example: http://cache.trillinux.org.nyud.net:8090/g2/bazooka.php
5. After these transformations, the URL should be checked against the list and discarded if it is already there.
6. If two or more URLs differ only in their ending, only the shortest one should be kept.
Example: http://gwebcache.spforensic.com/, http://gwebcache.spforensic.com/gwc.php and http://gwebcache.spforensic.com/index.php should be reduced to the single entry http://gwebcache.spforensic.com
7. Shareaza should somehow detect the type of each cache. Many multi-network caches are not detected correctly; most of them are reported only as G2.
Also, in the release that implements this, all URLs already in the cache list should be checked against the same rules.
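A minimal Python sketch of rules 1 to 6, for illustration only (it is not Shareaza code). The names `normalize_gwc` and `clean_gwc_list` are hypothetical. It assumes that "the last term is a number" means "the URL ends in a non-default port", that the nyud.net check applies to the host name, and that the "ending" in rule 6 is everything after host:port. User info and fragments are dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_gwc(url):
    """Apply rules 1-4 to one GWC URL; return None if it should be dropped."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    # Rule 4 (assumed to mean the host name): reject anything served via nyud.net.
    if not host or host.endswith(".nyud.net"):
        return None
    try:
        port = parts.port  # int conversion drops leading zeros (rule 2)
    except ValueError:
        return None  # malformed or out-of-range port
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    # Rule 3: port 80 is never written out.
    netloc = host if port in (None, 80) else f"{host}:{port}"
    # Rule 1: strip trailing slashes...
    path = parts.path.rstrip("/")
    # ...but if the URL now ends in a port number, add a single slash.
    if not path and port not in (None, 80):
        path = "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def clean_gwc_list(urls):
    """Apply rules 1-6 to a list of URLs, keeping first-seen order of hosts."""
    best = {}  # (scheme, host:port) -> shortest normalized URL seen so far
    for raw in urls:
        url = normalize_gwc(raw)
        if url is None:
            continue
        p = urlsplit(url)
        key = (p.scheme, p.netloc)  # assumption: "ending" = path after host:port
        current = best.get(key)
        # Rules 5 and 6: exact duplicates collapse, and the shortest URL wins
        # (ties broken alphabetically so the result is deterministic).
        if current is None or (len(url), url) < (len(current), current):
            best[key] = url
    return list(best.values())


print(normalize_gwc("http://abc.com:007893/"))
print(clean_gwc_list([
    "http://gwebcache.spforensic.com/",
    "http://gwebcache.spforensic.com/gwc.php",
    "http://gwebcache.spforensic.com/index.php",
]))
```

Running `clean_gwc_list` over the URLs already stored would also cover the one-time re-check of existing entries. Note that this literal reading of rule 6 would also merge genuinely different caches hosted on the same server under different paths.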
+ It would also be useful to be able to block entire domains. There could be a prefix "Y" with the syntax "Y domainname.domain", which would block all caches on that domain. The same could be done for IPs.
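A rough Python sketch of how the proposed "Y" lines might be parsed and applied. The one-target-per-line format follows the proposal; the function names `parse_block_rules` and `is_blocked` are hypothetical. It assumes that blocking a domain also blocks all of its subdomains, and that IPs are matched exactly.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_block_rules(lines):
    """Collect blocked domains and IPs from proposed 'Y <target>' lines."""
    domains, ips = set(), set()
    for line in lines:
        prefix, _, target = line.strip().partition(" ")
        if prefix != "Y" or not target.strip():
            continue  # not a block rule
        target = target.strip().lower()
        try:
            ips.add(ipaddress.ip_address(target))  # "Y 10.0.0.1"
        except ValueError:
            domains.add(target.rstrip("."))  # "Y abc.com"
    return domains, ips


def is_blocked(url, domains, ips):
    """True if the URL's host is a blocked IP or lies in a blocked domain."""
    host = (urlsplit(url).hostname or "").lower()
    try:
        return ipaddress.ip_address(host) in ips
    except ValueError:
        pass  # host is a name, not an IP literal
    # Assumption: a blocked domain covers the domain itself and every subdomain.
    return any(host == d or host.endswith("." + d) for d in domains)


domains, ips = parse_block_rules(["Y abc.com", "Y 10.0.0.1"])
print(is_blocked("http://cache.abc.com/gwc.php", domains, ips))
```

Matching on a "." boundary keeps a rule for abc.com from accidentally blocking an unrelated host such as notabc.com.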