Phrasing GWC links


Postby old_death » 06 Nov 2009 19:40

There are several things that should be done when a new GWC link is added to the list:

1. Trailing slashes should be removed unless the URL ends in a port number. If it ends in a port number, a trailing slash should be added (for compatibility with ports).
http://www.abc.com/cache.php/ >> http://www.abc.com/cache.php
but:
http://abc.com:7893 >> http://abc.com:7893/

2. All leading zeros in port numbers should be removed.
http://abc.com:007893/ >> http://abc.com:7893/

3. Port "80" should always be removed.
http://cache.abc.com:80/ >> http://cache.abc.com

4. All URLs containing ".nyud.net" should be deleted (nyud.net is an internet cache, which means that all IPs obtained from such a cache are days out of date. Example: http://cache.trillinux.org.nyud.net:8090/g2/bazooka.php )

5. The transformed URL should be checked against the list and discarded if it is already there.

6. If there are 2 or more URLs differing only by their ending, they should be analyzed and only the shortest one should be kept:
Example: http://gwebcache.spforensic.com/ and http://gwebcache.spforensic.com/gwc.php and http://gwebcache.spforensic.com/index.php should become only http://gwebcache.spforensic.com .

7. Shareaza should somehow detect the type of cache. Many multi-network caches are not detected correctly; most of them are only reported as being G2.

Also, on the release where this is implemented, all the URLs already in the cache should be checked against the same rules.
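
Rules 1 to 5 above can be sketched in a few lines of Python. This is only an illustration, not Shareaza code: the function names normalize_gwc_url and add_cache are made up for the example, and "the last term is a number" is read here as "the URL ends in a port number", which is an assumption.

```python
from urllib.parse import urlsplit


def normalize_gwc_url(url):
    """Return a canonical form of a GWC URL, or None if it should be dropped.

    Hypothetical helper; the interpretation of each rule is an assumption.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if not host:
        return None
    # Rule 4: drop anything served through the nyud.net cache.
    if host == "nyud.net" or host.endswith(".nyud.net"):
        return None
    # Rule 2: parsing the port as an integer discards leading zeros.
    try:
        port = parts.port
    except ValueError:
        return None  # port is not a number or is out of range
    # Rule 1 (first half): strip trailing slashes from the path.
    path = parts.path.rstrip("/")
    base = f"{parts.scheme.lower()}://{host}"
    # Rule 3: port 80 is never written out.
    if port is not None and port != 80:
        base += f":{port}"
        # Rule 1 (second half): a URL ending in a port gets a slash.
        if not path:
            return base + "/"
    return base + path


def add_cache(url, known):
    """Rule 5: add the normalized URL to the set unless it is already there."""
    canonical = normalize_gwc_url(url)
    if canonical is None or canonical in known:
        return False
    known.add(canonical)
    return True
```

Running the existing list through the same normalize_gwc_url function on upgrade would cover the "URLs already in the cache" case without a second code path.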

+ It would also be sensible to be able to block entire domains. There could be a prefix called "Y" with the syntax: "Y domainname.domain". This would block all caches on a given domain. The same could be done with IPs.
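
A rough sketch of how such "Y" entries could be read and applied, in Python. Everything here is hypothetical: the "Y" prefix is only the proposal above, the function names are invented, and an IP rule in this sketch only matches URLs that use the IP literally (no DNS lookup is done).

```python
import ipaddress
from urllib.parse import urlsplit


def parse_block_rules(lines):
    """Split proposed 'Y <domain-or-ip>' lines into domain and IP sets."""
    domains, addresses = set(), set()
    for line in lines:
        prefix, _, value = line.strip().partition(" ")
        value = value.strip().lower()
        if prefix != "Y" or not value:
            continue  # not a block rule
        try:
            addresses.add(ipaddress.ip_address(value))
        except ValueError:
            domains.add(value)  # not an IP, so treat it as a domain
    return domains, addresses


def is_blocked(url, domains, addresses):
    """True if the URL's host is a blocked IP or lies on a blocked domain."""
    host = (urlsplit(url).hostname or "").lower()
    try:
        return ipaddress.ip_address(host) in addresses
    except ValueError:
        pass  # host is a name, not an IP literal
    # Match the domain itself and any subdomain, but not look-alike names.
    return any(host == d or host.endswith("." + d) for d in domains)
```

Matching on "." plus the domain keeps a rule for example.com from also blocking notexample.com.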
old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

Re: Phrasing GWC links

Postby kevogod » 07 Nov 2009 01:09

old_death wrote:There are several things that should be done when a new GWC link is added to the list:

1. Trailing slashes should be removed unless the URL ends in a port number. If it ends in a port number, a trailing slash should be added (for compatibility with ports).
http://www.abc.com/cache.php/ >> http://www.abc.com/cache.php

Stripping the last slash is not really a valid approach. Even though the specs say to do it, there is nothing invalid about http://www.example.com/test/.
kevogod
 
Posts: 277
Joined: 13 Jun 2009 16:13

Re: Phrasing GWC links

Postby old_death » 07 Nov 2009 01:17

kevogod wrote:
old_death wrote:There are several things that should be done when a new GWC link is added to the list:

1. Trailing slashes should be removed unless the URL ends in a port number. If it ends in a port number, a trailing slash should be added (for compatibility with ports).
http://www.abc.com/cache.php/ >> http://www.abc.com/cache.php

Stripping the last slash is not really a valid approach. Even though the specs say to do it, there is nothing invalid about http://www.example.com/test/.
In any case, there needs to be 1 single possible version for the URL. Either we add a slash for each of them, or we delete all the slashes. As deleting them looks nicer for most URLs (IMHO), that was what I proposed.
old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

Re: Phrasing GWC links

Postby kevogod » 07 Nov 2009 01:30

old_death wrote:
kevogod wrote:
old_death wrote:There are several things that should be done when a new GWC link is added to the list:

1. Trailing slashes should be removed unless the URL ends in a port number. If it ends in a port number, a trailing slash should be added (for compatibility with ports).
http://www.abc.com/cache.php/ >> http://www.abc.com/cache.php

Stripping the last slash is not really a valid approach. Even though the specs say to do it, there is nothing invalid about http://www.example.com/test/.
In any case, there needs to be 1 single possible version for the URL. Either we add a slash for each of them, or we delete all the slashes. As deleting them looks nicer for most URLs (IMHO), that was what I proposed.

Yes, but http://www.example.com/test/ sometimes works while http://www.example.com/test does not work (and vice versa).

You could use the domain and IP to identify a cache though.
kevogod
 
Posts: 277
Joined: 13 Jun 2009 16:13

Re: Phrasing GWC links

Postby old_death » 07 Nov 2009 01:36

kevogod wrote:
old_death wrote:
kevogod wrote:Stripping the last slash is not really a valid approach. Even though the specs say to do it, there is nothing invalid about http://www.example.com/test/.
In any case, there needs to be 1 single possible version for the URL. Either we add a slash for each of them, or we delete all the slashes. As deleting them looks nicer for most URLs (IMHO), that was what I proposed.

Yes, but http://www.example.com/test/ sometimes works while http://www.example.com/test does not work (and vice versa).

You could use the domain and IP to identify a cache though.
This won't work as there are pages hosting more than one cache...
old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

Re: Phrasing GWC links

Postby kevogod » 07 Nov 2009 21:10

old_death wrote:This won't work as there are pages hosting more than one cache...

My opinion is that only one GWebCache should be run from a single IP.

Also, what caches have duplicate IPs that are actually separate GWebCaches?
kevogod
 
Posts: 277
Joined: 13 Jun 2009 16:13

Re: Phrasing GWC links

Postby wiggindesigns » 08 Nov 2009 06:43

I tested all of the working GWCs listed at http://gcachescan.jonatkins.com/

72.236.167.156
gwc.nonexiste.net
1.gc.nonexiste.net
2.gc.nonexiste.net


72.236.167.137
4.gc.nonexiste.net
3.gc.nonexiste.net
howl.gotdns.org


208.67.219.132
leet.gtkg.net
sissy.gtkg.net



All of the others have separate IP addresses, or the same domains.
wiggindesigns
 
Posts: 46
Joined: 04 Aug 2009 04:17

Re: Phrasing GWC links

Postby kevogod » 08 Nov 2009 08:01

wiggindesigns wrote:I tested all of the working GWCs listed at http://gcachescan.jonatkins.com/

...

All of the others have separate IP addresses, or the same domains.

And all of those GWebCaches you listed are bad caches except for leet.gtkg.net and sissy.gtkg.net.
kevogod
 
Posts: 277
Joined: 13 Jun 2009 16:13

Re: Phrasing GWC links

Postby wiggindesigns » 08 Nov 2009 19:24

I didn't bother to open Shareaza to see which ones were blacklisted. And whether or not they are bad, I was providing an example of how many share the same IP (I think I should stop posting here, especially since, reading here recently, the "devs" seem to have huge egos and jump on people for insignificant things, or discredit any ideas without discussion).

Either way, I was pointing out that out of all of the GWCs, very few share the same IP. I don't really think a feature to check whether each one has the same IP is needed, as it seems like wasted resources (however little). Though I do agree with the idea of cleaning up the other URLs so that it doesn't put both http://gwc.example.com and http://gwc.example.com/ on the list. Not a really urgent thing at all, but it would be nice if it could be implemented somehow. (I'm a clean freak; duplicates kind of bug me in lists.)
wiggindesigns
 
Posts: 46
Joined: 04 Aug 2009 04:17

Re: Phrasing GWC links

Postby kevogod » 08 Nov 2009 19:44

wiggindesigns wrote:I didn't bother to open Shareaza to see which ones were blacklisted. And whether or not they are bad, I was providing an example of how many share the same IP (I think I should stop posting here, especially since, reading here recently, the "devs" seem to have huge egos and jump on people for insignificant things, or discredit any ideas without discussion).

Either way, I was pointing out that out of all of the GWCs, very few share the same IP. I don't really think a feature to check whether each one has the same IP is needed, as it seems like wasted resources (however little). Though I do agree with the idea of cleaning up the other URLs so that it doesn't put both http://gwc.example.com and http://gwc.example.com/ on the list. Not a really urgent thing at all, but it would be nice if it could be implemented somehow. (I'm a clean freak; duplicates kind of bug me in lists.)

If you check the IPs of caches, you will not get duplicates like http://gwc.example.com/ and http://gwc.example.com since they both share the same IP.

Although checking the full domain is probably sufficient in checking for duplicates.
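
Checking the full domain could look roughly like the Python sketch below, which keeps only the shortest URL per host and port. The function name dedupe_by_host is made up for the example, and it assumes the URLs are already well formed. Note that it merges every cache on one host into a single entry, so it is only safe if hosts really run one cache each, which was disputed earlier in the thread.

```python
from urllib.parse import urlsplit


def dedupe_by_host(urls):
    """Keep the shortest URL for each (host, port) pair.

    Hypothetical helper: treats all URLs on one host as the same cache.
    """
    shortest = {}
    for url in urls:
        parts = urlsplit(url)
        key = ((parts.hostname or "").lower(), parts.port)
        # Replace the stored URL only if this one is strictly shorter.
        if key not in shortest or len(url) < len(shortest[key]):
            shortest[key] = url
    return sorted(shortest.values())
```

Keying on host and port rather than on a resolved IP avoids a DNS lookup per cache, which fits the point that the full domain is probably sufficient.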
kevogod
 
Posts: 277
Joined: 13 Jun 2009 16:13

Re: Phrasing GWC links

Postby wiggindesigns » 08 Nov 2009 20:04

That's what I was about to write, but there are some caches that share IPs yet are technically different caches. If you check the IP, it would be nice to check whether the existing cache is working and, if not, replace it with the different URL. Though the only IPs that have more than one cache on them are blacklisted anyway, so that wouldn't really need to be implemented.
wiggindesigns
 
Posts: 46
Joined: 04 Aug 2009 04:17

Re: Phrasing GWC links

Postby ale5000 » 10 Apr 2015 11:32

One thing that I would like to say to those who host GWCs: do NOT call it index.php, because this will create duplicate URLs (there are other problems too, but this one is the most widespread, and it is just stupid because it can be easily avoided); there is really no advantage in calling it index.php.
GWC developers can also avoid this by changing the name of the main page.
ale5000
 
Posts: 66
Joined: 18 Nov 2012 22:56
