High number of Hub2Hub connections


Postby old_death » 29 Oct 2010 14:28

A high number of Hub2Hub connections is generally bad for the network: it means that if someone enters a popular search query in such a big Hub cluster, their client may be hammered to death by replies.
However, there are always some users who (most probably because they don't know better) set their client to keep a high number of Hub2Hub connections. I think we should solve this problem by decreasing our hardcoded limit from 64 to 8 or 10, as more Hub2Hub connections are simply not needed and do not help the network.

Re: High number of Hub2Hub connections

Postby ailurophobe » 29 Oct 2010 15:42

Alternatively, we could have Shareaza remember which hubs have high neighbour counts and query hubs in order of their known number of neighbours, and query such high-neighbour hubs even when we already know none of their leaves match. That would minimize the number of neighbours we might query, prevent flooding, and make searches faster too.

So a search would start by first querying all known high-neighbour hubs (H2H > 8) and then proceed normally.
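A rough sketch of that ordering step, as I picture it (the type and field names here are made up for illustration, not Shareaza's real ones):

    // Order the hubs a search will walk so that known high-neighbour
    // hubs come first. "HostRecord" and its fields are hypothetical.
    #include <algorithm>
    #include <vector>

    struct HostRecord
    {
        unsigned long m_nAddress;        // hub IP address
        int           m_nNeighbourCount; // last known hub-to-hub count
    };

    void OrderHubsForSearch( std::vector< HostRecord >& oHubs )
    {
        // High-neighbour hubs (H2H > 8) first: one direct query covers
        // the whole cluster, so it is never flooded by routed copies.
        std::stable_sort( oHubs.begin(), oHubs.end(),
            []( const HostRecord& a, const HostRecord& b )
            {
                return a.m_nNeighbourCount > b.m_nNeighbourCount;
            } );
    }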

Re: High number of Hub2Hub connections

Postby old_death » 29 Oct 2010 17:09

...which would result in even bigger GUI lags at the start of a search? And which would almost certainly waste a lot of network resources on searches for popular files/keywords?

Also, we would give spammers a nice tool to make their spam even more efficient, as their Hubs would be the first ones asked about who has a file (and they would most certainly be the first ones to exploit this "feature").

And finally, do not forget that there is only a limited number of these high-Hub2Hub-count Hubs on the network, so we would end up overloading them (remember, their load is already high because they have that many peers, AND those machines are most probably not the fastest PCs on the network).


Therefore I think that simply reducing the limit might be the best and most effective thing we can do; it is certainly easier than rewriting parts of the searching code, because only a single line has to change to alter the hardcoded limit.
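To illustrate the scale of the change (a sketch only; the real constant's name and location in the sources differ):

    // Sketch only: the actual setting lives elsewhere in the code.
    static const int MAX_HUB_TO_HUB_CONNECTIONS = 8;   // was 64

    bool CanAddHubToHubConnection( int nCurrentHubLinks )
    {
        return nCurrentHubLinks < MAX_HUB_TO_HUB_CONNECTIONS;
    }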

Re: High number of Hub2Hub connections

Postby ailurophobe » 29 Oct 2010 23:27

My suggestion would guarantee that each such hub is queried only once, instead of potentially getting multiple routed queries from its high number of neighbours. So it would probably reduce the load on such hubs. (Well, it would still get queried by neighbours that also have high H2H counts, but we can add a rule that stops hubs with more than eight neighbours from connecting to other hubs with more than eight neighbours, so that is manageable.)
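A minimal sketch of that rule as a handshake check, assuming each hub learns the other's hub-to-hub count during the handshake (all names here are hypothetical):

    static const int HIGH_H2H_THRESHOLD = 8;

    // Reject the connection if both sides are "high" hubs; this keeps
    // big clusters from chaining into one another.
    bool AcceptHubNeighbour( int nMyHubLinks, int nTheirHubLinks )
    {
        return !( nMyHubLinks > HIGH_H2H_THRESHOLD &&
                  nTheirHubLinks > HIGH_H2H_THRESHOLD );
    }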

It is true that spammers would be certain to use this. But we are stuck with spam anyway... And block lists work just fine against bad hubs.

And obviously your suggestion is much simpler and easier, no argument there. I simply suggested an alternative way of getting the same result that would still allow such "super hubs" as an option. I doubt I am the only person who has speculated about tweaking the G2 topology by making the hubs more heterogeneous. Having two separate types of hubs, one with leaves and few neighbours (as currently) and another with lots of neighbours, would, if supported by the search code, make searches faster. (Larger cluster size -> fewer clusters to iterate -> faster searches.) So it is something that should be considered before dropping the maximum limit.

Re: High number of Hub2Hub connections

Postby old_death » 30 Oct 2010 13:13


Re: High number of Hub2Hub connections

Postby ailurophobe » 30 Oct 2010 13:34

I'd rather leave that to the hypothetical person willing to try making it work.

More on topic, if we had two different types of hubs, we might as well have two separate settings for them. So I guess my objection to your suggestion isn't really valid after all...

Re: High number of Hub2Hub connections

Postby ailurophobe » 30 Oct 2010 18:14

Did some more thinking... Posting the result here in case others might be interested.

Something like having four possible modes:

Leaf
Forced leaf mode with two or three hub connections.

Hub
Forced hub mode with six hub-to-hub connections and a variable number of leaves.

Auto
Leaf mode that auto-promotes to Hub mode if necessary and the requirements are met.

High
Forced hub mode with a variable (but high by default) number of hub-to-hub connections and no leaves. Will not connect to other leafless hubs.

Additionally, a High hub's neighbours will be reported as Horizon, not Neighbours. This will route queries to High hubs when possible in a backwards-compatible way, while being leafless will make them queried only once, also in a backwards-compatible way. (Unless the query hits something being shared... This needs to be user-controlled for now.) Eventually some sort of auto-promote to High is needed, but that can and must wait until most peers have been updated.
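As a sketch, the four modes could boil down to an enum like this (illustrative names only, not Shareaza's actual ones):

    enum NodeMode
    {
        MODE_LEAF,  // forced leaf, two or three hub connections
        MODE_HUB,   // forced hub, six hub-to-hub links plus leaves
        MODE_AUTO,  // leaf that promotes itself to hub when needed
        MODE_HIGH   // leafless hub with a high hub-to-hub count;
                    // never connects to another leafless hub
    };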

Re: High number of Hub2Hub connections

Postby old_death » 31 Oct 2010 13:37

This might be something good to have; however, I think it might also be harmful, as it increases the danger that clients on slow computers or with low bandwidth get hammered when they do a search query for a popular keyword on the network...

Also, by reducing the number of Hubs "in charge" of all searches (and that is what your super hubs would be), it becomes easier to do serious damage with specially modified bad clients. For example, an anti-P2P company could create a client that does not forward searches and ask every employee to run it at home on their ever-changing dynamic IP; the result would be disastrous, as many nodes would no longer be reachable for searches. Currently this is much harder, as the network has too many Hubs for such an attempt to be fruitful unless you have some hundred PCs running modified software...

Re: High number of Hub2Hub connections

Postby ailurophobe » 01 Nov 2010 15:41

My suggestion would have no effect on the number of normal hubs. Only the number of clusters would be reduced, by increasing the number of interconnections. This should improve performance without reducing redundancy. Every leaf would still be linked to more than one hub, and every hub would still share its QHT with just as many neighbours. Blocking a leaf's searches requires that both of its hubs are linked only to "bad" neighbours, and the difficulty of arranging that would be unchanged, since it depends only on the number of hubs and the number of neighbours those hubs have, neither of which would change.

But maybe there should be logic for checking that your hubs don't have too much overlap? Drop any hub for which fewer than 80% of its neighbours are ones no other connected hub has? That would improve the network topology and make searches more robust. No protocol changes or extensions required, either.
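A sketch of that check, assuming we track the neighbour lists our connected hubs advertise (hypothetical types and names):

    #include <cstddef>
    #include <set>
    #include <vector>

    typedef unsigned long HostAddress;

    // True if fewer than 80% of this hub's neighbours are unique among
    // our connected hubs, i.e. the hub should be dropped for overlap.
    bool HasTooMuchOverlap( const std::vector< HostAddress >& oHubNeighbours,
                            const std::set< HostAddress >& oOtherHubsNeighbours )
    {
        if ( oHubNeighbours.empty() )
            return false;

        std::size_t nUnique = 0;
        for ( std::size_t i = 0; i < oHubNeighbours.size(); ++i )
        {
            if ( oOtherHubsNeighbours.count( oHubNeighbours[ i ] ) == 0 )
                ++nUnique;
        }
        // Keep the hub only if at least 80% of its neighbours are ones
        // no other connected hub has: unique/total >= 4/5.
        return nUnique * 5 < oHubNeighbours.size() * 4;
    }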

Re: High number of Hub2Hub connections

Postby old_death » 01 Nov 2010 22:08


Re: High number of Hub2Hub connections

Postby ailurophobe » 03 Nov 2010 01:05

He actually did get a PM through to me. But thanks for posting it here, since I didn't feel like answering with a PM or reposting his PM myself. (He didn't actually explain the problem in his PM, but it seems I guessed right about what it was.)

But this is a pre-existing problem. While my suggestion would make the symptoms worse (although we could, and probably should, ramp the neighbour count up in stages, with the first release allowing twenty to forty neighbours and a later one raising that to somewhere between one and two hundred), it would also make fixing it easier. There is nothing we can do about a bot accessing lots of hubs in a short time, but throttling how many neighbours a super hub forwards one query to is trivial. And it won't even hurt search accuracy (or backwards compatibility) if we randomize which neighbours the query is forwarded to and do not claim to have searched the hubs we chose not to forward to, so that they will still be searched if necessary.

So, a good catch, but super hubs cut both ways: they make such flaws both easier to abuse and easier to mitigate.

btw, I am not opposed to a protocol update; we are certainly overdue for one. I just don't think this suggestion requires one.

Re: High number of Hub2Hub connections

Postby brov » 06 Nov 2010 16:53

How do you prevent forwarding a single query to, say, 100 other hubs in a short time? If you find a solution to this problem, then this proposal is good. Throttling at the receiving hub is not enough; a single query can produce many hits...

Re: High number of Hub2Hub connections

Postby ailurophobe » 06 Nov 2010 20:42

By setting a limit on the maximum number of forwards per query, stopping the walk through the QHTs once that limit is reached, and returning only the neighbours you actually checked in QA/D and the ones you skipped in QA/S. If you set the limit at 6, this poses exactly the same DDoS threat as the current network.
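A sketch of that throttle, including the randomization suggested earlier. QA/D and QA/S are the real G2 query-acknowledgement children ("searched" hubs vs. hubs still to try); everything else here is made up for illustration:

    #include <algorithm>
    #include <cstddef>
    #include <random>
    #include <vector>

    struct Neighbour
    {
        unsigned long m_nAddress;
        bool          m_bQhtHit;  // does our copy of its QHT match the query?
    };

    const std::size_t MAX_FORWARDS_PER_QUERY = 6;

    void ForwardQuery( std::vector< Neighbour >& oNeighbours,
                       std::vector< unsigned long >& oDone,     // -> /QA/D
                       std::vector< unsigned long >& oSkipped ) // -> /QA/S
    {
        // Randomize which neighbours get the forwards so a popular
        // query does not always hit the same six.
        static std::mt19937 rng( std::random_device{}() );
        std::shuffle( oNeighbours.begin(), oNeighbours.end(), rng );

        std::size_t nForwarded = 0;
        for ( std::size_t i = 0; i < oNeighbours.size(); ++i )
        {
            if ( nForwarded >= MAX_FORWARDS_PER_QUERY )
            {
                // Not checked: list in /QA/S so the searcher can still
                // query this hub directly later.
                oSkipped.push_back( oNeighbours[ i ].m_nAddress );
            }
            else
            {
                // Checked against our copy of its QHT: list in /QA/D.
                if ( oNeighbours[ i ].m_bQhtHit )
                {
                    // ...forward the query to this neighbour here...
                    ++nForwarded;
                }
                oDone.push_back( oNeighbours[ i ].m_nAddress );
            }
        }
    }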

But honestly, I thought query keys were supposed to prevent DDoS use anyway?

EDIT:
But I guess that wanting neighbours to never check High hubs, and to always return them in QA/S, does qualify as a protocol change?

Re: High number of Hub2Hub connections

Postby brov » 06 Nov 2010 23:32

Yes, query keys are designed to prevent DDoS.

However, nothing is verified when forwarding a query to the hubs in a cluster. That could be used for a DDoS attack: just be a malicious hub that connects to, say, 100 others and sends one query per second to every connected hub... The malicious hub can fake the return address, so all the replies are sent to the victim.

Re: High number of Hub2Hub connections

Postby ailurophobe » 08 Nov 2010 07:01

Right, that seems bad. But it is not really related to my suggestion, as a bad hub would not obey any hub-to-hub connection limits anyway; it would simply connect to as many hubs as possible, as fast as possible, without bothering to report correct neighbour counts or to route real queries correctly. Without a protocol change, there is nothing we can do here that helps or hurts.

The easiest "fix" would probably be to stop forwarding queries to neighbours altogether and simply return QA/D for hubs without QHT hits and QA/S for everyone else. But there is a reason I put the "fix" in quotation marks... It might work if we also made a major change to the network topology, like increasing neighbour counts...

And making QHTs 32-bit, maybe? The extra 12 bits should reduce false hits, and the current way of encoding and processing QHTs is pretty inefficient for hubs.

I mean, why deal with a 128 kB array and gzip when the actual information contained is so small? At 2% full, a 20-bit table has roughly 21,000 set bits, each an index of 20 bits, for a total of about 420,000 bits or 52 kB uncompressed in any way, and an utterly trivial compression (sending Huffman-coded offsets from the last value sent) would halve that to around 26 kB.

Then you could just use a multimap to store hash key / source pairs (a source being a library entry, hub, or neighbour) and a map to store hash key / reference count pairs. It would be roughly as compact on the network as the current gzip-compressed system, lookups would be much faster (especially at higher leaf/neighbour counts), and the processing overhead would be trivial. And while the memory footprint of a 2% full 20-bit QHT would be roughly the same as now, it would be much lower for nearly empty QHTs (most leaves) and would not be affected at all by moving to a 32-bit QHT.
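A sketch of both halves, the wire encoding (without the Huffman step) and the hub-side lookup structures; all names are illustrative:

    #include <map>
    #include <set>
    #include <utility>
    #include <vector>

    // Wire form: the sorted set-bit indices become gaps from the
    // previous index. In a sparse table the gaps are small, which is
    // exactly what makes Huffman-coding them effective.
    std::vector< unsigned int > EncodeDeltas( const std::set< unsigned int >& oSetBits )
    {
        std::vector< unsigned int > oDeltas;
        unsigned int nLast = 0;
        for ( std::set< unsigned int >::const_iterator it = oSetBits.begin();
              it != oSetBits.end(); ++it )
        {
            oDeltas.push_back( *it - nLast );
            nLast = *it;
        }
        return oDeltas;
    }

    // Hub side: hash key -> source (library entry, leaf or neighbour id)
    // in a multimap, plus a reference count so a key disappears only
    // when its last source is gone.
    class CSparseQht
    {
    public:
        void Add( unsigned int nKey, int nSourceId )
        {
            m_oSources.insert( std::make_pair( nKey, nSourceId ) );
            ++m_oRefCount[ nKey ];
        }
        bool Lookup( unsigned int nKey ) const
        {
            return m_oRefCount.count( nKey ) != 0;
        }
    private:
        std::multimap< unsigned int, int > m_oSources;
        std::map< unsigned int, int >      m_oRefCount;
    };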

The above is something I have been thinking about for a while. I haven't bothered to mention it or do anything about it because I haven't figured out a way to do it sensibly without breaking backwards compatibility on hubs. I mention it now because we were discussing protocol "fixes" and somebody here might be smarter than I am. Not that I am any kind of a G2 expert...

Re: High number of Hub2Hub connections

Postby old_death » 08 Nov 2010 13:18


Re: High number of Hub2Hub connections

Postby ailurophobe » 10 Nov 2010 10:50

Adding a flag to the handshake, or even to the QHT packets, is not the problem. The problem is that older clients can't process 32-bit QHTs, so you would have to accept, send, and even check two different types of QHT. For example, to a 20-bit neighbour you would have to send a gzipped 20-bit QHT combining all the 20-bit QHTs you got from leaves with converted versions of the 32-bit QHTs you hold; to a 32-bit neighbour you would have to send two QHTs, one combining the 32-bit tables and another for the 20-bit ones. I suppose it is doable, but it is hardly elegant, and sending the gzipped 20-bit QHT to one neighbour wastes almost as many resources as sending it to six does. (gzip has decent compression; it just does more processing than necessary for data whose structure is well defined and known to us.)
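The conversion step itself need not be hard. A sketch, under the assumption of a QRP-style multiplicative hash where an n-bit key is the top n bits of a common 32-bit product, so a 20-bit key is the 32-bit key shifted right by 12 (that assumption must hold for this to be valid):

    #include <set>

    // Fold a 32-bit table's set bits down to 20 bits. Several 32-bit
    // entries may land on one 20-bit entry; precision is lost, but no
    // legitimate hit can be dropped (no new false negatives).
    std::set< unsigned int > ConvertQht32To20( const std::set< unsigned int >& oBits32 )
    {
        std::set< unsigned int > oBits20;
        for ( std::set< unsigned int >::const_iterator it = oBits32.begin();
              it != oBits32.end(); ++it )
        {
            oBits20.insert( *it >> 12 );
        }
        return oBits20;
    }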

Re: High number of Hub2Hub connections

Postby old_death » 11 Nov 2010 16:55


Re: High number of Hub2Hub connections

Postby ailurophobe » 12 Nov 2010 02:18

Maybe the easiest way would be to actually run them as entirely separate networks that just happen to work the same? Shareaza already supports using multiple networks, and that would remove the chance of network problems caused by bugs in the compatibility code, and, more importantly, the need to write such code in the first place.

Every leaf of the new Shareaza version would then connect to two G2R hubs and one G2 hub, and auto-promote to a hub of either type as needed. G2 hubs would have G2 neighbours and G2R hubs G2R neighbours. The way G2 works should make this practical. The next version would then no longer connect to the legacy G2 by default. With the intervals between our releases, that should be workable.

Obviously, we'd need to make the new network a major upgrade for users for this to make sense. Supporting push-to-push connections should be on the list. And we'd need ryo to be on board, so the update would have to be something he sees as a major improvement as well.

Re: High number of Hub2Hub connections

Postby raspopov » 13 Nov 2010 20:06

Shareaza can handle any QHT size and can dynamically increase its size if needed.

Re: High number of Hub2Hub connections

Postby old_death » 13 Nov 2010 20:31


Re: High number of Hub2Hub connections

Postby ailurophobe » 14 Nov 2010 05:09

It is not actually defined anywhere, AFAIK. As ryo said, you can use any value; 20 is just the default. The issue is that the method used to store and transmit QHTs, currently and in previous versions, doesn't scale well to higher values. For example, sending a 32-bit QHT would require having gzip compress a 512 MB (4 Gbit) buffer. It would compress really well and transmit just as well as a 20-bit QHT (because it would usually contain the same amount of actual information, apart from a few hubs with several high-file-count leaves), but the receiver would have to process 512 MB of data. I haven't actually tested this, but I expect that in practice, requesting QHTs above 24 bits would seriously hurt the performance of Shareaza versions not specifically coded to support large QHTs. I would be happy to be wrong about that, though.

EDIT: I am only talking about 32-bit QHTs because the changes I want would give us that precision "for free"; there is no actual need to go that high. 20 bits is a bit low for leaves with lots of files, and hub QHTs seem to be a bit too full as well, but I think 22 bits would be perfectly sufficient to fix that, and the current system could probably handle it...
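For reference, the raw sizes being discussed: an n-bit QHT is a bitmap of 2^n entries, so 2^n / 8 bytes uncompressed.

    #include <cstdio>

    int main()
    {
        const int nBits[] = { 20, 22, 24, 32 };
        for ( int i = 0; i < 4; ++i )
        {
            unsigned long long nBytes = ( 1ULL << nBits[ i ] ) / 8;
            // 20 -> 128 KB, 22 -> 512 KB, 24 -> 2 MB, 32 -> 512 MB
            std::printf( "%2d-bit QHT: %llu bytes\n", nBits[ i ], nBytes );
        }
        return 0;
    }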

Re: High number of Hub2Hub connections

Postby old_death » 14 Nov 2010 19:26

Well, if the standard allows such big QHTs, then there should be no problem: we can implement an additional way of treating them and have leaves send the big QHTs only to those Hubs known to support them. Or am I mistaken?

Re: High number of Hub2Hub connections

Postby brov » 14 Nov 2010 19:32

You can send such a table to any G2 node that supports QHT scaling (in practice, all nodes).
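For what it's worth, a sketch of how the table size can travel with the table itself. The field layout here (command byte, four-byte entry count, infinity byte, as in Gnutella's QRP reset) is my assumption of what the /QHT reset frame looks like and should be checked against the actual sources:

    #include <cstring>

    struct QhtReset
    {
        unsigned int  nTableSize; // number of entries, e.g. 1 << 20
        unsigned char nInfinity;  // the "not present" value, usually 1
    };

    // Assumed layout: 1 command byte (0 = reset), 4-byte little-endian
    // entry count, 1 infinity byte.
    bool ParseQhtReset( const unsigned char* pData, std::size_t nLength,
                        QhtReset& oOut )
    {
        if ( nLength < 6 || pData[ 0 ] != 0 )
            return false;
        std::memcpy( &oOut.nTableSize, pData + 1, 4 );
        oOut.nInfinity = pData[ 5 ];
        return true;
    }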

