
Use RTT information when routing queries

PostPosted: 07 Nov 2010 19:02
by old_death

Re: Use RTT information when routing queries

PostPosted: 08 Nov 2010 07:24
by ailurophobe
Only problem I see with this is that if picking a random hub causes a "bad distribution of routed packets" then that hub should not be a hub at all (or at least have a lower leaf count). So you are treating the symptom instead of curing the disease. Not that there is anything wrong with treating the symptom, but it would be better to make the hub aware of how it is handling the load and drop the leaf count or even hub mode entirely if the answer is "badly." But we've discussed that before, so I'll stop boring you with it.

Re: Use RTT information when routing queries

PostPosted: 08 Nov 2010 13:01
by old_death

Re: Use RTT information when routing queries

PostPosted: 08 Nov 2010 19:05
by brov
Which hub is better: one that forwards query hits to 90% of its leaves, or one that forwards hits to only one (assuming equal hardware and other conditions)? This approach would distribute firewalled and open leaves evenly between hubs, which amounts to simple indirect load balancing. Most of the load on hubs (excluding UDP queries) is caused by forwarding hits to firewalled nodes.

Re: Use RTT information when routing queries

PostPosted: 10 Nov 2010 10:36
by ailurophobe
The idea of the dynamic leaf count proposals is that the hub adds leaves until doing so starts affecting its performance negatively. This means that all hubs would have equal load relative to their performance. Faster machines would have more leaves and slower ones fewer. Since the relative load is what is being measured and it is "equalized", hubs with more push forwards will automatically compensate by having fewer leaves. So the balancing would happen automatically as a side effect. An RTT comparison for two such hubs would be useless since all hubs would be equally capable anyway.

The point about being able to help leaves and hubs that haven't been updated is a very good one, though.

Re: Use RTT information when routing queries

PostPosted: 11 Nov 2010 17:05
by old_death
How about just doing both? We could perform a dynamic leaf count calculation during the quick start wizard run at first start and use that as a default value for the number of supported leaves if a computer switches to Hub mode. And via the RTT calculation, fine tuning while operating could be done.

I think we shouldn't underestimate users' need for control over what happens: if the number of leaves a Hub should have is determined automatically, users cannot control that setting manually, and some, especially the filesharing freaks like me (and those are the ones we cannot afford to lose on G2), do want this control. The more of it, the better.

Re: Use RTT information when routing queries

PostPosted: 12 Nov 2010 02:31
by ailurophobe
Doing both would probably be best. RTT helps indirectly balance (to use brov's term) even older versions; dynamic leaf count doesn't.

Incidentally, the "dynamic" means that the leaf count is continuously adjusted in run time according to the current performance. Or rather: instead of there being a maximum leaf count, there is a minimum response time limit. (And no more leaves are accepted if that limit is not met.) This minimum limit would be an extended setting people can change.

Re: Use RTT information when routing queries

PostPosted: 13 Nov 2010 08:33
by old_death
Maybe the dynamic number of supported leaves of a Hub should be determined once a week rather than once a minute, using the worst score during that week as the basis for the calculation. That way, Hubs won't drop leaf connections as often as they would if they were adapting their number of leaves all the time. Remember, this would save the network a great deal of traffic, as each time a leaf has to connect to a new Hub, a large amount of data needs to be transmitted. So, the less Hub switching, the better for the overall network...

Also, this way we would probably have fewer of those "bad user experiences" where someone actually notices a huge performance impact once Shareaza switches to Hub mode (which is something that should absolutely be prevented).

Re: Use RTT information when routing queries

PostPosted: 13 Nov 2010 19:12
by ailurophobe
The way I see it, every hub would simply keep track of its current performance by measuring the time from query key request to query answer. When a leaf asks for a hub connection, the hub would check whether any of the last N such exchanges was faster than a time limit, and if not, it would deny the connection. You'd probably also want to throttle how fast the hub adds leaves: at most ten per minute or something like that.
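
A minimal Python sketch of the admission check described above, not actual Shareaza code. The class name, the 0.5 s time limit, and the window size are assumptions made up for illustration; only the "last N exchanges" rule and the ten-per-minute throttle come from the post. With no measurements yet, the sketch denies the connection, which is one possible reading of "if not".

```python
from collections import deque


class HubAdmission:
    """Hypothetical leaf admission check; all names and defaults are illustrative."""

    def __init__(self, time_limit=0.5, window=20, max_per_minute=10):
        self.time_limit = time_limit          # seconds; assumed value
        self.samples = deque(maxlen=window)   # durations of the last N exchanges
        self.max_per_minute = max_per_minute  # throttle from the post
        self.accepted = deque()               # timestamps of recent accepts

    def record_exchange(self, seconds):
        """Store the time from query key request to query answer."""
        self.samples.append(seconds)

    def may_accept_leaf(self, now):
        """Return True and count the leaf if the hub may accept it at time `now`."""
        # Forget accepts that are at least 60 seconds old.
        while self.accepted and now - self.accepted[0] >= 60.0:
            self.accepted.popleft()
        # Throttle: at most max_per_minute new leaves per minute.
        if len(self.accepted) >= self.max_per_minute:
            return False
        # Deny unless at least one recent exchange beat the time limit.
        if not any(s < self.time_limit for s in self.samples):
            return False
        self.accepted.append(now)
        return True
```

The time limit here plays the role of the user-adjustable extended setting mentioned earlier in the thread.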

Re: Use RTT information when routing queries

PostPosted: 13 Nov 2010 19:59
by raspopov
RTT is useless. For example, an Intel i7 computer with 8 GB of RAM and a 1 Gb/s Internet connection, but connected via a satellite provider, has an RTT of 300 ms or more.

Re: Use RTT information when routing queries

PostPosted: 13 Nov 2010 20:43
by old_death

Re: Use RTT information when routing queries

PostPosted: 13 Nov 2010 21:08
by brov

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 05:41
by ailurophobe
Latency actually matters for routing push requests, so using RTT would give a pretty good result as long as that fast but high-latency system had a higher leaf (or neighbour) count than the lower-latency hubs used to route the push requests. So RTT would give a good result even for that corner case if either the dynamic leaf count proposal or the super-hub proposal was also implemented. Or just add a "high latency connection" check box somewhere that gives the hub a higher than normal leaf count but makes it drop firewalled leaves.

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 07:21
by raspopov

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 14:10
by brov
What fragmentation if RTT information is used by firewalled leaves only?

And yes, I know there are many satellite providers, but honestly, I have never seen a hub connected via a satellite provider...

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 14:24
by raspopov

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 14:54
by brov
But not firewalled ones, right?

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 14:57
by raspopov

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 15:00
by brov
You are simply wrong here. What about TCP ping? Isn't that possible?

Re: Use RTT information when routing queries

PostPosted: 14 Nov 2010 19:14
by old_death

Re: Use RTT information when routing queries

PostPosted: 15 Nov 2010 00:56
by smokex
Satellite providers limit monthly throughput at a pretty low level. On Wildblue, for example, the best account maxed out at something like 50Gb a month. That limit can be reached quickly, and after that, additional use charges stack up to a high price pretty quickly. Therefore, users of satellite service are not big users of P2P. It is in the "acceptable use clause" or "fair use clause" of most satellite ISP contracts.

Re: Use RTT information when routing queries

PostPosted: 08 Dec 2010 07:34
by kaffeemonster
Forget the satellites for a second, Ras was exaggerating.

In essence he is right.
If you go strictly by the RTT, you lump the network together on "short network distances".
Example: This means that maybe South Africa will be lumped together with Europe (because they have fast fiber up here), but Europe would be "split" from Asia (or South America, or, or).

RTT is only good for filtering out the most badly lagging Hubs. Drop connections with an RTT of, say, over 2s.
But do _not_ favour the lowest RTT.
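
A small Python sketch of "filter, but do not favour", under the assumption that each client keeps a mapping from hub to its last measured RTT in seconds. The function names are hypothetical; the 2 s cut-off is the "say over 2s" figure from above. Hubs over the cut-off are excluded, and the choice among the rest is random rather than lowest-RTT.

```python
import random

MAX_RTT = 2.0  # seconds; the "say over 2s" figure, not a tuned value


def usable_hubs(rtt_by_hub, max_rtt=MAX_RTT):
    """Return the hubs whose RTT is at or below the cut-off, in input order."""
    return [hub for hub, rtt in rtt_by_hub.items() if rtt <= max_rtt]


def pick_hub(rtt_by_hub, rng=random):
    """Pick uniformly among usable hubs; the lowest RTT gets no preference."""
    candidates = usable_hubs(rtt_by_hub)
    return rng.choice(candidates) if candidates else None
```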

Other things to improve:
1) Go to your political representative; he should bang some clue into the telecoms to do FTTH, NOW, so we all get 10 Mbit up
2) Improve Shareaza overload detection?

Greetings
Jan

Re: Use RTT information when routing queries

PostPosted: 08 Dec 2010 16:09
by old_death
That's not true: RTT differences due to physical distance are usually no more than 10-100ms, while differences due to packet processing etc. on the Shareaza side are usually bigger than 100ms (RTT times > 10s can be observed on a regular basis). Until this changes, the problem of the network being split into regional subnets is not very serious.

And even if this did lead to a regional regrouping of the network, it wouldn't be bad, as we are talking only about firewalled leaves here, not about any other network member. This means that only firewalled leaves would be using Hubs with a shorter physical distance, not anyone else...

Re: Use RTT information when routing queries

PostPosted: 09 Dec 2010 00:08
by kaffeemonster
Hmmm.

Asia:
64 bytes from orange.kame.net (203.178.141.194): icmp_req=1 ttl=52 time=330 ms
64 bytes from rev198.asus.com (211.72.249.198): icmp_req=1 ttl=238 time=339 ms

USA:
64 bytes from web2.eff.org (64.147.188.3): icmp_req=1 ttl=52 time=218 ms
64 bytes from forward.markmonitor.com (64.124.14.63): icmp_req=1 ttl=117 time=212 ms /* was time-warner.com */

Around the corner:
64 bytes from www.heise.de (193.99.144.85): icmp_req=1 ttl=249 time=51.1 ms
64 bytes from www.free.fr (212.27.48.10): icmp_req=1 ttl=121 time=70.1 ms

If Shareaza really needs more than 100ms to process a packet, then IMHO there is something really going wrong.
I do not have the numbers for G2CD ATM, but I think it was around 1ms...
And that was "measured" with tcpdump, so from packet entering the network card, to answer leaving the network card, but with uncongested Network!

I guess the problem is that most of the time the upload is totally congested. And that needs to be tackled.

Maybe by measuring the RTT and saying: "Sorry, you are a too crowded/congested/bad Hub - bye".
But not by actively seeking the Hub with the lowest RTT.
Note the fine difference in this!

By the way, how do you prevent the stupid swarm effect? All clients measure a good RTT on one Hub and all now use that Hub to do their relaying, so the Hub gets congested. All clients move on, only to congest the next Hub.
You enter a world of problems there.

So again:
Filtering Hubs with bad RTT - yes
Using the lowest RTT - no

Greetings
Jan

Re: Use RTT information when routing queries

PostPosted: 10 Dec 2010 01:27
by kaffeemonster
After thinking a little bit about the high RTT, a question old_death:

Were the high RTT connections by any chance compressed connections, esp. Hub->Client?

Compression (also in the Shareaza Code) accumulates packets by a timer to not flush the compressor after every packet.

Greetings
Jan

Re: Use RTT information when routing queries

PostPosted: 11 Dec 2010 20:17
by old_death
Kaffeemonster, just download Quazaa and have a look at the neighbours tab to see what times we are talking about here. Most RTT times are much higher than the 250ms you measured (probably using ping -> IP) as the difference between near and far locations.

Also, again, using RTT is only for firewalled leaves that need all their search packets to be routed through one of the Hubs they are connected to. This is not about any other network member and will therefore not split the network or any similarly stupid thing. Sure, a few clients will be using a Hub that is nearer to their physical location than is the case now, but this is not a generally bad thing, as it reduces the bandwidth costs for the datastream. What's more important though is that by choosing the Hub with the lowest RTT, searches will become faster. And no, this doesn't make the clients switch Hubs all the time: we're not talking about switching Hubs at this step, but about choosing which of the 2 (or 3) Hubs a leaf is connected to ATM should be used for routing a search (as this generates most of the traffic on Hubs).
This means that the "effect of a stupid swarm" does not occur, as we're not actually switching and we're not going to switch while performing the search. Also, if the RTT data is only updated once a minute on each client, this effect is prevented, too.
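
As a rough Python sketch of this first step: a firewalled leaf picks, among the Hubs it is already connected to, the one with the lowest cached RTT, and refreshes that cache at most once a minute. The class and method names are hypothetical and this is not how Shareaza or Quazaa implement it; only the selection rule and the once-a-minute update come from the description above.

```python
class SearchRouter:
    """Hypothetical per-leaf helper; names and structure are illustrative."""

    def __init__(self, refresh_interval=60.0):
        self.refresh_interval = refresh_interval  # seconds ("once a minute")
        self.rtt = {}                             # connected hub -> cached RTT
        self.last_refresh = None

    def update(self, measured, now):
        """Adopt new RTT measurements only if the cache is old enough."""
        if self.last_refresh is None or now - self.last_refresh >= self.refresh_interval:
            self.rtt = dict(measured)
            self.last_refresh = now
            return True
        return False

    def hub_for_search(self):
        """Return the connected hub with the lowest cached RTT, or None."""
        if not self.rtt:
            return None
        return min(self.rtt, key=self.rtt.get)
```

No connection is opened or dropped here; the sketch only decides which existing neighbour routes the next search.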

As for dropping Hubs that have an RTT value that is "too" high, I think this is a second step to be done. In addition to the method for selecting the most intelligent neighbour for routing searches (only firewalled leaves), all leaves should drop connections to Hubs (and this is the first time we're actually talking about switching Hubs) whose RTT times are higher than a certain limit. However, I don't think we should impose a hardcoded limit here. It sounds more logical to me to let each network member "learn" what the average RTT times on the network are and drop connections to those Hubs whose RTT timings exceed the network average by a certain factor (2x, 3x, 4x, ... best value needs to be determined). This way, leaves adapt dynamically to changes of the network, instead of relying on a client update if changes to the value seem necessary.
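
One way this second step could look, as a Python sketch under stated assumptions: the "learned" network average is kept as an exponential moving average (my choice, the post does not say how the average is computed), and the factor and smoothing values are placeholders, since the best value still needs to be determined.

```python
class RttBaseline:
    """Hypothetical learned RTT baseline; names and defaults are illustrative."""

    def __init__(self, factor=3.0, alpha=0.1):
        self.factor = factor  # drop hubs slower than factor * average
        self.alpha = alpha    # smoothing weight for new observations (assumed)
        self.average = None   # learned network average RTT in seconds

    def observe(self, rtt):
        """Fold one observed RTT into the running average."""
        if self.average is None:
            self.average = rtt
        else:
            self.average += self.alpha * (rtt - self.average)

    def should_drop(self, hub_rtt):
        """True if the hub's RTT exceeds the learned average by the factor."""
        return self.average is not None and hub_rtt > self.factor * self.average
```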