Use RTT information when routing queries

Postby old_death » 07 Nov 2010 19:02

old_death
 
Posts: 1950
Joined: 13 Jun 2009 16:19

Postby ailurophobe » 08 Nov 2010 07:24

The only problem I see with this is that if picking a random hub causes a "bad distribution of routed packets", then that hub should not be a hub at all (or should at least have a lower leaf count). So you are treating the symptom instead of curing the disease. Not that there is anything wrong with treating the symptom, but it would be better to make the hub aware of how well it is handling the load and to lower its leaf count, or even drop out of hub mode entirely, if the answer is "badly." But we've discussed that before, so I'll stop boring you with it.
ailurophobe
 
Posts: 709
Joined: 11 Nov 2009 05:25

Postby old_death » 08 Nov 2010 13:01

Postby brov » 08 Nov 2010 19:05

Which hub will be better: the one that forwards query hits to 90% of its leaves, or the one that forwards hits to only one? (Assuming equal hardware and other conditions.) This approach would distribute firewalled and open leaves equally between hubs, doing some simple indirect load balancing. Most of the load on hubs (excluding UDP queries) is caused by forwarding hits to firewalled nodes.
brov
 
Posts: 87
Joined: 05 Jul 2009 12:15

Postby ailurophobe » 10 Nov 2010 10:36

The idea of the dynamic leaf count proposals is that the hub adds leaves until doing so starts to affect its performance negatively. This means that all hubs would have equal load relative to their performance: faster machines would have more leaves and slower ones fewer. Since the relative load is what is being measured and "equalized", hubs with more push forwards will automatically compensate by having fewer leaves, so the balancing would happen automatically as a side effect. An RTT comparison between two such hubs would be useless, since all hubs would be equally capable anyway.

The point about being able to help leaves and hubs that haven't been updated is a very good one, though.

Postby old_death » 11 Nov 2010 17:05

How about just doing both? We could perform a dynamic leaf count calculation during the Quick Start Wizard run at first start and use the result as the default number of supported leaves if the computer switches to Hub mode. The RTT calculation could then be used for fine-tuning during operation.

I think we shouldn't underestimate how much users want to control what happens: if the number of leaves a Hub should have is determined automatically, users cannot set it manually, and some users, especially the filesharing freaks like me (and those are the people we cannot afford to lose on G2), do want this control. The more of it, the better.

Postby ailurophobe » 12 Nov 2010 02:31

Doing both would probably be best. RTT helps indirectly balance (to use brov's term) even older versions; dynamic leaf count doesn't.

Incidentally, "dynamic" means that the leaf count is continuously adjusted at run time according to current performance. Or rather: instead of a maximum leaf count, there is a response time limit the hub must meet (and no more leaves are accepted if that limit is not met). This limit would be an extended setting people can change.

Postby old_death » 13 Nov 2010 08:33

Maybe the dynamic number of supported leaves of a Hub should be determined once a week rather than once a minute, using the worst score during that week as the basis for the calculation. That way, Hubs won't drop leaf connections as often as they would if they were adapting their number of leaves all the time. Remember, this would save the network a great deal of traffic, as a large amount of data has to be transmitted each time a leaf connects to a new Hub. So the less Hub switching, the better for the overall network...
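
A minimal sketch of this weekly recalculation in C++, the language Shareaza and Quazaa are written in. It assumes the "score" is a response time in milliseconds and that the default leaf count is scaled by how far the week's worst value sits from a target; the class name, the scaling formula and all parameters are hypothetical, not Shareaza code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <deque>

// Hypothetical sketch: size a Hub's leaf capacity from the worst response
// time seen during the last week, instead of adapting every minute.
class WeeklyLeafCapacity {
public:
    using Clock = std::chrono::steady_clock;

    WeeklyLeafCapacity(int baseLeaves, double targetMs, int minLeaves, int maxLeaves)
        : baseLeaves_(baseLeaves), targetMs_(targetMs),
          minLeaves_(minLeaves), maxLeaves_(maxLeaves) {
        assert(targetMs > 0 && minLeaves <= maxLeaves);
    }

    // Record one measurement (assumed: response time in ms).
    // Samples are expected to arrive in chronological order.
    void record(Clock::time_point when, double responseMs) {
        samples_.push_back({when, responseMs});
    }

    // Called once a week: forget samples older than 7 days, then scale the
    // default capacity by how far the worst remaining sample is from the target.
    int recompute(Clock::time_point now) {
        const auto week = std::chrono::hours(24 * 7);
        while (!samples_.empty() && now - samples_.front().when > week)
            samples_.pop_front();
        if (samples_.empty()) return baseLeaves_;  // no data: keep the default
        double worst = 0;
        for (const auto& s : samples_) worst = std::max(worst, s.ms);
        int leaves = static_cast<int>(baseLeaves_ * targetMs_ / std::max(worst, 1.0));
        return std::clamp(leaves, minLeaves_, maxLeaves_);
    }

private:
    struct Sample { Clock::time_point when; double ms; };
    std::deque<Sample> samples_;
    int baseLeaves_;
    double targetMs_;
    int minLeaves_;
    int maxLeaves_;
};
```

Because only the worst case of the week feeds the result, the capacity changes at most once a week, which is the property the post is after.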

Also, this way we would probably have fewer of those "bad user experiences" where someone notices a huge drop in performance once Shareaza switches to Hub mode (which is something that should absolutely be prevented).

Postby ailurophobe » 13 Nov 2010 19:12

The way I see it, every hub would simply keep track of its current performance by measuring the time from query key request to query answer. When a leaf asks for a hub connection, the hub would check whether any of the last N such exchanges was faster than a time limit, and if not, it would deny the connection. You'd probably also want to throttle how fast the hub adds leaves: at most ten per minute or something like that.
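
A sketch of that admission check, assuming exchanges are stored as durations and that a hub with no measurements yet may still accept leaves. `LeafAdmission`, the window size N and the ten-per-minute throttle are illustrative parameters, not existing Shareaza settings.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical sketch: accept a new leaf only if at least one of the last N
// query-key-to-answer exchanges beat the time limit, and never accept more
// than `maxPerMinute` leaves within any 60 s window.
class LeafAdmission {
public:
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::milliseconds;

    LeafAdmission(std::size_t window, Ms limit, int maxPerMinute)
        : window_(window), limit_(limit), maxPerMinute_(maxPerMinute) {
        assert(window > 0 && maxPerMinute > 0);
    }

    // Store the duration of one query key request -> query answer exchange,
    // keeping only the most recent `window_` of them.
    void recordExchange(Ms duration) {
        exchanges_.push_back(duration);
        if (exchanges_.size() > window_) exchanges_.pop_front();
    }

    // Decide whether a leaf asking for a hub connection is accepted now.
    bool tryAccept(Clock::time_point now) {
        bool fastEnough = std::any_of(exchanges_.begin(), exchanges_.end(),
                                      [&](Ms d) { return d < limit_; });
        // Assumption: with no measurements yet, leaves are allowed in.
        if (!exchanges_.empty() && !fastEnough) return false;

        // Throttle: forget acceptances that are at least one minute old.
        while (!accepted_.empty() && now - accepted_.front() >= std::chrono::minutes(1))
            accepted_.pop_front();
        if (static_cast<int>(accepted_.size()) >= maxPerMinute_) return false;
        accepted_.push_back(now);
        return true;
    }

private:
    std::size_t window_;
    Ms limit_;
    int maxPerMinute_;
    std::deque<Ms> exchanges_;
    std::deque<Clock::time_point> accepted_;
};
```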

Postby raspopov » 13 Nov 2010 19:59

RTT is useless. For example, an Intel i7 computer with 8 GB RAM and a 1 Gb/s Internet connection, but connected via a satellite provider, has an RTT of 300 ms or even more.
raspopov
Project Admin
 
Posts: 945
Joined: 13 Jun 2009 12:30

Postby old_death » 13 Nov 2010 20:43

Postby brov » 13 Nov 2010 21:08

Postby ailurophobe » 14 Nov 2010 05:41

Latency actually matters for routing push requests, so using RTT would give a pretty good result as long as that fast but high-latency system had a higher leaf (or neighbour) count than the lower-latency hubs used to route the push requests. So RTT would give a good result even for that corner case if either the dynamic leaf count proposal or the super-hub proposal were also implemented. Or just add a "high latency connection" check box somewhere that makes hub mode use a higher than normal leaf count but drop firewalled leaves.

Postby raspopov » 14 Nov 2010 07:21

Postby brov » 14 Nov 2010 14:10

What fragmentation, if RTT information is used by firewalled leaves only?

And yes, I know there are many satellite providers, but honestly, I have never seen a hub connected via a satellite provider...

Postby raspopov » 14 Nov 2010 14:24

Postby brov » 14 Nov 2010 14:54

But not firewalled ones, right?

Postby raspopov » 14 Nov 2010 14:57

Postby brov » 14 Nov 2010 15:00

You are simply wrong here. What about TCP ping, isn't that possible?

Postby old_death » 14 Nov 2010 19:14

Postby smokex » 15 Nov 2010 00:56

Satellite providers limit monthly throughput to a pretty low level. On Wildblue, for example, the best account maxed out at around 50 GB a month. That limit can be reached quickly, and after that, charges for additional use soon add up to a high price. Therefore, users of satellite service are not big users of P2P. It is in the "acceptable use" or "fair use" clause of most satellite ISP contracts.
smokex
 
Posts: 46
Joined: 13 Jun 2009 19:17

Postby kaffeemonster » 08 Dec 2010 07:34

Forget the satellites for a second; Ras was exaggerating.

In essence he is right.
If you go strictly by RTT, you lump the network together by "short network distances".
Example: South Africa might be lumped together with Europe (because they have fast fiber up here), but Europe would be "split" from Asia (or South America, etc.).

RTT is only good for filtering out the most badly lagging Hubs. Drop connections with an RTT of, say, over 2 s.
But do _not_ favour the lowest RTT.
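
A sketch of this filter-only rule, assuming each connection carries a measured RTT; `HubLink` and `dropLaggingHubs` are hypothetical names, and 2 s is the example threshold from the post. The surviving Hubs keep their original order, so nothing here favours the lowest RTT.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical Hub connection record with only the fields needed here.
struct HubLink {
    std::string address;
    std::chrono::milliseconds rtt;
};

// Drop only the badly lagging Hubs (RTT above `maxRtt`, 2 s by default).
// Order is preserved: the function never sorts or prefers low RTT.
std::vector<HubLink> dropLaggingHubs(std::vector<HubLink> hubs,
                                     std::chrono::milliseconds maxRtt = std::chrono::seconds(2)) {
    assert(maxRtt.count() > 0);
    hubs.erase(std::remove_if(hubs.begin(), hubs.end(),
                              [&](const HubLink& h) { return h.rtt > maxRtt; }),
               hubs.end());
    return hubs;
}
```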

Other things to improve:
1) Go to your political representative; he should knock some sense into the telecoms to roll out FTTH, NOW, so we all get 10 Mbit up
2) Improve Shareaza's overload detection?

Greetings
Jan
kaffeemonster
 
Posts: 9
Joined: 24 Jan 2010 21:34

Postby old_death » 08 Dec 2010 16:09

That's not true: the problem is that RTT differences due to physical distance are usually smaller than 10-100 ms, while differences due to packet processing etc. on Shareaza's side are usually bigger than 100 ms (RTTs of more than 10 s can be observed on a regular basis). Until this changes, the risk of the network being split into regional subnets is low.

And even if this led to a regional regrouping of the network, it wouldn't be bad, as we are talking only about firewalled leaves here, not about any other network member. This means that only firewalled leaves would be using Hubs at a shorter physical distance, nobody else...

Postby kaffeemonster » 09 Dec 2010 00:08

Hmmm.

Asia:
64 bytes from orange.kame.net (203.178.141.194): icmp_req=1 ttl=52 time=330 ms
64 bytes from rev198.asus.com (211.72.249.198): icmp_req=1 ttl=238 time=339 ms

USA:
64 bytes from web2.eff.org (64.147.188.3): icmp_req=1 ttl=52 time=218 ms
64 bytes from forward.markmonitor.com (64.124.14.63): icmp_req=1 ttl=117 time=212 ms /* was time-warner.com */

Around the corner:
64 bytes from www.heise.de (193.99.144.85): icmp_req=1 ttl=249 time=51.1 ms
64 bytes from www.free.fr (212.27.48.10): icmp_req=1 ttl=121 time=70.1 ms

If Shareaza really needs more than 100 ms to process a packet, then IMHO something is going really wrong.
I do not have the numbers for G2CD at the moment, but I think it was around 1 ms...
And that was "measured" with tcpdump, so from the packet entering the network card to the answer leaving the network card, but on an uncongested network!

I guess the problem is that most of the time the upload is totally congested. And that needs to be tackled.

Maybe by measuring the RTT and saying: "Sorry, you are a too crowded/congested/bad Hub - bye".
But not by actively seeking the Hub with the lowest RTT.
Note the fine difference here!

Besides, how do you prevent the stupid swarm effect? All clients measure a good RTT on one Hub, so all of them now use that Hub for their relaying, and because of that the Hub gets congested. All clients move on, only to congest the next Hub.
You enter a world of problems there.

So again:
Filtering Hubs with bad RTT - yes
Using the lowest RTT - no
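
One way to combine both points, sketched under assumptions: Hubs over an RTT limit are filtered out, and the relaying Hub is then picked uniformly at random among the rest, so clients do not all converge on the single lowest-RTT Hub. The function name, the random choice and the limit are hypothetical, not something the post specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical: rttMs[i] is the measured RTT of connected Hub i.
// Filter first, then choose uniformly at random among the acceptable Hubs,
// so no Hub is singled out just for having the lowest RTT.
std::optional<std::size_t> pickRelayHub(const std::vector<double>& rttMs,
                                         double maxRttMs, std::mt19937& rng) {
    assert(maxRttMs > 0);
    std::vector<std::size_t> acceptable;
    for (std::size_t i = 0; i < rttMs.size(); ++i)
        if (rttMs[i] <= maxRttMs) acceptable.push_back(i);
    if (acceptable.empty()) return std::nullopt;  // every Hub is lagging
    std::uniform_int_distribution<std::size_t> dist(0, acceptable.size() - 1);
    return acceptable[dist(rng)];
}
```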

Greetings
Jan

Postby kaffeemonster » 10 Dec 2010 01:27

After thinking a little about the high RTT, a question for old_death:

Were the high-RTT connections by any chance compressed connections, especially Hub->Client?

Compression (also in the Shareaza code) accumulates packets on a timer so as not to flush the compressor after every packet.
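
To illustrate why timer-based flushing inflates measured RTT, here is a hypothetical batching layer (not Shareaza's actual compression code; the compressor itself is left out). A small packet such as a ping reply queued just after a flush waits until the timer fires or the buffer fills, so that wait shows up as extra RTT.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical batching layer in front of a stream compressor. Flushing only
// on a timer or a size threshold saves compressor flushes, but adds up to
// `interval` of delay to each packet.
class BatchedCompressedLink {
public:
    using Clock = std::chrono::steady_clock;

    BatchedCompressedLink(Clock::duration interval, std::size_t maxPending)
        : interval_(interval), maxPending_(maxPending) {
        assert(maxPending > 0);
    }

    // Queue a packet; returns the batch to compress and send if the buffer is full,
    // otherwise an empty string.
    std::string send(const std::string& packet, Clock::time_point now) {
        if (pending_.empty()) firstQueued_ = now;
        pending_ += packet;
        return pending_.size() >= maxPending_ ? flush() : std::string();
    }

    // Called from a periodic timer: flush once the oldest queued packet has waited long enough.
    std::string onTimer(Clock::time_point now) {
        if (!pending_.empty() && now - firstQueued_ >= interval_) return flush();
        return {};
    }

private:
    std::string flush() {
        std::string out;
        out.swap(pending_);  // hand over the batch and leave the buffer empty
        return out;
    }

    Clock::duration interval_;
    std::size_t maxPending_;
    std::string pending_;
    Clock::time_point firstQueued_{};
};
```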

Greetings
Jan

Postby old_death » 11 Dec 2010 20:17

Kaffeemonster, just download Quazaa and have a look at the Neighbours tab to see what times we are talking about here. Most RTTs are much higher than the 250 ms difference between near and far locations that you measured (probably using ping -> IP).

Also, again, RTT is only to be used by firewalled leaves, which need all their search packets to be routed through one of the Hubs they are connected to. This is not about any other network member and will therefore not split the network or do anything similarly stupid. Sure, a few clients will end up using a Hub closer to their physical location than is the case now, but this is not a bad thing in general, as it reduces the bandwidth costs of the data stream. More importantly, choosing the Hub with the lowest RTT will make searches faster. And no, this doesn't make clients switch Hubs all the time: at this step we are not talking about switching Hubs, but about choosing which of the 2 (or 3) Hubs a leaf is currently connected to should be used for routing a search (as this generates most of the traffic on Hubs).
This means that the "stupid swarm" effect does not arise, as we're not actually switching Hubs, and we're not going to switch while performing a search. Also, if the RTT data is only updated once a minute on each client, this effect is prevented, too.
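
A sketch of the routing choice described here, assuming each leaf keeps one RTT value per connected Hub. `SearchRouter` and the one-minute refresh rule are illustrative and follow the post, not any existing Shareaza or Quazaa code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a firewalled leaf routes each search through the
// connected Hub with the lowest cached RTT. The cache is refreshed at most
// once a minute (or when the number of Hubs changes).
class SearchRouter {
public:
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::milliseconds;

    // `liveRtt` holds the latest measured RTT for each connected Hub (2 or 3 entries).
    std::size_t chooseHub(const std::vector<Ms>& liveRtt, Clock::time_point now) {
        assert(!liveRtt.empty());
        if (cached_.size() != liveRtt.size() || now - lastRefresh_ >= std::chrono::minutes(1)) {
            cached_ = liveRtt;
            lastRefresh_ = now;
        }
        std::size_t best = 0;
        for (std::size_t i = 1; i < cached_.size(); ++i)
            if (cached_[i] < cached_[best]) best = i;
        return best;
    }

private:
    std::vector<Ms> cached_;
    Clock::time_point lastRefresh_{};
};
```

Because the cached values change at most once a minute, a search already in progress keeps using the same Hub, which is what keeps leaves from swarming.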

As for dropping Hubs whose RTT is "too" high, I think this is a second step. In addition to selecting the most suitable neighbour for routing searches (firewalled leaves only), all leaves should drop connections to Hubs (and this is the first time we're actually talking about switching Hubs) whose RTTs exceed a certain limit. However, I don't think we should impose a hardcoded limit here. It sounds more logical to me to let each network member "learn" the average RTT on the network and drop connections to Hubs whose RTTs exceed the network average by a certain factor (2x, 3x, 4x, ...; the best value needs to be determined). That way, leaves adapt dynamically to changes in the network instead of relying on a client update whenever the value needs to change.
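
A sketch of the "learned average" idea, assuming the average is an exponentially weighted moving average (EWMA) of observed RTTs. The smoothing constant `alpha`, the factor and the class name are hypothetical choices; the post leaves the best factor open.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: each leaf learns the typical network RTT as an EWMA
// and flags Hubs whose RTT exceeds that average by `factor` (2x, 3x, 4x, ...).
class RttBaseline {
public:
    RttBaseline(double alpha, double factor) : alpha_(alpha), factor_(factor) {
        assert(alpha > 0 && alpha <= 1 && factor > 1);
    }

    // Fold one RTT observation (ms) from any Hub into the running average;
    // the first observation seeds the average directly.
    void observe(double rttMs) {
        average_ = seeded_ ? alpha_ * rttMs + (1 - alpha_) * average_ : rttMs;
        seeded_ = true;
    }

    double average() const { return average_; }

    // Indices of connected Hubs whose RTT is more than `factor` times the average.
    std::vector<std::size_t> hubsToDrop(const std::vector<double>& hubRttMs) const {
        std::vector<std::size_t> drop;
        if (!seeded_) return drop;  // nothing learned yet: drop nobody
        for (std::size_t i = 0; i < hubRttMs.size(); ++i)
            if (hubRttMs[i] > factor_ * average_) drop.push_back(i);
        return drop;
    }

private:
    double alpha_;
    double factor_;
    double average_ = 0;
    bool seeded_ = false;
};
```

With `alpha = 0.5` and `factor = 3`, observing 200 ms and then 400 ms gives an average of 300 ms, so a Hub at 950 ms would be dropped while one at 800 ms would be kept.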