G1 still being a massive pain to connect

Posted: 15 Mar 2015 09:08
by Lanigiro
I haven't noticed any movement on the G1 front lately. Shareaza continues to spend most of its time missing some G1 connections, and the connections that reach the "handshaking" stage often don't complete, or drop within a minute.

There seem to be multiple overlapping problems here:

1. G1 web caches or hub discovery seem to be heavily polluted with IPs that are not G1 ultrapeers, to judge by the number of "refused" entries in the network log when a G1 connection is attempted. Either that, or a lot of ultrapeers are extremely short-lived, so their IPs get spread around but by the time my client tries to connect to one it's not an ultrapeer anymore and the port is closed.

2. G1 connections seem to be very drop-prone. Either that, or (again) a lot of ultrapeers are extremely short-lived and die within a few minutes of connecting to me.

3. There seems to be an issue with Shareaza where short freezes keep happening whenever G1 is not fully connected. Each freeze is accompanied by a Shareaza thread using 100% of a CPU core, plus a complete lockout of the UI and some issues with the rest of Windows at the onset of the hang (most notably, alt-tab behaving weirdly). The Windows issues go away after a few seconds, but the excessive Shareaza CPU use and the UI freeze tend to last from 20 seconds to a full minute, and occasionally more. The freezes often end with more G1 connections having dropped, presumably because they timed out while my own machine was not responding.

Why do I think that missing G1 connections cause the freezes? Because their frequency seems proportional to the number of gray (connecting) G1 entries in the network tab. If I'm missing one G1 connection there will be three of these, and the freezes happen occasionally. If I'm missing two there will be six, and the freezes are more frequent. If I'm missing three there will be nine, and the freezes are more frequent still, to the point that it's difficult to do anything useful with the application without being interrupted for 20 or more seconds several times in five minutes. And if all are missing, there will be, it seems, a lot more than 12 gray G1 attempts in progress, and the UI will typically be completely unusable until a G1 connection is established. If G1 is disabled or fully connected, though, these particular freezes do not seem to happen at all.

Thus it seems that something that occurs at some moderately low frequency (perhaps 5%) on any particular attempt to connect to a G1 ultrapeer causes some thread to wedge in a busy-wait or runaway process of some sort. It's not a simple infinite loop, though: the odd thing is that the work is huge but finite, apparently involving on the order of 50 billion calculations to complete whatever the heck it is doing.

The duration of the freezes also seems to get worse the older the Shareaza instance is, as though it involves groveling over some accumulation of material. I'd have suspected some sort of garbage collector, had I not seen the lack of one in Shareaza's source code. There's also the fact that GC pauses should not be tied so strongly to a single feature of an application (G1 ultrapeer handshaking) but should occur from time to time no matter what else is going on, with frequency proportional to general activity level and not a specific subsystem's.
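
To put rough numbers on the "proportional to gray entries" observation, here is a minimal sketch. It assumes each connecting attempt wedges independently with the guessed 5% chance, and that there are three gray entries per missing connection, as seen in the network tab. Both figures are guesses from watching the UI, not measurements, and the function name is made up for this example.

```python
def p_any_wedge(attempts: int, p_single: float = 0.05) -> float:
    """Chance that at least one of `attempts` independent handshakes wedges.

    p_single is the guessed per-attempt wedge probability (an assumption).
    """
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    for missing in (1, 2, 3):
        gray = 3 * missing  # three gray entries per missing connection (observed)
        print(f"{missing} missing -> {gray} gray -> {p_any_wedge(gray):.1%}")
```

Under those assumptions the chance of hitting a freeze in any one round of attempts climbs steadily with the number of missing connections, which at least matches the pattern described, though it proves nothing about the cause.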

Needless to say, those freezes make the G1 connectivity issues rather worse than they would be otherwise, in two ways: one, they make "not being fully connected to G1" cause worse problems than just a smaller G1 search horizon, and two, they contribute to the very instability of G1 connections that is at issue here, whenever a freeze is long enough to cause other, established connections to drop.

Re: G1 still being a massive pain to connect

Posted: 17 Mar 2015 13:08
by Lanigiro
I have reason to think that the G1 situation is more serious, in some ways, than originally thought.

Sometime while I was asleep last night, my Shareaza inexplicably dumped not only all of its G1 hosts but all of its G1 discovery services as well, and I woke up to find it solidly connected to G2, ED2K, DC++, and nothing else. Adding G1 discovery services from gwebcaches.pongwar.com didn't seem to fix it, so I suspected that something had internally gone wonky in the running instance and restarted Shareaza. When it came back up it promptly queried one of the re-added discovery services and connected to G1.

Almost instantly.

It's robustly connected right now, with four G1 ultrapeers stable for 20 minutes. No slow startup, no frequent connection drops, no weird CPU spikes, no other wackiness.

The contrast with the behavior of the past few months is so stark that there can be only one explanation for it: For months now, G1 has been in a state of network partition, with a part full of crappy short-lived hubs and a part full of normal ones. Shareaza must have fallen into the crappy partition somehow a while ago, perhaps on one of the occasions when I didn't use it for several days, which always requires re-querying G1 discovery services for some reason as it forgets everything in the G1 host cache if that happens. Perhaps the cache it queried belonged to the bad partition. Today it picked one belonging to the good partition.

If this is indeed the case, then something needs to be done either to heal the G1 partition or simply to amputate the crappier part of it. Finding out which discovery services apparently have hosts from which partition and adding the bad half to the banlist for Shareaza would prevent Shareaza getting trapped in the crappy half. Anything less crude will require a greater understanding of the causes and nature of this partition than I presently have.
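
The "find out which discovery services feed which partition" step could be sketched roughly as below. This is only an illustration: it assumes you have some log of (discovery service, connection outcome) pairs, which as far as I know Shareaza does not record in this form. The function names, the outcome labels, and the threshold values are all made up for the example.

```python
from collections import Counter


def failure_rates(attempts):
    """Map each discovery service to the fraction of its hosts that failed.

    `attempts` is an iterable of (service, outcome) pairs, where outcome is
    "ok" for a connection that held and anything else ("refused",
    "dropped", ...) for one that did not. The pairing and the labels are
    assumptions for this sketch.
    """
    totals, failures = Counter(), Counter()
    for service, outcome in attempts:
        totals[service] += 1
        if outcome != "ok":
            failures[service] += 1
    return {service: failures[service] / totals[service] for service in totals}


def ban_candidates(attempts, threshold=0.8, min_samples=20):
    """Services with enough samples and a failure rate at or above threshold."""
    attempts = list(attempts)  # allow a generator to be walked twice
    counts = Counter(service for service, _ in attempts)
    rates = failure_rates(attempts)
    return sorted(s for s, r in rates.items()
                  if counts[s] >= min_samples and r >= threshold)
```

The `min_samples` cutoff is there so that a service is not banned on the strength of two or three unlucky hosts.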

Re: G1 still being a massive pain to connect

Posted: 17 Mar 2015 14:13
by Lanigiro
Well, it was nice while it lasted.

G1 started acting flaky and dicky again about an hour and a half after I'd first connected. So it can't quite be a simple partition; it must be some sort of almost-partition, where a client can find its way from the good part to the crappy part but not back out. If there's a bunch of poorly-configured peers that mostly refer to each other and have frequent outages, it would explain these symptoms and would be stable: peers connected to this cluster will end up with hostcaches full of more peers in the cluster, and will keep reconnecting within the cluster and have a hard time finding their way out. The cluster will retain the property of mostly linking within itself. At the same time, addresses of hosts in the cluster may find their way into hostcaches outside of the cluster, so winding up in the cluster is relatively easy compared to spontaneously getting back out of it. The question is how such a "clique cluster" develops within the broader G1 network in the first place. And, of course, how to fix it.
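
The "easy to fall in, hard to get out" idea can be illustrated with a toy two-state model. This is a sketch under made-up assumptions: at each reconnect a client outside the cluster falls in with probability `p_in`, and a client inside escapes with probability `p_out`. Neither number is measured, and real hostcache behavior is certainly messier than this.

```python
import random


def fraction_in_cluster(p_in: float, p_out: float) -> float:
    """Long-run fraction of time in the cluster for the two-state model."""
    return p_in / (p_in + p_out)


def simulate(p_in: float, p_out: float, steps: int, seed: int = 0) -> float:
    """Simulate reconnects and return the observed fraction spent in the cluster."""
    rng = random.Random(seed)
    in_cluster = False  # start in the good part of the network
    time_in = 0
    for _ in range(steps):
        if in_cluster:
            if rng.random() < p_out:
                in_cluster = False
        elif rng.random() < p_in:
            in_cluster = True
        time_in += in_cluster
    return time_in / steps


if __name__ == "__main__":
    # Hypothetical rates: falling in is ten times likelier than escaping.
    print(fraction_in_cluster(0.10, 0.01))
    print(simulate(0.10, 0.01, steps=200_000))
```

The point is only that the long-run share of time in the cluster depends on the ratio of the two rates, so even a small cluster can hold a client most of the time if escaping is much rarer than entering.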

Re: G1 still being a massive pain to connect

Posted: 20 Mar 2015 09:00
by shareaza4ever

Re: G1 still being a massive pain to connect

Posted: 30 Mar 2015 23:24
by ale5000
If anyone is interested, I have done some work to block some G1 spam in GWCs; look for the latest version of Skulls! Multi-Network WebCache.
The biggest problem I see is that a lot of clients request hosts but almost none submit updates. IPs are dynamic, and updates to the GWC are too infrequent.

I need some feedback about this.

Re: G1 still being a massive pain to connect

Posted: 12 Jul 2015 22:26
by bmn