G1 still being a massive pain to connect
Posted: 15 Mar 2015 09:08
I haven't noticed any movement on the G1 front lately. My client continues to spend most of its time short one or more G1 connections, and the connections that reach the "handshaking" stage often don't complete, or drop again within a minute.
There seem to be multiple overlapping problems here:
1. G1 web caches or hub discovery seems to be heavily polluted with IPs that are not G1 ultrapeers, to judge by the number of "refused"s in the network log when G1 connection is attempted. Either that, or a lot of ultrapeers are extremely short-lived, so their IPs get spread around but by the time my client tries to connect to one it's not an ultrapeer anymore and the port is closed.
2. G1 connections seem to be very drop-prone. Either that, or (again) a lot of ultrapeers are extremely short-lived and die within a few minutes of connecting to me.
3. There seems to be an issue with Shareaza where, if G1 is not fully connected, short freezes keep happening. Each freeze is accompanied by a Shareaza thread using 100% of a CPU core, plus a complete lockout of the UI and some issues with the rest of Windows at the onset of the hang (most notably, alt-tab behaving weirdly). The Windows-wide issues go away after a few seconds, but the excessive Shareaza CPU use and the UI freeze tend to last from 20 seconds to a full minute, and occasionally more. The freezes often end with more G1 connections having dropped, presumably timing out while my own machine was not responding.
Why do I think that missing G1 connections cause the freezes? Because the freezes' frequency seems proportional to the number of gray (connecting) G1 entries in the network tab. If I'm missing one G1 connection there will be three of these and the freezes happen occasionally. If I'm missing two there will be six and the freezes are more frequent. If I'm missing three there will be nine and the freezes are more frequent still, to the point that it's difficult to do anything useful with the application without being interrupted for 20 or more seconds several times in five minutes. And if all are missing, there will be, it seems, a lot more than 12 gray G1 attempts in progress, and the UI will typically be completely unusable until a G1 connection is established. If G1 is disabled or fully connected, though, these particular freezes do not seem to happen at all.
Thus it seems that something that occurs at a moderately low frequency (perhaps 5%) on any particular attempt to connect to a G1 ultrapeer causes some thread to wedge in a busy-wait or runaway process of some sort. It's not a simple infinite loop, though: the odd thing is that the work is huge but finite, apparently involving on the order of 50 billion calculations to complete whatever the heck it is doing.
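For anyone who wants to check my arithmetic, here is the back-of-envelope reasoning behind those numbers as a small Python sketch. The 2.5 GHz clock speed and "one calculation per cycle" are assumptions for illustration, the 5% figure is only my rough guess from above, and treating connection attempts as independent is a simplification. The function names are made up for this example and have nothing to do with Shareaza's code.

```python
# Back-of-envelope check of the freeze numbers. Every constant is an estimate.
CLOCK_HZ = 2.5e9          # ASSUMPTION: one core at ~2.5 GHz, ~1 calculation per cycle
ATTEMPTS_PER_MISSING = 3  # observed: 3 gray (connecting) entries per missing connection
P_WEDGE = 0.05            # rough guess: ~5% of connection attempts wedge a thread


def operations_burned(freeze_seconds, clock_hz=CLOCK_HZ):
    """Rough count of calculations one pegged core performs during a freeze."""
    return freeze_seconds * clock_hz


def p_any_wedge(missing, p=P_WEDGE):
    """Chance that at least one attempt in the current batch wedges,
    assuming each attempt wedges independently with probability p."""
    attempts = ATTEMPTS_PER_MISSING * missing
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    for secs in (20, 60):
        print(f"{secs:>2} s freeze ~ {operations_burned(secs):.1e} calculations")
    for missing in (1, 2, 3):
        print(f"{missing} missing connection(s): "
              f"P(at least one wedge per batch) = {p_any_wedge(missing):.2f}")
```

Under those assumptions a 20-second freeze already works out to about 50 billion calculations, and the chance of hitting a wedge climbs with each additional missing connection, which at least fits the pattern I'm seeing.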
The duration of the freezes also seems to get worse the longer the Shareaza instance has been running, as though the work involves groveling over some accumulation of material. I'd have suspected some sort of garbage collector, had I not seen the lack of one in Shareaza's source code. There's also the fact that GC pauses should not be tied so strongly to a single feature of an application (G1 ultrapeer handshaking) but should occur from time to time no matter what else is going on, with frequency proportional to the general activity level and not a specific subsystem's.
Needless to say, those freezes make the G1 connectivity issues rather worse than they would be otherwise, in two ways: one, they make "not being fully connected to G1" cause worse problems than just a smaller G1 search horizon, and two, they contribute to the very instability of G1 connections that is at issue here, whenever a freeze is long enough to cause other, established connections to drop.
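On point 1, one way to put a number on the "refused" rate outside of Shareaza would be to probe a list of candidate host addresses directly and tally the outcomes. Below is a minimal Python sketch of that idea using only the standard `socket` module. The `probe` and `tally` names and the example addresses are hypothetical (the addresses are reserved documentation IPs, not real ultrapeers), and how to export the host list from the client is left open. Note that an "open" result only means something accepted the TCP connection; it does not prove the host is a G1 ultrapeer, and a "refused" result cannot by itself distinguish a polluted cache from an ultrapeer that has since gone away.

```python
import socket
from collections import Counter


def probe(host, port, timeout=5.0):
    """Classify one TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"      # port closed: nothing is listening there
    except socket.timeout:
        return "timeout"      # no answer at all (firewalled or host gone)
    except OSError:
        return "error"        # unreachable network, bad address, etc.


def tally(hosts, timeout=5.0):
    """Probe each (host, port) pair in turn and count the outcomes."""
    return Counter(probe(host, port, timeout) for host, port in hosts)


if __name__ == "__main__":
    # HYPOTHETICAL host list: replace with addresses taken from the host cache.
    candidates = [("192.0.2.10", 6346), ("192.0.2.11", 6346)]
    for outcome, count in tally(candidates, timeout=3.0).items():
        print(f"{outcome}: {count}")
```

Running the same list again some minutes later and comparing the tallies would give a rough idea of how quickly hosts that were "open" turn into "refused", which is what the short-lived-ultrapeer explanation predicts.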