
Suddenly being blocked by DC++ hub. How to fix?

Posted: 16 Jan 2016 01:24
by Lanigiro
About 20 minutes ago, Shareaza abruptly fumbled several of its G1 connections at once (at least two of them) and also its DC++ connection to the hub 212.117.181.4.

After that, I have been completely unable to get it reconnected to the DC hub. It keeps saying the connection is refused. Several times, I tried turning DC++ off, waiting 15 seconds, and then reconnecting. That usually works when Shareaza gets stuck in a loop of retrying the same hub too soon after the last attempt: it is refused, tries again only a quarter-second later instead of after a decent backoff interval, is refused again, and so on. This time it failed, several times in a row. I concluded that when it fumbled the G1 connections, it didn't merely fumble the DC one too but did something the hub considered particularly bad, bad enough that the hub has banned my IP for an extended period of time.
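The retry pattern described above (refused, retry a quarter-second later, refused again) is exactly what exponential backoff with jitter is meant to prevent. Below is a minimal Python sketch, not Shareaza's code: `next_delay`, `reconnect`, the hypothetical `connect()` callback, the 15-second base and the 15-minute cap are all illustrative assumptions.

```python
import random
import time
from typing import Callable, Optional


def next_delay(attempt: int, base: float = 15.0, cap: float = 900.0,
               rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles after every consecutive refusal (15 s, 30 s,
    60 s, ...) up to `cap`; the actual wait is drawn uniformly from the
    upper half of that range so that many clients refused at the same
    moment do not all come back in lockstep.
    """
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap, base * 2 ** min(attempt, 16))  # clamp the exponent
    return ceiling / 2 + rng.uniform(0, ceiling / 2)


def reconnect(connect: Callable[[], bool], max_attempts: int = 10,
              sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> bool:
    """Call the hypothetical `connect()` until it succeeds or we give up."""
    for attempt in range(max_attempts):
        if connect():
            return True
        if attempt + 1 < max_attempts:  # no point sleeping after the last try
            sleep(next_delay(attempt, rng=rng))
    return False
```

Drawing from the upper half of the window (sometimes called "equal jitter") guarantees a minimum wait of half the ceiling, so even the first retry after a refusal waits at least 7.5 seconds with these assumed defaults.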

So I did what I usually do to evade an undeserved IP block (I didn't do anything wrong, even if some bug in Shareaza did): closed Shareaza, switched the router off, waited 15 seconds (enough, in my experience, to be sure my old IP gets reassigned to another customer of my ISP), switched the router on again, and restarted Shareaza. Afterward the IP is not only different but not even in the same class-A netblock (the first octet is different), and DC++ still won't connect.

It looks like the hub has banned the entire address space used by my ISP and not just the individual IP I had when Shareaza barfed all over it.

Which means that a bug in your software just caused that hub to block about 2 million people, probably for at least an hour and possibly indefinitely.

As far as I am concerned a bug with consequences that serious, potentially cutting off access to a server by people that aren't even using the software with the bug, constitutes a show-stopping bug that should be given at least as much priority as one that merely causes the one user to experience an app crash. Not only did I end up having to restart Shareaza, but I am still blocked from using that hub, and so are 2 million other people, some of whom might have been using other, better-behaved DC clients to reach it. That's pretty effing bad!

The DC support in Shareaza still feels beta quality. It no longer crashes Shareaza, and the goofy, unusable "0/1 sources" download-list entries have gone from ridiculously common to extremely rare, but I still see the occasional instance of this glitch, which renders some file undownloadable (it gets stuck locally queued and can't be resumed from the Shareaza end). It also still makes reconnection attempts too frequently after losing a DC hub connection. There is also some unknown problem that gets it banned from hubs: not very often, but for lengthy intervals. Based on previous incidents, the 212 hub seems to ban for several hours when this happens, and to ban entire ISPs. I likely won't be able to use it again until tomorrow, and, no thanks to the shoddy implementation of DC++ in Shareaza, neither will any other customer of my ISP, even those using other clients. Xumku's Khimki Quiz hub banned my ISP a year ago, and the last time I tried to connect to it, a few weeks ago, my ISP was still banned.

All of these DC++ bugs need to be fixed, but whichever one so easily gets these lengthy bans enacted against entire internet service providers is Public Enemy No. 1 among them. It joins just two other issues on my personal "FBI's Most Wanted" list of Shareaza bugs, both of them application hangs. One is the flatline hang while hashing when G2 search results are arriving, which has now been observed lasting over ten minutes on one occasion. The other is the 100%-CPU hang that seems to be linked especially to loss of G1 ultrapeer connectivity; its duration appears to scale with the number of items in open search result windows, and it sometimes recurs in rapid-fire succession that can lock the user out of the UI for nearly as long.

Unfortunately, diagnosing the DC++ one is likely to be very difficult, as there is little in Shareaza's logging data accompanying it other than a bunch of "dropped the connection unexpectedly" messages when it strikes. This contrasts with the two hangs. One occurs when G2 results arrive simultaneously with files starting or finishing hashing due to library changes, and is some sort of deadlock/livelock; the other always seems to be accompanied by a log entry saying Shareaza is banning an ultrapeer for being "Foxy", whatever that means. (Neither hang should occur: locking order should be consistent for any pair of locks, and there is no reason why blocking a single peer from being used again as a Network tab hub should take more than a tiny bit of CPU time to add it to a list somewhere, let alone an amount that scales with things irrelevant to hub connections, like the number of search results in open windows.)
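The rule that locking order should be consistent for any pair of locks can be enforced mechanically by giving every lock a fixed rank and always acquiring in rank order. A minimal Python sketch under that assumption; Shareaza's real locks are not known here, so `RankedLock`, `acquire_in_order`, and the lock names `library` and `search` are hypothetical.

```python
import threading
from contextlib import contextmanager


class RankedLock:
    """A non-reentrant lock carrying a fixed global rank.

    Every lock in the program gets a unique (rank, name); code that needs
    several locks must take them in ascending rank order.
    """

    def __init__(self, rank: int, name: str) -> None:
        self.rank = rank
        self.name = name
        self.lock = threading.Lock()


@contextmanager
def acquire_in_order(*locks: RankedLock):
    """Acquire `locks` in ascending (rank, name) order, release in reverse.

    Because every caller sorts the same way, no two threads can each hold
    one lock of a pair while waiting for the other, which is the
    lock-order-inversion deadlock.
    """
    ordered = sorted(set(locks), key=lambda lk: (lk.rank, lk.name))
    taken = []
    try:
        for lk in ordered:
            lk.lock.acquire()
            taken.append(lk)
        yield
    finally:
        for lk in reversed(taken):
            lk.lock.release()


def demo(iterations: int = 1000) -> int:
    """Two threads ask for the same pair of locks in opposite orders."""
    library = RankedLock(1, "library")  # hypothetical lock names
    search = RankedLock(2, "search")
    done = []

    def worker(first: RankedLock, second: RankedLock) -> None:
        for _ in range(iterations):
            with acquire_in_order(first, second):
                done.append(1)

    threads = [threading.Thread(target=worker, args=(library, search)),
               threading.Thread(target=worker, args=(search, library))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(done)
```

Without the sort, the two workers in `demo` could deadlock; with it, both always finish. In C++17, `std::scoped_lock` over several mutexes reaches the same goal with a deadlock-avoidance algorithm instead of a fixed rank.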

Here is what is known so far about the bug that causes lengthy DC bans:

1. It strikes often, but usually triggers just a disconnection rather than a DC ban, although manual nursemaiding is sometimes required to reconnect because of the too-soon retry loop mentioned above.

2. The bug causes Shareaza to drop multiple nominally independent peer connections at the same moment, regardless of geography, protocol, or other factors: a random subset of the hub connections in the Network tab will "drop the connection unexpectedly", even though the affected connections have no routing in common past my own ISP's border routers. Failures at my ISP or closer to home, but outside Shareaza, are also unlikely to be responsible, since any such failure would most likely hit all of the connections, all connections of one specific protocol, exactly one of them, or none of them.

3. When it strikes, Shareaza sometimes does something to mightily offend one or more of the affected hubs. This can probably result in bans on all four networks, but that isn't noticeable for G1 and G2, where there are "plenty of fish in the sea" and the numerous hubs are interchangeable. It definitely can result in bans from ED2K and DC++ hubs, but the ED2K bans always seem to be single-IP bans, so restarting the router with a decent disconnected interval fixes them; they are also clearly temporary, or after years of this most of the IPs at my ISP would no longer be able to connect to the eMule Security servers. It is the DC bans that are a problem, because unlike Lugdunum, the software used to run DC hubs seems to enact very broad bans in response to ... well, whatever it is that Shareaza does when this happens.

4. My best guess, and that's all it can be on the data I have thus far, is that when the multi-connection-killing bug strikes, what actually happens is that Shareaza sends a bunch of essentially line noise out to its hubs, which usually just looks like invalid/corrupt data or a bad protocol implementation. Hubs of all four protocols commonly just ignore the bad data and wait for good data, and the other common response is to just drop the connection. Bans are much rarer, and probably depend on either the duration (if it triggers flood/DoS protections) or the detailed content of the "noise" (if it randomly happens to sufficiently resemble some known malicious traffic pattern), with a long or accidentally virussy-looking emission triggering a border firewall lockout of the offending IP address (mine!) for hours or longer, or in the case of DC++, of the entire ISP owning the offending IP address.
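If the guess in item 4 is right, a cheap plausibility check on the send path would turn "line noise to a hub" into a local log entry, which would also supply the diagnostic data the log currently lacks. A minimal Python sketch, assuming the classic NMDC framing (every command ends with `|`, protocol commands start with `$`, chat lines with `<nick>`); it does not cover ADC hubs, and `looks_like_nmdc`, `guarded_send`, and the 64 KiB limit are hypothetical.

```python
from typing import Callable

MAX_BUFFER = 64 * 1024  # assumed sanity limit for one send, not from any spec


def looks_like_nmdc(buf: bytes) -> bool:
    """Rough check that `buf` is one or more NMDC (DC) commands.

    Assumes the classic NMDC framing: every command ends with '|',
    protocol commands start with '$' and chat lines with '<nick>'.
    Empty commands (a bare '|') are let through.
    """
    if not buf or len(buf) > MAX_BUFFER or b"\x00" in buf:
        return False
    if not buf.endswith(b"|"):
        return False
    for cmd in buf[:-1].split(b"|"):
        if cmd and cmd[:1] not in (b"$", b"<"):
            return False
    return True


def guarded_send(send: Callable[[bytes], None], buf: bytes,
                 log: Callable[[str], None]) -> bool:
    """Send `buf` to a DC hub only if it passes the check; log it otherwise."""
    if not looks_like_nmdc(buf):
        log(f"refusing to send {len(buf)} non-NMDC bytes to DC hub: "
            f"{buf[:32]!r}")
        return False
    send(buf)
    return True
```

This is a filter, not a parser: it catches binary garbage or another protocol's packets, but not corrupted bytes inside an otherwise well-framed command.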

I could, of course, spend hours poring over the networking part of the source code looking for the problem (as I did to find the deadlock in the hashing subsystem for you, which you still have not fixed even though I offered a patch), but someone intimately familiar with the source code would probably find any given bug in it much faster than I could. Recommendation: go over the following with a fine-toothed comb, as it's almost guaranteed that the bug is of one of these three types:

1. The parts of the code that generate leaf-to-hub traffic.

2. Every function with automatic arrays used as buffers to hold to-hub packets, looking for memory errors (off-by-one loops over arrays, pointer arithmetic bugs, etc.) that might overwrite some of the contents of a to-hub packet buffer without setting off Windows memory protection and producing a clean app crash.

3. Everywhere the IPs of currently active hub connections may travel, in case some bug is inadvertently adding them somewhere else, e.g. as sources for files they aren't really sources of, perhaps even sources of a different protocol than they are. (How would a DC++ hub respond to getting a G2 file-download request or query hit packet sent randomly to it?)
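The last of the suggested checks (whether live hub addresses leak into other protocols' source lists) can also be enforced at a single choke point rather than only audited by hand. A minimal Python sketch; `Proto`, `Endpoint`, `HubRegistry`, and `add_source` are hypothetical stand-ins, since Shareaza's actual source-list structures are not described here, and the addresses in any example should be documentation ranges such as 192.0.2.x.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Proto(Enum):
    G1 = "gnutella"
    G2 = "gnutella2"
    ED2K = "ed2k"
    DC = "dc"


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


class HubRegistry:
    """Remembers which endpoints are live hub connections, and of which protocol."""

    def __init__(self) -> None:
        self._hubs: Dict[Endpoint, Proto] = {}

    def connected(self, ep: Endpoint, proto: Proto) -> None:
        self._hubs[ep] = proto

    def disconnected(self, ep: Endpoint) -> None:
        self._hubs.pop(ep, None)

    def may_use_as_source(self, ep: Endpoint, proto: Proto) -> bool:
        """False if `ep` is a live hub of a different protocol than `proto`."""
        hub_proto = self._hubs.get(ep)
        return hub_proto is None or hub_proto is proto


def add_source(registry: HubRegistry, sources: List[Tuple[Endpoint, Proto]],
               ep: Endpoint, proto: Proto) -> bool:
    """Hypothetical choke point that every new download source passes through."""
    if not registry.may_use_as_source(ep, proto):
        return False  # rejected; logging here would reveal the leaking code path
    sources.append((ep, proto))
    return True
```

Routing every source insertion through one function like this means a hub address leaking into the wrong protocol is refused and can be logged, instead of producing a stray G2 request to a DC hub.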