Constant G2 "lack of traffic" errors - can't stay connected

PostPosted: 08 Feb 2014 00:25
by Winterkeeper
I have been using Shareaza since last August with no problems whatsoever. I only use it to connect to G2.

However, during the last week or so I find myself constantly unable to stay connected to G2 neighbors. The connection is established fine, but within a few minutes the bandwidth on a neighbor drops to 0 and the connection is then dropped with one of two errors: "Closing connection to neighbour x.x.x.x due to lack of traffic." or "Neighbour x.x.x.x dropped the connection unexpectedly." Most often it is the "lack of traffic" error.

This happens over and over, and the most I am able to stay connected to a neighbor is around 10 minutes. Some drop after 1-2 minutes. Before, I would connect to neighbors with no problems and stay connected to them for hours.

I haven't made any recent changes to the computer that is running Shareaza and I'm experiencing no other network problems. I started with Shareaza 2.6.0 and upgraded to 2.7.1 in December but it was working fine. Again, this is only happening during the last week.

Any ideas on what might be causing this?

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 08 Feb 2014 05:10
by raspopov
90% of network problems are caused by connection/hardware errors. Check the network card connectors, try pinging some hosts (for example, "ping -t google.com"), and test your bandwidth at http://www.speedtest.net/ or a similar site.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 09 Feb 2014 01:20
by Winterkeeper

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 09 Feb 2014 07:36
by raspopov
1. Too short connection timeouts.
2. Too high speed settings.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 10 Feb 2014 02:24
by Winterkeeper
I've tried increasing the timeout settings, and now I've actually had one G2 neighbor stay connected for more than 90 minutes so far. However, the second neighbor connection still constantly drops every few minutes. I guess I will keep experimenting with it.

I should note that when someone downloads from me, there are no problems at all. They stay connected for the entire time and get the complete file, even in the case of large files. It's only the G2 neighbors that are dropping, and usually with that "lack of traffic" error.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 10 Feb 2014 05:11
by dew
I'm experiencing the same issue.

If you look at the network crawler http://crawler.doxu.org/history.html there was definitely something weird that happened 9 days ago.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 10 Feb 2014 14:31
by Lanigiro
Add me in as a third user who's been seeing this.

Those graphs are alarming, for three reasons.

1. The strong diurnal oscillation suggests that G2 has weak uptake outside of one or a few time zones, instead of strong global support. (This is odd in light of the geographic diversity of hubs I routinely see, but maybe hubs are more evenly distributed than leaves.)

2. There appears to have been a huge and recent drop in the total population of hubs, all right. It's as if millions of peers suddenly cried out in terror, and were suddenly silenced. But by which evil empire? The RIAA, the MPAA, or someone else? The hubs also have fewer leaf connections, which suggests the hubs that survive are having problems staying online and keep crashing for some reason. Was an update pushed out for a popular client around that time? I know it's been longer since the last Shareaza version bump (2.7.0.0 -> 2.7.1.0).

3. The year-long graph at lower right has perhaps the worst news of all. There seems to be an overall exponential decay in the size of G2 over that time period, with a half-life of just about one year. G2 should be growing, not shrinking. What is going on here?

In the meantime I can report that whatever the recent change is has also impacted G1, as I use it routinely. G1 is much harder to connect to than it used to be, and, worse, a lot of the failed attempts time out rather than failing quickly. The number of timeouts and refusals suggests that the G1 GWebCaches are heavily laden with stale entries for IPs not presently running G1 clients (and, given the prevalence of timeouts, mainly IPs that aren't even presently in use). Either a popular G1 client is also experiencing problems sustaining uptime, or the GWebCaches themselves are acting up (or being poisoned by some bad actor). This may have started at the same time as the G2 anomaly.

Once established G1 connections seem maybe a bit more unstable than in the past, but not nearly as much so as G2. But it's now both G1 and G2 that Shareaza connects to very slowly, with G1 spending a long time on stale webcache entries before finding a good ultrapeer and G2 banging its head against the Great Firewall of China, by the looks of it, for as much as half an hour before finally connecting to seemingly one single small population of hubs in France.

At this point I'm finding most of my stuff via eMule, which used to be a distant second to G2.

Ten minute CPU saturations every hour

PostPosted: 10 Feb 2014 17:15
by Lanigiro
Is anyone else seeing this? For the past few days, my install of 2.7.1.0 has been saturating my CPU for almost precisely 11-minute-long blocks, almost exactly once every 52 minutes, so there's a gap of just over 40 minutes from the end of one incident to the start of the next.

The CPU saturations are accompanied by numerous "network core overloaded" messages in the network tab, with lower than normal (NOT higher than normal) traffic. And when the CPU saturation ends, Shareaza instantly drops every single hub connection it has save for the ED2K server.

Putting the network tab in "debug" shows something even more remarkable. Usually there's a steady flow of yellow, green, and blue background traffic, particularly if G1 is connected (tons of "Multicast querying Gnutella1 neighbors"). When the CPU surge starts, all of this stuff just stops, abruptly. Not a single event that is not shown at "info" or above apparently happens, or if it does it is no longer reported to the network tab. Far from "network core overloaded", it seems the network core is virtually idle during these events.

I mention this because it only started happening recently and is NOT the result of an update I made (I switched to 2.7.1.0 much earlier than this started). Not only that, I think its onset may coincide with the anomalous G2 "lack of traffic" problem being discussed in another thread here, which suggests a possible cause: that Shareaza G2 hubs are afflicted by this same phenomenon, and drop their hub-to-leaf connections or simply starve them of traffic when their CPU surges occur. So the unstable G2 connections we're all seeing are caused by these surges hitting the hubs we're connected to.

As for the cause, the obvious culprit would be malicious traffic. It's a DoS attack, in all likelihood.

Here is my network log (in "debug") of the onset of a surge I saw here. I have redacted an uploader IP but not any others, as each of them may be the source of the suspected malicious traffic, and most of the other IPs are hubs:

[10:41:21] Querying 180.177.218.52
[10:41:21] Processing query acknowledge from 218.250.73.14 (time adjust -53 seconds): 30 hubs, 2848 leaves, 5 suggested hubs, retry after 300 seconds.
[10:41:21] Multicast querying Gnutella1 neighbours
[10:41:21] Requesting query key from 114.43.112.157
[10:41:21] Received a malformatted query packet from 125.196.163.189, ignoring.
[10:41:22] Rejecting upload connection from [redacted], network core overloaded.

So, it queried a particular G2 hub, processed a query acknowledge, multicast queried G1 neighbors, requested a G2 query key, and received a malformatted query packet from G1. The queries are outbound and so is the G2 query key request, so they're less likely to be the triggering event. G1 malformatted query packets are a dime a dozen, on the one hand, but on the other one happened at almost the exact instant the surge began. If someone is sending out floods of malicious G1 query packets that cycle back to a particular target IP every 52 minutes it could explain what we're seeing. The other strong candidate trigger is the last query acknowledge processed before all hell broke loose, although I don't see anything hugely anomalous about it.
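The reasoning above (look at what arrived just before the first "network core overloaded" line) can be automated for longer logs. This is only a minimal sketch in Python. It assumes the network tab log has been saved as plain text in the `[HH:MM:SS] message` format shown above, it treats a line without a timestamp as a wrapped continuation of the previous entry, and it ignores logs that cross midnight. The function names and the 5-second window are made up for illustration and are not part of Shareaza.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[10:41:21] Querying 180.177.218.52".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(.*)$")

# The excerpt quoted in the post, including the wrapped second entry.
SAMPLE = """\
[10:41:21] Querying 180.177.218.52
[10:41:21] Processing query acknowledge from 218.250.73.14 (time adjust -53 seconds): 30 hubs,
2848 leaves, 5 suggested hubs, retry after 300 seconds.
[10:41:21] Multicast querying Gnutella1 neighbours
[10:41:21] Requesting query key from 114.43.112.157
[10:41:21] Received a malformatted query packet from 125.196.163.189, ignoring.
[10:41:22] Rejecting upload connection from [redacted], network core overloaded.
"""

def parse_log(text):
    """Return a list of (time, message) tuples.

    Lines without a leading timestamp are appended to the previous entry.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            entries.append([stamp, match.group(2)])
        elif entries:
            entries[-1][1] += " " + line
    return [(stamp, message) for stamp, message in entries]

def events_before_overload(entries, window_seconds=5):
    """Return the entries logged shortly before the first overload message."""
    for index, (stamp, message) in enumerate(entries):
        if "network core overloaded" in message:
            start = stamp - timedelta(seconds=window_seconds)
            return [(t, m) for t, m in entries[:index] if t >= start]
    return []

if __name__ == "__main__":
    for stamp, message in events_before_overload(parse_log(SAMPLE)):
        print(stamp.strftime("%H:%M:%S"), message)
```

Running the same function over several surge onsets and comparing which sender IPs or message types recur in the window would help separate a real trigger from coincidental background traffic.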

I might try to packet capture the G2 traffic peripheral to a surge onset to see if there's a query acknowledge right beforehand that stands out as abnormal in some way.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 11 Feb 2014 05:21
by Winterkeeper

Re: Ten minute CPU saturations every hour

PostPosted: 11 Feb 2014 16:14
by raspopov
The "network core overloaded" message means that Shareaza has been (dead)locked.

You need to run a tool such as Process Explorer and use it to find the thread consuming the most CPU inside the Shareaza process, then press the "Stack" button to find the guilty DLL.

You can also open the "About" dialog box and force Shareaza to crash by Shift + right-clicking the blue URL in it, then paste the generated crash log here.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 11 Feb 2014 18:00
by Lanigiro

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 11 Feb 2014 19:00
by Lanigiro
A few more thoughts on the "bistable" hypothesis.

First, the graph looks an awful lot like the behavior of a bistable system nudged out of an attractor state. It wanders chaotically for a short time and then falls into an attractor again, in this instance the other attractor.

Second, one of the likelier ways for such a bug to exist is a race condition with a chance of wedging or crashing Shareaza if an inbound hub-to-hub connection enters a particular phase of the handshake while another inbound connection is still in that phase. Adding the hub to the global data structure representing the totality of established hub connections is the likely place where two threads handling separate network connections might step on each other's toes in this case.

I'd go over the hub-to-hub connection establishment code with a fine-toothed comb looking for a place where there's unsynchronized access to a shared data structure -- the established-connections list is a likely culprit, as is the host cache (newly connected hubs propagating info about hubs they know of, prompting an update). If a bug in that area is found, pushing out a 2.7.2.0 with the bug fixed could (eventually) fix this.

The other thing I'd look at is the system for promoting leaves to hubs -- or demoting hubs to leaves. I noticed that the onset of the problem happened while the leaf count was at its daily low, or very close to that time, and that the previous daily low had been the lowest daily low in a while. I'm wondering if a big enough daily swing in the number of leaves can trigger a destructive oscillation where a very low number of leaves per hub causes a lot of hubs to demote themselves to leaf, and then all the leaves losing these hub connections trying to make new connections trigger the remaining hubs to do whatever they do to try to elect some leaves to hubs, and the network gets stuck oscillating between "too many hubs so demote some" and "too few hubs so promote some leaves" and won't stabilize with a reasonable number of hubs. This would explain why there are surges and dips now in the number of hubs, much more than the nearly level number of hubs there used to be. But the smoking gun for this scenario, an oscillation of hub and leaf counts on a timescale of minutes, isn't visible at the crawler site. That may just be because the crawler's temporal resolution is too coarse to see it, though.

The fix in this case has an obvious short term as well as an easy long term solution. The short term fix is just to get enough people to nail their Shareazas into either hub-only or leaf-only behavior rather than let it be promoted/demoted. Preferably without ending up with too few hubs. The long term fix is to push out an update designed to damp out oscillations caused by hub promotion/demotion. Making promotion require a chronically high number of "rejected because leaf slots full" leaf rejections per minute over a longer span of time before a hub does something to try to get one or more of its leaves to become hubs (or however exactly promotion works) could do it. Hub count would climb more slowly and might level off rather than overshoot and start swinging wildly, even though the oscillations in leaf count are apparently wider than in the past for some reason, or at least reach lower low points. Another option is for hub to leaf demotion to have a refractory period: unless the node has been a hub more than X hours, it won't demote itself to leaf. Randomizing X at the time of promotion would be a good idea too. Otherwise, it will just broaden the peaks of hub count before the hub count crashes and rebounds again, slowing but not halting the oscillations. Randomizing X will make the hub count drop more slowly from its peak, and hopefully level off instead of undershoot. If there's already a refractory period programmed in, randomizing it or broadening the distribution would help. Each time a leaf becomes a hub, it should roll a random number between, say, 1 and 6 and be incapable of being demoted to leaf (other than by explicit user action) until that many hours after when it became a hub, even if it has a chronically low leaf count the whole while. 
The effect is that if there really are excessive hubs, the ones that rolled a 1 will drop, and then if there are now adequate hubs but not too many, the ones that rolled a 2 will have enough leaves by the 2-hour mark not to demote, and the oscillation is halted. (Making it minutes, random between 60 and 360, would be even better, as the hubs would just start dropping off slowly after the 1 hour mark until an optimum number was reached.)
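A minimal sketch of the randomized refractory period described above, in Python for illustration only. It is not Shareaza's actual election logic: the class name, the `min_leaves` threshold, and the "chronically low leaf count" test are assumptions. Only the 60 to 360 minute range comes from the suggestion above.

```python
import random

class HubRole:
    """State kept by a node from the moment it is promoted to hub."""

    def __init__(self, promoted_at_min, rng=random):
        self.promoted_at_min = promoted_at_min
        # Rolled once at promotion time: no automatic demotion before this
        # many minutes have passed (60 to 360, i.e. 1 to 6 hours).
        self.refractory_min = rng.randint(60, 360)

    def may_demote(self, now_min, leaf_count, min_leaves):
        """True if the hub is allowed to demote itself to leaf right now."""
        if now_min - self.promoted_at_min < self.refractory_min:
            return False  # still inside the refractory period
        return leaf_count < min_leaves  # hypothetical "too few leaves" rule

def eligible_over_time(hub_count=1000, seed=1):
    """Count how many starved hubs may demote at a few points in time.

    Every hub is promoted at minute 0 and has no leaves at all, which is
    the worst case for a synchronized mass demotion.
    """
    rng = random.Random(seed)
    hubs = [HubRole(0, rng) for _ in range(hub_count)]
    checkpoints = (59, 60, 120, 240, 360)
    return {
        t: sum(h.may_demote(t, leaf_count=0, min_leaves=10) for h in hubs)
        for t in checkpoints
    }

if __name__ == "__main__":
    for minute, count in eligible_over_time().items():
        print(f"minute {minute:3d}: {count} of 1000 hubs may demote")
```

In this toy run no hub may demote before the one-hour mark, and the eligible fraction then grows gradually until the six-hour mark instead of all hubs dropping at once, which is the damping effect the proposal is after.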

There is a way to test each of the above bistability hypotheses.

Race condition: actually examine the code to see if the stuff involved in establishing hub connections touches a shared data structure without holding the mutex for that structure. Fix any instance where it does and deploy 2.7.2.0. Then wait a while to see if the problem goes away once 2.7.2.0 has broad enough uptake. If so, problem diagnosed and solved.
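To make the race-condition hypothesis concrete, here is a small Python sketch of the pattern to look for. It is purely illustrative and has nothing to do with Shareaza's real C++ code: `NeighbourTable`, `max_hubs`, and the fake handshake workers are made-up names. The point is that the "is there a free slot?" check and the insert happen under one mutex, so two handshakes finishing at the same moment cannot both claim the last slot.

```python
import threading

class NeighbourTable:
    """Shared list of established hub connections, guarded by one mutex."""

    def __init__(self, max_hubs):
        self.max_hubs = max_hubs
        self._hubs = []
        self._mutex = threading.Lock()

    def try_add(self, address):
        """Atomically check for a free slot and insert the new hub."""
        with self._mutex:
            if len(self._hubs) >= self.max_hubs or address in self._hubs:
                return False
            self._hubs.append(address)
            return True

    def count(self):
        with self._mutex:
            return len(self._hubs)

def run_handshakes(table, workers=8, attempts=100):
    """Simulate several connection threads completing handshakes at once."""
    def worker(worker_id):
        for n in range(attempts):
            table.try_add(f"hub-{worker_id}-{n}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return table.count()

if __name__ == "__main__":
    print(run_handshakes(NeighbourTable(max_hubs=50)))
```

Without the lock, the check and the append could interleave between threads and the table could end up over its limit or in an inconsistent state, which is the kind of bug a code review of the connection-establishment path would be looking for.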

Deadlock and livelock: related to the above, the problem might instead be that the threading in Shareaza is prone to deadlock (wedges the hub) or livelock (temporarily jams at least some of its operations) trying to acquire mutexes. The cause would be if handling new inbound hub to hub connections acquires some particular two mutexes and holds them simultaneously and some other operation (maybe outbound hub to hub connection establishment) acquires the same mutexes in the reverse order and holds them simultaneously. Livelock instead of deadlock results if there's some slow timeout or something to eventually give up trying to get a mutex and abort the associated operation. The fix is essentially the same: fix the bug (this time by making both processes try to acquire the relevant mutexes in the same order) and push out 2.7.2.0, then see if the problem goes away.
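The lock-ordering fix can be sketched as follows, again in Python and again as an illustration under assumptions rather than a description of Shareaza's code. `RankedLock`, `acquire_in_order`, and the two lock names are hypothetical. Each mutex gets a fixed rank, and any code path that needs several of them takes them in ascending rank, regardless of the order in which it asked for them.

```python
import threading
from contextlib import contextmanager

class RankedLock:
    """A mutex with a fixed rank; locks are always taken in ascending rank."""

    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self.mutex = threading.Lock()

@contextmanager
def acquire_in_order(*locks):
    """Acquire several distinct RankedLocks by rank, release in reverse."""
    ordered = sorted(locks, key=lambda lk: lk.rank)
    taken = []
    try:
        for lk in ordered:
            lk.mutex.acquire()
            taken.append(lk)
        yield
    finally:
        for lk in reversed(taken):
            lk.mutex.release()

# Hypothetical locks for the two shared structures mentioned in the thread.
NEIGHBOURS = RankedLock(1, "established-connections list")
HOST_CACHE = RankedLock(2, "host cache")

def run_demo(iterations=1000):
    """Two threads request the same locks in opposite order, without deadlock."""
    counter = {"updates": 0}

    def inbound():
        for _ in range(iterations):
            with acquire_in_order(NEIGHBOURS, HOST_CACHE):
                counter["updates"] += 1

    def outbound():
        for _ in range(iterations):
            # Asked for in reverse order, but still acquired by rank.
            with acquire_in_order(HOST_CACHE, NEIGHBOURS):
                counter["updates"] += 1

    threads = [
        threading.Thread(target=inbound, daemon=True),
        threading.Thread(target=outbound, daemon=True),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=10)
    return counter["updates"]

if __name__ == "__main__":
    print(run_demo())
```

If the two workers instead took the raw mutexes directly in opposite orders, each could end up holding one lock while waiting forever for the other, which is the deadlock scenario described above.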

Of course, it's possible that threading bugs like the above exist but aren't actually the cause of our present difficulties. In that case, the bugs will be found but fixing them won't fix G2. But it won't be a total waste, as bugs ought to be found and fixed regardless, and Shareaza and G2 will be more robust going forward.

Oscillatory hub election: the test for this is easy enough -- run some Shareazas (at least two) in "either hub or leaf" mode and see if some or all keep switching back and forth frequently and in something close to synchrony; say, most of the leaves become hubs within a few minutes, then after say half an hour of little change most of the hubs become leaves in only a few minutes, then after another longer period of little change most of the leaves become hubs, etc.

The fix in that case is to nail the test machines into hub mode to help stabilize the network, alert everyone here and wherever else you can to the situation and advise people with good long-lived broadband connections to go into hub-only mode until 2.7.2.0 is released and they've upgraded, and then make 2.7.2.0 with something like random-duration-set-on-promotion hub-to-leaf-demotion refractory period, test it for regressions, and release it.

If none of the above apply, then we're back to square one on this, though some sort of DoS attack or network poisoning is then the likeliest explanation.

Re: Ten minute CPU saturations every hour

PostPosted: 12 Feb 2014 01:26
by Lanigiro
The DLL is hal.dll. The stack of the thread using the most CPU listed it four times during a surge but not at all afterwards, while ntdll.dll, kernel32.dll, MSVCR100.dll, and mfc100u.dll showed up both during and after, and USER32.dll only after.

Re: Ten minute CPU saturations every hour

PostPosted: 12 Feb 2014 03:57
by raspopov
That's very bad news for you: hal.dll is the hardware abstraction layer, so your computer is broken.

Re: Ten minute CPU saturations every hour

PostPosted: 13 Feb 2014 19:38
by Lanigiro

Re: Ten minute CPU saturations every hour

PostPosted: 13 Feb 2014 20:08
by raspopov
To get function names instead of ordinal numbers, you'll need to use the corresponding debug symbol files (*.pdb) or a daily build (which has the .pdb files built in).

Re: Ten minute CPU saturations every hour

PostPosted: 13 Feb 2014 21:45
by Lanigiro

Re: Ten minute CPU saturations every hour

PostPosted: 14 Feb 2014 04:15
by raspopov
Anyway, it's a driver problem. By the way, antivirus and firewall software also have drivers.

Re: Ten minute CPU saturations every hour

PostPosted: 14 Feb 2014 05:00
by Lanigiro

How do I force Shareaza to stay connected?

PostPosted: 14 Feb 2014 19:47
by Lanigiro
This is getting frustrating. The other threads related to connectivity issues all have drifted off onto tangents, or become moribund. So let's address this head-on.

1. How do I make my copy of Shareaza stay connected to G2? That means, how do I make individual connections last a "normal" amount of time (tens of hours) without interruption, AND how do I make Shareaza replace lost connections AUTOMATICALLY and IN A TIMELY FASHION? I want it to replace a lost connection, by itself, in a maximum of 60 seconds. How do I make this happen? Note: a valid solution must not compromise other functionality, for example it must not make any potential file sources unavailable for the purpose of downloading files from them.

2. How do I make my copy of Shareaza stay connected to a specific ED2K server? If I have pending downloads from "push" sources on, say, eMule Security #1, I can only potentially get the files if I stay connected to eMule Security #1. But if I connect to a particular ED2K server and go to sleep, I usually wake up to find that it's been connected to the wrong server for six or seven hours, uselessly spinning its wheels and unable to progress on any of the files I want that are from push sources on the first server. And then it often refuses to reconnect to the original server even if I explicitly tell it to using the hostcache. This behavior is incorrect. The major ED2K servers are up 24/7, never full, and long term stable, unlike ephemeral G1 and G2 hubs, so Shareaza should NEVER drop its connection to one without being instructed to by me and Shareaza should NEVER fail to connect to one on demand. Yet it keeps doing so. How do I prevent this undesired behavior? It's MY copy of the software, running on MY computer, so I have the absolute right to prevent this undesired behavior. I just lack the exact knowledge of how to enforce correct behavior here. (And am puzzled as to why Shareaza comes configured OOTB to behave incorrectly in this regard, instead of Just Working(tm).)

3. How do I make Shareaza not require constant nursemaiding and frequent restarts? As I've previously noted, a) it (usually) won't reconnect to G2 without manual assistance and b) it sometimes starts refusing to connect to particular ED2K servers. The latter behavior does not seem to go away without restarting Shareaza completely. I don't want to nursemaid it. I don't want to restart it every few hours. I want to "set it and forget it" and have it gradually acquire the files on its download list without my having to do anything but leave the machine switched on and hooked up to the Internet. Now someone please tell me how to configure it so that I CAN just "set it and forget it" as described.

Re: How do I force Shareaza to stay connected?

PostPosted: 14 Feb 2014 20:44
by Lanigiro

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 14 Feb 2014 21:00
by raspopov
Does your computer get BSODs often?

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 14 Feb 2014 21:12
by Lanigiro

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 06:46
by raspopov
Nope, it's about your problem driver.

I can (probably) see "10 minute" hubs too, but also normal ones. Can you research this case further? Run as a hub, play with the options, etc.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 07:01
by dew
I've not experienced the other problems Lanigiro mentioned, but something odd has happened to the network. The uptime graphs here http://crawler.doxu.org/uptimes.html show a drastic drop in uptime, and that does correlate with what I've seen. My neighbour connections drop within a few minutes (but reconnect with another hub after a few seconds automatically). As an experiment, I've tried several times to force my Shareaza into hub mode, but it always crashes after a few minutes.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 07:26
by raspopov
Can you send me the crash report?

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 08:00
by dew
Okay, I've PM'd the error report to you. It took about 9 minutes for Shareaza to crash.

Incidentally, when this error occurred previously, I tried to submit the report through the automatic email mechanism but I received the error "There is no email program associated to perform the requested action. Please install an email program or, if one is already installed, create an association in the Default Programs control panel." I have Thunderbird installed, and in the Default Programs control panel it has all its defaults already.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 08:42
by raspopov

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 09:23
by dew
Sent the log. I'll try 32 bit tomorrow. Going to bed now :)

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 10:12
by Winterkeeper
I tried forcing hub mode on mine tonight. This is on a 32-bit dedicated server. Just to be conservative, I set it to only allow 100 leaves. I also only connected to G2 - no G1 connections. Everything was running fine until suddenly the CPU spiked to 100% just as Lanigiro talked about in the other thread, and every incoming connection was then rejected due to "network core overloaded". Then, just as Lanigiro said, after almost exactly 11 minutes the CPU spike ended and connections resumed. Of course, during that time all of my leaves had dropped. Afterwards, everything ran fine again for a short time, and then there was another 100% CPU spike with the same results. At that point I tried to disconnect from G2 and Shareaza became completely unresponsive.

I don't think it's a stretch to infer that this must be happening to a lot of hubs out there right now.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 15 Feb 2014 10:36
by raspopov
Please force Shareaza to crash during high CPU utilization and send me the crash log.

Re: Ten minute CPU saturations every hour

PostPosted: 15 Feb 2014 15:25
by skinvista
FYI, I have been chasing a similar problem for the last week or so in a long-time fork of Shareaza.
Old builds (a year with no problems) exhibit the same behavior (at least when installed over current), debug seems worse.

By any chance are you recently running Windows 8.1?
The only recent local change was an upgrade from Win 8 a few weeks before the problem.
"Antimalware Service Executable" for Windows Defender seems to spike Disk usage at the onset,
but that could be a symptom or unrelated.

Otherwise, some external network trigger is exploiting shared code (lack of defense)
that I haven't successfully located/fixed yet.

Re: Ten minute CPU saturations every hour

PostPosted: 15 Feb 2014 19:13
by queuesclimber
Something happened. Rev. 9357 works well: no crashes, connected to all networks.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 16 Feb 2014 00:29
by dew
I tried 32-bit Shareaza now. Same results as others. It doesn't crash (ran for 3 hours). I wasn't monitoring it while running, so I don't know if there were 11 minute spikes, but when I looked at it after 3 hours, CPU usage was high (25%, i.e., one of my cores) and it had dropped most connections. I force-crashed it and PM'd the crash log.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 16 Feb 2014 03:37
by dew
Earlier, I was testing hub mode. Now in leaf mode (32-bit Shareaza), I still see the CPU spikes. With 64-bit Shareaza, it ran fine in leaf mode (just the problem with neighbour connections lasting a short time).

With me, the CPU spikes are always precisely 4 minutes and 38 seconds long (I'm recording CPU usage with perfmon). They occur at somewhat random intervals between 10 - 20 minutes apart (from start of one spike to the start of the next).
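For anyone who wants to compare spike timings across machines, here is a minimal Python sketch that pulls spike durations and start-to-start intervals out of recorded CPU samples. It assumes the recording has already been converted into `(seconds, cpu_percent)` pairs. The function names, the 20% threshold, and the 30-second minimum are arbitrary choices for illustration, and the demo data is synthetic, not a real perfmon capture.

```python
def find_spikes(samples, threshold=20.0, min_duration=30.0):
    """Return (start, duration) for each sustained run at or above threshold.

    samples is an iterable of (seconds, cpu_percent) in time order. Duration
    is measured from the first to the last high sample in the run.
    """
    spikes = []
    start = None
    last = None
    for t, cpu in samples:
        if cpu >= threshold:
            if start is None:
                start = t
            last = t
        elif start is not None:
            if last - start >= min_duration:
                spikes.append((start, last - start))
            start = None
    # Close a spike that is still in progress at the end of the recording.
    if start is not None and last - start >= min_duration:
        spikes.append((start, last - start))
    return spikes

def start_to_start_gaps(spikes):
    """Seconds from the start of each spike to the start of the next."""
    return [later[0] - earlier[0] for earlier, later in zip(spikes, spikes[1:])]

def synthetic_samples():
    """One sample per second: two 25% plateaus on an otherwise idle process."""
    return [
        (t, 25.0 if 100 <= t < 378 or 1000 <= t < 1278 else 2.0)
        for t in range(2000)
    ]

if __name__ == "__main__":
    spikes = find_spikes(synthetic_samples())
    print(spikes)
    print(start_to_start_gaps(spikes))
```

Run against real recordings from several users, this would show whether the spike length is constant per machine (4 minutes 38 seconds here, about 11 minutes for others) and whether the intervals line up with any external event.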

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 16 Feb 2014 07:58
by raspopov
It looks like your Shareaza is simply overloaded by G2 packets: it spends a lot of time in the packet parsing code, and incoming connections are dropped with a "503 Busy" error.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 16 Feb 2014 08:18
by Lanigiro

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 16 Feb 2014 09:58
by raspopov
Please try the latest daily build, r9358.

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 17 Feb 2014 00:35
by Lanigiro

Re: Constant G2 "lack of traffic" errors - can't stay connec

PostPosted: 17 Feb 2014 04:52
by Lanigiro
And now it seems to have locked up after running for about five hours. No change in CPU usage; it seems to be doing exactly what it was doing before "under the hood" to judge by the data from ProcessExplorer; but the UI is "spinning" and not updating itself all of a sudden. No apparent trigger, either. It was on the network tab when it happened and the tail of the log there looks perfectly typical.

Obviously, the changes you made didn't pan out. Something caused the CPU use during normal operation to be grossly increased, and though the CPU surges seem to be gone, it's now unstable and prone to hang outright after a while.

I think analysis of those excess G2 packets during one of the ten-minute-long "storms" will be needed to find out more about what is going on here and address the problem at its root. Simply adding some inbound packet-dropping throttle in front of the G2 subsystem, as it's implied you did, is a mere salve on the symptoms that, though it apparently works, seems to have deleterious and sometimes-lethal side effects.