Preliminary debug data indicates that my identification of the two culprit threads was correct. I managed to provoke a ten-second livelock by moving files around inside my library while a very busy search was active, and noted the bottommost message in the network tab (set to DEBUG detail level) during the freeze. When it "thawed" I copied the whole log, grepped it for that message, and found where the freeze began.
The very next two messages (likely generated in the instants before the livelock, but so shortly before that the window didn't get repainted between them and the livelock's onset) were:
1. Hashing: $file path$
2. Processing query acknowledge from $IP$ (time adjust + 2 seconds): $blather$
Messages were generated during the freeze, at an exponentially dropping rate as more threads joined the developing logjam. The end of the freeze can be estimated by my own sense of time and by when the message rate abruptly jumps from low and slowing to a normal rate again. The first messages at that point, following a three-second period with no messages:
1. Sending push request for download $unimportant$
2. Processing query acknowledge... (x2)
3. Got bogus hit packet... (x4)
4. Received a malformatted query hit packet...
5. Got bogus hit packet...
6. Received a malformatted query hit packet...
7. Sending query for ...
8. Hashing completed: $same file as before$
So: no hashing messages during the freeze, then a rash of CNetwork::OnRun thread messages and one hashing message the instant it ended. I also checked, and there were no "Processing query" or "hit packet" messages of any kind during the freeze either.
That's the smoking gun: the CNetwork jobs queue stalled and the hashing thread stalled, and they seem to be the threads that stalled first, before anything else stopped contributing to the log.
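For anyone who wants to repeat the check on their own log, here is a rough sketch in Python of the two steps above: find the longest silence in the log, then list which message types show up inside a given time window. It assumes each log line starts with a "[HH:MM:SS]" timestamp; that format is my assumption, not something taken from the actual log, so adjust the regex to whatever your copy looks like. The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: each line looks like "[HH:MM:SS] message text".
# This format is hypothetical; change the pattern to match the real log.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")


def parse_log(lines):
    """Return (timestamp, message) pairs for lines matching the assumed format."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%H:%M:%S")
            entries.append((ts, m.group(2)))
    return entries


def longest_gap(entries):
    """Return (gap, index); index points at the first entry after the longest silence.

    Timestamps carry no date, so a log that crosses midnight yields a negative
    gap there, which this simple loop just ignores.
    """
    best_gap, best_index = timedelta(0), None
    for i in range(1, len(entries)):
        gap = entries[i][0] - entries[i - 1][0]
        if gap > best_gap:
            best_gap, best_index = gap, i
    return best_gap, best_index


def messages_between(entries, start, end, needle):
    """Messages containing `needle` whose timestamp lies in [start, end]."""
    return [msg for ts, msg in entries if start <= ts <= end and needle in msg]
```

Feed parse_log the lines of the copied log, use longest_gap to find where the silence ends, then call messages_between with your estimate of the freeze window and a needle such as "Hashing" or "hit packet". An empty list for a needle means that kind of message never fired during the window.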
Now please for the love of Christ fix this bug already?
