Hash slow and freezes program

Posted: 12 Feb 2010 15:44
by gearsaddict
I have Windows 7 on my computer. I installed your newest version, and when it gets to hashing the music on my computer during setup, it takes forever and freezes the program. I have removed all the files as instructed in your "most common problems and questions" thread. I have tried both versions: I installed the regular version, uninstalled it and removed all files, then did the same with the optimized version. Just 100 files took 5 hours; each file was extremely slow to hash, and the program froze while doing so (it showed as "not responding"). With your previous versions on Windows 98 and Vista I never had a problem with Shareaza.

Please help!
Thank you in advance. :D

Re: Hash slow and freezes program

Posted: 13 Feb 2010 15:20
by ailurophobe
Random guess: A file access problem. Maybe an anti-virus set to scan files every time they are accessed? Easy to check, so worth trying.

Re: Hash slow and freezes program

Posted: 05 Mar 2010 07:00
by martok-sh
Hey,
I believe the same thing is happening on my system (Shareaza 2.5.1&2 on Win2k).

I added another rather big folder to my library, thus causing Shareaza to start hashing. In fact, it does, but it's incredibly slow and taking 100% CPU. Additionally, Process Explorer shows little to no I/O-activity, suggesting that Shareaza does not really do anything.
There is no on-access scanner involved; even if there were, versions prior to (or maybe including, not so sure) 2.5.0 did not show this behaviour.

I dug a little further and checked with FileMon from Sysinternals what is actually happening there; my findings would explain everything ;)

After a file is done hashing, the results get written to the library.dat and Shareaza.db3 files. At the same time, Shareaza does a full recursive walk over all shared directories, writing a GUID into an NTFS ADS (alternate data stream) for every directory. Strange, but it takes less than a second.
I guess this triggers a reload of the library itself; something causes another recursive walk over all shared directories. This time, every FILE is opened, queried for information and closed. There is probably more processing going on internally, since opening the files alone would not cause 100% CPU. This step takes ages.

With a sufficiently large library (~30k files), a single tiny file (~100 KB) can take about 1 minute for the hash plus the full library "reload".
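
To make the scaling concrete, here is a toy cost model in Python. It is not Shareaza code: the function name and the numbers are made up for illustration, and it simply assumes that every hashed file triggers one full walk over the library, as the FileMon trace suggests.

```python
# Toy cost model, NOT Shareaza code: all names and numbers are hypothetical.

def file_visits(existing_files: int, new_files: int, rescan_per_file: bool) -> int:
    """Count how often library files are opened while adding new files."""
    visits = 0
    library_size = existing_files
    for _ in range(new_files):
        library_size += 1  # the freshly hashed file joins the library
        if rescan_per_file:
            visits += library_size  # full walk: every file is opened and queried
    if not rescan_per_file:
        visits += library_size  # one single walk after the whole batch is hashed
    return visits


# Illustrative sizes only: a 30k library receiving 2,000 new files.
per_file = file_visits(existing_files=30_000, new_files=2_000, rescan_per_file=True)
batched = file_visits(existing_files=30_000, new_files=2_000, rescan_per_file=False)
print(per_file, batched)
```

Under that assumption the work grows with (library size) x (number of new files), which would fit the observation that even tiny files take about a minute each once the library is large.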

Any ideas what could be the reason?
cu
Martok

EDIT: 24 hours later, the first 2000 files are done... only 2 weeks to go :roll:

Re: Hash slow and freezes program

Posted: 06 Mar 2010 23:32
by martok-sh
Okay, I think I got it. Using the TRACE output generated by the debug version, I found out that the rescan starts after CThumbCache::Store.

Since that function causes a Library.Update(), this in turn causes CLibrary::ThreadScan and for some reason CLibraryFolders::ThreadScan with bForce==true.

Relevant code has been changed in revision 8241. I believe some of this code to be causing the behaviour described here.

Since a single update can take as much as 45 seconds, the 30 second limit implemented there can easily be exceeded. This would also explain why relatively few people run into the problem: it requires a large library and many changed files at once.

Can someone with more insight into the code have a look at this?

Re: Hash slow and freezes program

Posted: 12 Mar 2010 21:25
by branko-r
This is a very interesting analysis. The problem has been discussed before in the thread "Shareaza super-slow when scanning/processing large library" https://sourceforge.net/apps/phpbb/shareaza/viewtopic.php?f=3&t=432. Unfortunately, I don't think it was introduced recently: it has been present since at least 2.4.0.0.

My experience is excessive disk writes rather than high CPU usage. As a programmer, I'd guess that something's seriously wrong with library handling. It is dead easy to test this: just try to add 50,000 files into the library and see what happens. Reading Shareaza's source is difficult for non-involved developers - it is very tidy, but also almost devoid of comments. So, I'd say it's developers' turn now...
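
For anyone who wants to try this, a small Python script can generate the test files. This is only a sketch: the folder layout, file names and sizes are arbitrary choices, and the demo call creates just 50 files so it finishes quickly. Raise the count to 50,000 and share the resulting folder in Shareaza for the actual test.

```python
import os
import tempfile
from pathlib import Path


def make_test_files(root: Path, count: int, size_bytes: int = 1024, per_dir: int = 1000) -> int:
    """Create `count` small files with random content under `root`."""
    created = 0
    for index in range(count):
        # Spread the files over subfolders so no single directory gets huge.
        folder = root / f"batch_{index // per_dir:04d}"
        folder.mkdir(parents=True, exist_ok=True)
        # Random bytes, so the files do not all share one hash.
        (folder / f"file_{index:06d}.bin").write_bytes(os.urandom(size_bytes))
        created += 1
    return created


# Small demo run; use count=50_000 for the real experiment.
demo_root = Path(tempfile.mkdtemp(prefix="library_test_"))
created = make_test_files(demo_root, 50)
print(created, "files written to", demo_root)
```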

Re: Hash slow and freezes program

Posted: 14 Mar 2010 04:44
by martok-sh
Interesting, I didn't see that thread in the first place ;)

If that is really the case, then we might have a problem here. I am sure it worked in the 2.2 series, but I never ran into that situation in the 2.3 days, and since Shareaza doesn't have a portable mode, it would be rather difficult to downgrade just for testing.

I'd say the code needs way more TRACEs; that could be helpful. What struck me was the extreme modularization and the slightly unpredictable module names :roll:

I'll be checking which versions changed the library since 2.2 release... at least by commit messages.

EDIT: posting my findings here:
{guess nothing happened before, since I used 2.2.5.4 for ages w/o problems}
r5589 release 2.2.5.4
r5636 release 2.2.5.5
r5706
r5848 change monitor
r5860 added GUIDs (those that are written after hashing a file)
r6348 release 2.3.0.0
r6356
r6542 release 2.3.1.0
r6880 doesn't look like it breaks anything
r6947 introduces Library.Update into Thumb creation
r6961
r7029
r7033
r7055
r7337
r7414 release 2.4.0.0, and we've seen the problem already existed there.

So far r6947 seems to be the only revision that actually changes the logic for adding files to the library. Maybe it's not the hashing that is problematic, but thumbnail generation and the associated code.
I guess it's really time for the guys who know their code by heart to have a look at it. It looks strange to me to have a reload there, yet I'm not sure whether checking for more files to hash *before* calling .Update here would break anything important.
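
The idea of checking for more pending files before calling .Update can be sketched like this. It is a toy Python model with made-up names (ToyLibrary, process_queue), not the actual Shareaza classes, and whether deferring the update is safe in the real code is exactly the open question.

```python
from collections import deque


class ToyLibrary:
    """Hypothetical stand-in for the library; counts how often an update runs."""

    def __init__(self):
        self.files = []
        self.update_calls = 0

    def update(self):
        # In the real program this would be the expensive rescan.
        self.update_calls += 1


def process_queue(library, pending, defer_update):
    """Store each pending file; with defer_update, update only once the queue is empty."""
    queue = deque(pending)
    while queue:
        library.files.append(queue.popleft())  # pretend the file was hashed and stored
        if not defer_update or not queue:
            library.update()


eager = ToyLibrary()
process_queue(eager, [f"file{i}" for i in range(1000)], defer_update=False)
lazy = ToyLibrary()
process_queue(lazy, [f"file{i}" for i in range(1000)], defer_update=True)
print(eager.update_calls, lazy.update_calls)
```

In this model the deferred variant still ends up with the same files in the library, but runs the update once per batch instead of once per file.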

Re: Hash slow and freezes program

Posted: 18 Mar 2010 19:16
by branko-r
You're right, the problem almost certainly has nothing to do with hashing. This is obvious when one tries to add, say, 1000 images to the library. It's still terribly slow, although hashing an image is probably 1000x faster than hashing a video. Also, speed appears to depend on the size of the library. This suggests that insertion of new entries is extremely inefficient for large libraries. There must be a way to work around this; it's up to the developers to try and see what can be done.

Re: Hash slow and freezes program

Posted: 19 Mar 2010 12:11
by grey-hame
Another day, another part of Shareaza caught red-handed using bubble sort. *yawn*

Re: Hash slow and freezes program

Posted: 20 Mar 2010 19:56
by ailurophobe
There actually is something wrong with the thumbnail logic. Specifically, FFDSHOW does not seem to be compatible with it. I added Shareaza.exe to the list of applications FFDSHOW does not allow, and things got a lot better. This is probably an FFDSHOW issue, so I didn't even report it as a bug (I hate thumbnails anyway), but people using FFDSHOW and having performance and stability problems should probably try adding "shareaza.exe" to the "Don't use ffdshow in" list in the ffdshow video decoder configuration.