Shareaza

by **old_death** » 03 Dec 2010 23:52

It might be intelligent to implement some code that allows us to use the GPU for certain tasks, such as file hashing, (UDP) packet parsing or maybe even security filtering...

by **ailurophobe** » 06 Dec 2010 03:05

File hashing performance is limited by file I/O. TigerTree and ED2K could be parallelized for GPU but doing so would actually drop I/O performance if the file is read from a conventional hard drive.

I am not sure how you would use the GPU to parse UDP packets, and since most people have internet connections far slower than their CPU, I am not sure why you would either.

You could use the GPU to test lots of rules in parallel, but indexing the rules so that fewer need to be tested is much faster and makes using the GPU redundant.
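To make the indexing point concrete, here is a minimal sketch. It assumes the security rules are non-overlapping blocked IPv4 ranges, which is only an assumption for illustration; `RangeIndex`, `is_blocked_linear` and `RULES` are hypothetical names, not Shareaza code. The indexed lookup needs a binary search instead of one test per rule.

```python
import bisect
import ipaddress


def ip_to_int(text):
    """Convert dotted IPv4 text to an integer."""
    return int(ipaddress.IPv4Address(text))


class RangeIndex:
    """Hypothetical index over non-overlapping blocked IPv4 ranges."""

    def __init__(self, ranges):
        # Sort by range start so lookups can use binary search.
        pairs = sorted((ip_to_int(lo), ip_to_int(hi)) for lo, hi in ranges)
        self.starts = [lo for lo, _ in pairs]
        self.ends = [hi for _, hi in pairs]

    def is_blocked(self, text):
        addr = ip_to_int(text)
        # Find the last range starting at or before addr: O(log n) per lookup.
        i = bisect.bisect_right(self.starts, addr) - 1
        return i >= 0 and addr <= self.ends[i]


def is_blocked_linear(ranges, text):
    """Brute force: test every rule, i.e. the work a GPU would spread out."""
    addr = ip_to_int(text)
    return any(ip_to_int(lo) <= addr <= ip_to_int(hi) for lo, hi in ranges)


# Example rules (made up for illustration).
RULES = [("10.0.0.0", "10.255.255.255"), ("192.168.0.0", "192.168.255.255")]
index = RangeIndex(RULES)
```

Both functions give the same answer for any address; the difference is that the indexed version touches only a handful of entries no matter how many rules are loaded.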

by **raspopov** » 06 Dec 2010 05:04

by **ailurophobe** » 06 Dec 2010 17:40

I forgot about Shareaza needing to calculate multiple hashes in parallel. Offloading calculation of the "leaf" hashes for TigerTree, AICH (normal ED2K has too large a chunk size), and their BT equivalent (the name of which I have forgotten) might actually be useful. Useful enough to be worth coding? No idea... SSDs are becoming more common and while the integrated GPU in the Sandy Bridge is reputed not to support OpenCL, the next iteration of Intel iGPUs might. A certain high profile and high margin computer manufacturer whose name starts with "A" is big on OpenCL and quite annoyed at Intel for integrating GPUs that do not support it.

by **old_death** » 08 Dec 2010 15:37

The idea behind this is that drive read speeds are growing very fast ATM, due to the development of SSDs. And I think it can be expected that in 5-10 years or so, at least 20% of new computers will be equipped with one of these drives. Also, all PCs equipped with a dedicated graphics card are already able to use OpenCL or CUDA, and it is to be expected that integrated GPUs will be able to do so within the next 5 years, too.

Also, I think that Shareaza should be using upcoming leading technologies - and not always be 2 steps behind what everyone else is doing.

by **kaffeemonster** » 09 Dec 2010 00:38

I'm not so sure on this one.
It's a lot of Hype.

The problem with GPU processing is the overhead. When you have to transfer a little bit of data and then all those 1000 shader ALUs do their magic on it IN PARALLEL (compute-intensive, floating point foo, or simply massively parallel), GPUs are great.

But when you have to shovel around a lot of data (the file data; bringing it to the GPU may mean the driver does some malloc/mmap/memcpy behind your back to fit some DMA restrictions, e.g. data has to be aligned at a page boundary, what not), and then do something inherently NON PARALLEL (hashes have the nasty habit that their calculation is serial: earlier bytes influence the calculation of later bytes...), with lots of basic ops (and/or/xor/shift), GPUs with their 500 MHz are a loss. On another important crypto op, table lookups (S-boxes), they suck badly (GPUs are made for lots of throughput, not latency).

Yes, they have 1000 shader ALUs, but only one shader can work on one hash at a time, so you need to do several hashes (let's say 64 KB chunks of different files) at the same time.
Because a lot of shaders run the same program at the same time (workgroups), doing the different hashes is not the best route. In effect you are left with 64 additional individual compute units, and that is on a top-of-the-line card, not the integrated/mid-range stuff. At this point it consumes 160 W and makes a lot of noise.
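A small sketch of the "serial" point, using SHA1 from Python's standard `hashlib` on made-up sample data (the 64 KiB chunk size is just an assumption for illustration). Feeding the chunks in order reproduces the whole-file digest, because each step continues from the previous state; hashing the chunks independently gives digests that cannot simply be combined into the whole-file digest.

```python
import hashlib

data = bytes(range(256)) * 1024   # 256 KiB of sample data
CHUNK = 64 * 1024                 # assumed chunk size for this illustration

# Streaming: each update() continues from the state left by the previous one,
# so chunk k cannot start before chunk k-1 has been processed.
streamed = hashlib.sha1()
for off in range(0, len(data), CHUNK):
    streamed.update(data[off:off + CHUNK])

# Independent: these digests could be computed in any order or in parallel.
chunk_digests = [hashlib.sha1(data[off:off + CHUNK]).digest()
                 for off in range(0, len(data), CHUNK)]

whole = hashlib.sha1(data).digest()
same_as_whole = streamed.digest() == whole

# Hashing the concatenated chunk digests is a different function altogether.
combined = hashlib.sha1(b"".join(chunk_digests)).digest()
differs_from_whole = combined != whole
```

So a plain file hash gives one long dependency chain per file, and parallelism only appears when there are many files or a hash that is defined over independent chunks.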

Maybe someone should write an OpenCL app which takes the SHA1 of n files at once and measures it, so we would have the speed for 1, 2, 4, 8 files.
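For comparison, a CPU-only baseline of that measurement could look like the sketch below. This is not the OpenCL app, just a harness using Python's `hashlib` and threads; `measure` and `sha1_hex` are hypothetical names, and random in-memory buffers stand in for files so that disk I/O does not distort the numbers.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def sha1_hex(buf):
    """SHA1 of one buffer as a hex string."""
    return hashlib.sha1(buf).hexdigest()


def measure(n_files, size=4 * 1024 * 1024):
    """Hash n_files in-memory buffers at once; return (seconds, digests)."""
    # Random buffers stand in for file contents so disk I/O is excluded.
    buffers = [os.urandom(size) for _ in range(n_files)]
    start = time.perf_counter()
    # One worker thread per "file"; hashlib releases the GIL on large inputs.
    with ThreadPoolExecutor(max_workers=n_files) as pool:
        digests = list(pool.map(sha1_hex, buffers))
    return time.perf_counter() - start, digests


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        seconds, _ = measure(n)
        rate = n * 4 / seconds  # total MiB hashed per second
        print(f"{n} files: {seconds:.3f} s, {rate:.1f} MiB/s")
```

A GPU version would have to beat these figures including the time spent copying the buffers to the card.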

Greetings
Jan

by **ailurophobe** » 09 Dec 2010 01:02

The I/O overhead would be punitive; not even SSDs are designed to work that way. Better to experiment with the tree-structured hashes (that I mentioned), which might actually show some benefit.

Good point about the DMA (and driver) overhead. For relatively simple algorithms, which all current hash algorithms are, it might be difficult to get enough speedup to make up for the overheads. Sandy Bridge (and AMD's Fusion) are reputed to have much lower overheads (IIRC AnandTech got a few hours with a Sandy Bridge CPU), so when the majority of CPUs come with an integrated GPU core this might be more practical. Honestly I doubt there is any point doing this on a discrete GPU.

So maybe drop this discussion until CPUs with OpenCL capable GPUs become available?

by **kaffeemonster** » 09 Dec 2010 02:07

Yes, TigerTree could be interesting: upload N MB of file data in a burst and then unleash a lot of GPU "threads" on every 1K (or whatever it is).
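Roughly what that looks like in code, as a sketch only: the tree layout follows the THEX convention (1024-byte leaves prefixed with 0x00, internal nodes prefixed with 0x01, an unpaired node promoted unchanged), but SHA1 is used as a stand-in because Tiger is not in Python's standard library, so the resulting root is NOT a real TigerTree root. A thread pool stands in for the GPU "threads"; `tree_root`, `leaf_hash` and `node_hash` are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

LEAF_SIZE = 1024  # leaf block size in bytes


def leaf_hash(block):
    # Leaves are prefixed with 0x00. SHA1 is a stand-in for Tiger here.
    return hashlib.sha1(b"\x00" + block).digest()


def node_hash(left, right):
    # Internal nodes are prefixed with 0x01.
    return hashlib.sha1(b"\x01" + left + right).digest()


def tree_root(data, pool=None):
    """Compute the tree root; leaf hashes go through pool.map if given."""
    blocks = [data[i:i + LEAF_SIZE]
              for i in range(0, len(data), LEAF_SIZE)] or [b""]
    # Leaf hashes do not depend on each other: this is the parallel part.
    mapper = pool.map if pool else map
    level = list(mapper(leaf_hash, blocks))
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired node is promoted unchanged
        level = nxt
    return level[0]


data = bytes(range(256)) * 37  # 9472 bytes: 9 full leaves plus a partial one
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_root = tree_root(data, pool)
serial_root = tree_root(data)
```

The leaf level is where nearly all the bytes are hashed, so it is the only level worth offloading; the upper levels are tiny by comparison.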

What may ruin your party: Tiger is a hash specially written for 64-bit HW. It uses 64 bits internally for everything. It performs OK on a normal 32-bit CPU (which has support for larger integers, like the carry bit, a shift over two registers, etc.).
But I do not know what the integer size on GPUs is, or whether they cope well with oversized integers (double-precision float is an extension; they are VLIW and SIMD, but when it comes to integers, I guess they work with 32-bit units, probably without carry and 64-bit shift support; pixel data mostly comes in 32 bits...).

The OpenCL compiler may emulate 64 bit, but it could be veeeery slow.
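To show what such emulation involves, here is a sketch of a 64-bit add and left shift built only from 32-bit words, written in Python for readability. The helper names (`add64`, `shl64`, `split`, `join`) are hypothetical, and this is not what any particular OpenCL compiler actually generates; it only illustrates that every 64-bit operation turns into several 32-bit ones.

```python
MASK32 = 0xFFFFFFFF


def split(x):
    """Split a 64-bit integer into a (hi, lo) pair of 32-bit words."""
    return (x >> 32) & MASK32, x & MASK32


def join(pair):
    """Join a (hi, lo) pair back into one 64-bit integer."""
    return (pair[0] << 32) | pair[1]


def add64(a, b):
    """Add two 64-bit values given as (hi, lo) pairs, wrapping on overflow."""
    lo = a[1] + b[1]
    carry = lo >> 32                      # carry out of the low word
    hi = (a[0] + b[0] + carry) & MASK32
    return hi, lo & MASK32


def shl64(a, n):
    """Shift a (hi, lo) pair left by n bits, 0 <= n < 64."""
    hi, lo = a
    if n == 0:
        return hi, lo
    if n >= 32:
        # The whole low word moves into the high word.
        return (lo << (n - 32)) & MASK32, 0
    # Bits shifted out of lo must be carried into hi by hand.
    return ((hi << n) | (lo >> (32 - n))) & MASK32, (lo << n) & MASK32
```

Without a hardware carry flag the `carry` line also has to be computed with a compare, which adds even more instructions per 64-bit operation.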

Greetings
Jan

Use OpenCL or CUDA for hash calculation