Developers.Hash.TigerTree.Ability
How TigerTree works
With a traditional hashing system like SHA1, you give CSHA some data, and it returns a hash. So, you can tell if your file is valid or corrupted. If it's valid, you're done. But what if it's corrupted? If it's a very big file, it will take a long time to download again. And, the remote computer that gave you the bad part of it might strike again. It would be better if there were some way to tell what part of the file was bad. Then, you could fix that part. This is what TigerTree lets us do.
Imagine we've got a 4 KB file composed of 1024 ASCII 'a' characters, then 1 KB of 'b's, then 1 KB of 'c's, then 1 KB of 'd's:
<source lang="c"> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa... bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb... cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc... dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd... </source>
TigerTree breaks the file into 1 KB chunks. Here, there are 4 chunks. Then, it uses a leaf hashing algorithm to hash each chunk. In the picture below, the 4 chunks are represented by S1, S2, S3, and S4. The leaf hashing algorithm is shown by the function LH(). The 4 leaf hashes this algorithm calculates are named A, B, C, and D.
<source lang="c">
ROOT=IH(E+F) / \ / \ E=IH(A+B) F=IH(C+D) / \ / \ / \ / \ A=LH(S1) B=LH(S2) C=LH(S3) D=LH(S4)
</source>
With each individual 1 KB chunk hashed, TigerTree then builds a tree on top of them. Here, it uses a different hashing algorithm, called the internal hash and shown above as IH(). To make the tree, TigerTree grabs neighbouring hashes together and hashes the hashes. It keeps doing this until it gets to a single internal hash at the very top. This is the TigerTree root hash.
TigerTree uses it's own hashing algorithm, called Tiger. It doesn't use MD4, MD5, or SHA1.
The server has the complete file and hashes it. It wants to tell the client the hash. It can send just the ROOT hash, or it can send the entire tree. The root hash is the size of one internal hash, which is 24 bytes. The complete tree is sent as all the hashes in a row. For this 4 KB file, that's 7 hashes:
ROOT E F A B C D
With the root hash and a downloaded file, we can tell if the file matches the hash. If it doesn't, the root hash alone isn't enough information to tell where in the file the bad part is located. We request the whole tree. To make sure the tree is good, we hash it bottom to top and make sure we get the same root hash. Then, we hash from the file to the bottom of the tree to find the bad part of the file.
Shareaza sends the root hash on a query hit. It can deliver the entire tree, which is 9 levels deep, on a special request. It can't send just part of a tree. Shareaza sends trees up to 9 levels deep, even if a file is very large and actually has a bigger tree.
Questions
- Internal hashes are 24 bytes. How big are leaf hashes?
- How are leaf and internal hashes computed using the Tiger algorithm? Are any code bits put on front of either?
- The whole tree is smaller for smaller files, right?
- When we say 9 levels deep, does that include the root? In the picture above, is that tree 2 or 3 levels deep?
- How big a file can be covered with Shareaza's maximum 9 level deep tree?
- How big is the whole tree that is 9 levels deep for a very large file?