Developers.Hash.Library
Watching a file get hashed by the library
Right now, I'm just trying to find or write code that takes a whole file and computes its TigerTree hash. Here's some code that computes several hashes at once:
<source lang="c">
CTigerTree pTiger;
CED2K pED2K;
CSHA pSHA1;
CMD5 pMD5;

pTiger.BeginFile( Settings.Library.TigerHeight, nFileSize );
pED2K.BeginFile( nFileSize );

for ( QWORD nLength = nFileSize ; nLength > 0 ; )
{
    DWORD nBlock = (DWORD)min( nLength, QWORD(20480) );
    DWORD nTime = GetTickCount();

    ReadFile( hFile, m_pBuffer, nBlock, &nBlock, NULL );

    pSHA1.Add( m_pBuffer, nBlock );
    pMD5.Add( m_pBuffer, nBlock );
    pTiger.AddToFile( m_pBuffer, nBlock );
    pED2K.AddToFile( m_pBuffer, nBlock );
    nLength -= nBlock;
    if ( ! m_bPriority && ! bPriority )
    {
        if ( nBlock == 20480 ) m_nHashSleep = ( GetTickCount() - nTime ) * 2;
        m_nHashSleep = max( m_nHashSleep, DWORD(20) );
        Sleep( m_nHashSleep );
    }

    if ( ! m_bThread ) return FALSE;
}

pSHA1.Finish();
pMD5.Finish();
pTiger.FinishFile();
pED2K.FinishFile();
</source>
Why is pED2K acting like its own hashing algorithm?
Taking just the TigerTree portion looks like this:
<source lang="c">
CTigerTree pTiger;
pTiger.BeginFile( Settings.Library.TigerHeight, nFileSize );
pTiger.AddToFile( m_pBuffer, nBlock );
pTiger.FinishFile();
</source>
How it works
The test file is 4 KB. It is composed of 1024 a characters, 1024 b characters, 1024 c characters, and 1024 d characters.
In the Shareaza 2.0 code, the breakpoint is set at CLibraryBuilder::HashFile.
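To reproduce the test, a file like this is easy to generate. Here's a minimal sketch in portable C that fills a buffer with the pattern. The helper name `fill_test_pattern` and the two constants are made up for illustration and are not part of Shareaza, and the sketch assumes the four runs appear in the order a, b, c, d.

```c
#include <stddef.h>
#include <string.h>

#define RUN_LENGTH   1024               /* bytes per letter */
#define PATTERN_SIZE (4 * RUN_LENGTH)   /* 4 KB in total */

/* Hypothetical helper: fill buf (at least PATTERN_SIZE bytes) with
   1024 'a', then 1024 'b', 1024 'c' and 1024 'd'.
   Returns the number of bytes written. */
size_t fill_test_pattern(unsigned char *buf)
{
    const char letters[4] = { 'a', 'b', 'c', 'd' };
    for (size_t i = 0; i < 4; i++)
        memset(buf + i * RUN_LENGTH, letters[i], RUN_LENGTH);
    return PATTERN_SIZE;
}
```

Writing the buffer out with `fwrite` then gives a file that can be dropped into a shared folder for the library to hash.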
Here's the code that opens the file and gets the file handle. It's in CLibraryBuilder::OnRun. It uses the CreateFile API.
<source lang="c"> HANDLE hFile = CreateFile( m_sPath, GENERIC_READ,
FILE_SHARE_READ|FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL|FILE_FLAG_SEQUENTIAL_SCAN, NULL );
</source>
Here's the call into HashFile:
<source lang="c"> SHA1 pSHA1;
if ( HashFile( hFile, bPriority, &pSHA1 ) ) </source>
The method is given a new blank SHA1 variable where it can put the hash value. bPriority is 0.
To begin, HashFile finds out how big the file is.
<source lang="c">
BOOL CLibraryBuilder::HashFile(HANDLE hFile, BOOL bPriority, SHA1* pOutSHA1)
{
    DWORD nSizeHigh = 0;
    DWORD nSizeLow = GetFileSize( hFile, &nSizeHigh );
    QWORD nFileSize = (QWORD)nSizeLow | ( (QWORD)nSizeHigh << 32 );
    QWORD nFileBase = 0;
</source>
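nFileSize is a QWORD because a DWORD is only 32 bits wide and can't describe a size of 4 GB or more. GetFileSize returns the low 32 bits and writes the high 32 bits into nSizeHigh, and the code joins the two halves. nFileBase starts out as 0.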
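The way the two halves are joined can be tried outside of Windows. This is a small sketch in portable C, where `uint32_t` and `uint64_t` stand in for DWORD and QWORD, and `combine_file_size` is a name invented here for illustration.

```c
#include <stdint.h>

/* Join the low and high 32-bit halves of a file size into one 64-bit value.
   uint32_t and uint64_t stand in for the Win32 DWORD and QWORD types. */
uint64_t combine_file_size(uint32_t size_low, uint32_t size_high)
{
    /* Widen before shifting, otherwise the high half would be shifted away. */
    return (uint64_t)size_low | ((uint64_t)size_high << 32);
}
```

For the 4 KB test file the high half is 0 and the result is just 4096. A high half of 1 with a low half of 0 is exactly 4 GB.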
<source lang="c">
nSizeLow = (DWORD)( nFileBase & 0xFFFFFFFF );
nSizeHigh = (DWORD)( nFileBase >> 32 );
SetFilePointer( hFile, nSizeLow, (PLONG)&nSizeHigh, FILE_BEGIN );
</source>
nFileBase is 0. All this code does is turn that QWORD into two DWORDs, both of which are 0. Then, it calls SetFilePointer to move the pointer to this location, 0, which is the start, and is where the file pointer already is anyway.
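The split is the reverse of joining the two halves of the file size. Here's a sketch of it in portable C. The struct `offset_parts` and the function `split_offset` are hypothetical names used only for this example, with `uint32_t` and `uint64_t` standing in for DWORD and QWORD.

```c
#include <stdint.h>

/* Hypothetical pair holding the two halves that SetFilePointer expects. */
typedef struct {
    uint32_t low;   /* bits 0..31 of the offset */
    uint32_t high;  /* bits 32..63 of the offset */
} offset_parts;

/* Split a 64-bit file offset into its low and high 32-bit halves. */
offset_parts split_offset(uint64_t offset)
{
    offset_parts parts;
    parts.low  = (uint32_t)(offset & 0xFFFFFFFFu);
    parts.high = (uint32_t)(offset >> 32);
    return parts;
}
```

With an offset of 0, both halves come out as 0, which matches what the debugger shows here.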
<source lang="c">
CTigerTree pTiger;
CED2K pED2K;
CSHA pSHA1;
CMD5 pMD5;

pTiger.BeginFile( Settings.Library.TigerHeight, nFileSize );
pED2K.BeginFile( nFileSize );
</source>
Here the code creates the objects that will compute the hashes. CMD5 and CSHA compute MD5 and SHA1 hashes, and CED2K and CTigerTree compute the eDonkey2000 and TigerTree hashes. These last two are special: each requires a call to BeginFile with the file size before any data is added. In this test, nFileSize is 4096 for our 4 KB file. The value from settings, TigerHeight, is 9.
Next is the loop.
<source lang="c"> for ( QWORD nLength = nFileSize ; nLength > 0 ; ) {
DWORD nBlock = (DWORD)min( nLength, 20480 );
...
nLength -= nBlock;
</source>
This loop splits the file size into 20 KB chunks. For a small file, there will be just one chunk. The loop will run once with nBlock set to the file size. For a 30 KB file, the loop will run twice. First, nBlock will be 20 KB. On the second loop, it will be 10 KB.
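The chunk arithmetic can be checked without touching a file. The sketch below mirrors the loop in portable C: it starts with the whole file size, takes up to 20 KB per pass, and subtracts each block from what remains. `plan_blocks` and `HASH_BLOCK_SIZE` are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_BLOCK_SIZE 20480u   /* 20 KB, the constant used by the loop */

/* Walk a file of file_size bytes in 20 KB steps.
   Stores the size of each block in blocks[] (up to max_blocks entries)
   and returns how many times the loop ran. */
size_t plan_blocks(uint64_t file_size, uint32_t *blocks, size_t max_blocks)
{
    size_t count = 0;
    for (uint64_t remaining = file_size; remaining > 0; ) {
        /* Take a full block, or whatever is left if that is smaller. */
        uint32_t block = (uint32_t)(remaining < HASH_BLOCK_SIZE
                                        ? remaining : HASH_BLOCK_SIZE);
        if (count < max_blocks)
            blocks[count] = block;
        count++;
        remaining -= block;
    }
    return count;
}
```

For the 4 KB test file this gives one block of 4096 bytes. For a 30 KB file it gives two blocks, 20480 bytes and then 10240 bytes.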
<source lang="c">
ReadFile( hFile, m_pBuffer, nBlock, &nBlock, NULL );

pSHA1.Add( m_pBuffer, nBlock );
pMD5.Add( m_pBuffer, nBlock );
pTiger.AddToFile( m_pBuffer, nBlock );
pED2K.AddToFile( m_pBuffer, nBlock );
</source>
The API call ReadFile reads the block from the file into m_pBuffer and overwrites nBlock with the number of bytes it actually read. Reading moves the file pointer forward, so the next time the loop calls ReadFile, it will copy the next block. From the buffer, the block is given to all four hashing objects.
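The same read-once, feed-everyone pattern can be sketched in portable C, with `fread` in place of ReadFile. The `toy_digest` type below is a made-up stand-in that only counts and sums bytes; it is not a real hash and has nothing to do with the Shareaza classes. `feed_digests` and `HASH_BLOCK_SIZE` are also names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HASH_BLOCK_SIZE 20480   /* same 20 KB block size as the loop */

/* Hypothetical stand-in for a hashing object. */
typedef struct {
    uint64_t bytes_seen;  /* total bytes added so far */
    uint32_t byte_sum;    /* running sum of all byte values */
} toy_digest;

/* Add one block of data to a toy digest. */
void toy_digest_add(toy_digest *d, const unsigned char *data, size_t length)
{
    for (size_t i = 0; i < length; i++)
        d->byte_sum += data[i];
    d->bytes_seen += length;
}

/* Read fp block by block and hand every block to every digest.
   Returns 0 on success, -1 if a read error occurred. */
int feed_digests(FILE *fp, toy_digest *digests, size_t digest_count)
{
    static unsigned char buffer[HASH_BLOCK_SIZE];
    size_t got;

    /* fread advances the file position, so each pass sees the next block. */
    while ((got = fread(buffer, 1, sizeof buffer, fp)) > 0) {
        for (size_t i = 0; i < digest_count; i++)
            toy_digest_add(&digests[i], buffer, got);
    }
    return ferror(fp) ? -1 : 0;
}
```

The point of the design is that the file is read from disk only once, no matter how many hashes are being computed. Each block sits in one buffer and every object takes its turn with it.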