Developers.Code.CBuffer

From Shareaza Wiki

CBuffer

Working with memory in C++ is hard. First, you have to allocate memory, and make sure you request the right amount. Allocate too little, and writing past the end of the block can corrupt your program as it runs. Allocate too much, and you risk running the computer out of memory. Then, you have to remember to free the memory when your program is done with it. Forget this, and your program leaks memory, which can eventually run the computer out of it.

Shareaza has a class called CBuffer which makes working with memory a lot easier. Create a CBuffer object, and call Add to copy any amount of data into it. The object will automatically allocate the right amount of memory. It even allocates a little more than it needs, so that not every call to Add requires a slow reallocation. When the object goes out of scope, CBuffer automatically frees the memory it was using.

CBuffer has additional methods to help with text and compression.

Member Variables

A buffer is a space of memory in the computer that a program can read from and write to. Shareaza uses a class called CBuffer to manage buffers. Three important variables define a buffer:

<source lang="c">
BYTE* m_pBuffer;  // The block of allocated memory
DWORD m_nLength;  // The number of bytes we have written into the block
DWORD m_nBuffer;  // The size of the allocated block
</source>

The pointer m_pBuffer points to the start of the memory block; use it to read the contents of the buffer. There are two sizes: m_nBuffer is the size of the block of allocated memory, while m_nLength is the number of bytes we've written there. Here is what this looks like graphically:

File:Bufferdiagram.png

To read the data in a CBuffer object, move to m_pBuffer and look at the m_nLength bytes there. To add more data to a CBuffer object, the Add method starts writing at m_pBuffer + m_nLength. It has m_nBuffer - m_nLength bytes of space there.

Constructor and Destructor

The constructor sets pointers to NULL and sizes to 0. The destructor calls free to release the allocated memory block.
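A minimal sketch of the three member variables together with the constructor and destructor. It assumes a hypothetical class named SketchBuffer (not Shareaza's source) and uses std::uint8_t and std::uint32_t as portable stand-ins for the Windows BYTE and DWORD types. The FreeSpace helper and the deleted copy operations are our own additions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer; // The block of allocated memory
    std::uint32_t m_nLength; // The number of bytes written into the block
    std::uint32_t m_nBuffer; // The size of the allocated block

    // Nothing is allocated yet: null pointer, both sizes 0
    SketchBuffer() : m_pBuffer(nullptr), m_nLength(0), m_nBuffer(0) {}

    // Release the block; free(nullptr) is a harmless no-op
    ~SketchBuffer() { std::free(m_pBuffer); }

    // The object owns raw memory, so forbid copies that would free it twice
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Hypothetical helper: space left for Add, starting at m_pBuffer + m_nLength
    std::uint32_t FreeSpace() const
    {
        assert(m_nLength <= m_nBuffer);
        return m_nBuffer - m_nLength;
    }
};
```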

Add and Insert

The Add method takes a pointer to memory somewhere else and the number of bytes there, and copies that memory into the buffer. The Insert method does the same thing, except it also takes an argument called nOffset. Instead of putting the new memory at the end, it inserts it at this position in the buffer, shifting the bytes from that position onward toward the end to make room.

Both the Add and Insert methods contain this line of code:

<source lang="c">
#define BLOCK_SIZE 1024
#define BLOCK_MASK 0xFFFFFC00

m_nBuffer = ( m_nBuffer + BLOCK_SIZE - 1 ) & BLOCK_MASK;
</source>

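This bumps the value of m_nBuffer up to the next multiple of 1024, leaving it alone if it already is one. For instance, this line of code keeps 0 at 0, but turns 1 into 1024. The values 1023 and 1024 both map to 1024, while 1025 jumps up to 2048. It works because adding BLOCK_SIZE - 1 (1023) pushes any value that is not already a multiple of 1024 past the next multiple, and ANDing with BLOCK_MASK, which has its low 10 bits clear, rounds the sum back down to that multiple. 1024 bytes of memory is 1 KB, so this line of code makes sure that the memory block will be large enough and a whole number of kilobytes in size.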
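The rounding can be checked on its own. This sketch wraps the line of code in a hypothetical function, RoundUpToBlock, using std::uint32_t in place of DWORD; the static_asserts are our addition and document what the mask trick relies on.

```cpp
#include <cassert>
#include <cstdint>

#define BLOCK_SIZE 1024
#define BLOCK_MASK 0xFFFFFC00

// BLOCK_MASK must clear exactly the bits below BLOCK_SIZE, a power of two
static_assert((BLOCK_SIZE & (BLOCK_SIZE - 1)) == 0, "BLOCK_SIZE must be a power of two");
static_assert(BLOCK_MASK == static_cast<std::uint32_t>(~std::uint32_t(BLOCK_SIZE - 1)),
              "BLOCK_MASK must be ~(BLOCK_SIZE - 1)");

// Hypothetical helper: round nSize up to the next multiple of BLOCK_SIZE.
// Values that are already multiples (including 0) come back unchanged.
std::uint32_t RoundUpToBlock(std::uint32_t nSize)
{
    // Adding BLOCK_SIZE - 1 carries any partial kilobyte into the next one;
    // the mask then clears the low 10 bits, the remainder modulo 1024.
    // Sizes above 0xFFFFFC00 wrap around to 0 in 32-bit arithmetic.
    return ( nSize + BLOCK_SIZE - 1 ) & BLOCK_MASK;
}
```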

Why doesn't CBuffer always allocate exactly the amount of memory it needs? Because calling realloc is slow. By allocating up to 1023 bytes more than necessary, the Shareaza code can add bytes to a CBuffer object repeatedly without having to ask for more memory on every call.
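Putting these pieces together, here is a minimal sketch of Add and Insert in a hypothetical SketchBuffer class. It follows the description above (grow with realloc, round up to whole kilobytes, copy with memcpy, shift with memmove), but it is not Shareaza's source: the helper name Grow and the Insert argument order are our assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

#define BLOCK_SIZE 1024
#define BLOCK_MASK 0xFFFFFC00

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Hypothetical helper: grow to whole kilobytes if nLength more bytes won't fit
    void Grow(std::uint32_t nLength)
    {
        if ( m_nLength + nLength <= m_nBuffer ) return;
        std::uint32_t nNew = ( m_nLength + nLength + BLOCK_SIZE - 1 ) & BLOCK_MASK;
        void* p = std::realloc(m_pBuffer, nNew);
        if ( p == nullptr ) throw std::bad_alloc();
        m_pBuffer = static_cast<std::uint8_t*>(p);
        m_nBuffer = nNew;
    }

    // Copy nLength bytes to the end of the data
    void Add(const void* pData, std::uint32_t nLength)
    {
        Grow(nLength);
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    // Copy nLength bytes in at nOffset, shifting the later bytes toward the end
    void Insert(std::uint32_t nOffset, const void* pData, std::uint32_t nLength)
    {
        assert(nOffset <= m_nLength);
        Grow(nLength);
        if ( m_nLength > nOffset ) // memmove handles the overlapping ranges
            std::memmove(m_pBuffer + nOffset + nLength, m_pBuffer + nOffset, m_nLength - nOffset);
        if ( nLength ) std::memcpy(m_pBuffer + nOffset, pData, nLength);
        m_nLength += nLength;
    }
};
```

After adding "world" the block is already 1024 bytes, so the following Insert of "hello " needs no reallocation at all.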

Remove and Clear

The Remove method takes a number and removes that many bytes from the start of the buffer. This is useful once you've finished reading those bytes and want to move on to the ones after them. Clear empties the entire buffer. Neither method changes the allocated size of the memory block. Clear doesn't even have to touch the memory at all; it just sets m_nLength to 0, recording that no valid bytes are written there.
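A sketch of Remove and Clear, again using a hypothetical SketchBuffer class rather than Shareaza's code. The Add here is simplified (it grows to exactly the size needed, without kilobyte rounding) so the example stays short. Neither Remove nor Clear calls realloc, so m_nBuffer never shrinks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Simplified Add: grows to exactly the size needed (no kilobyte rounding)
    void Add(const void* pData, std::uint32_t nLength)
    {
        if ( m_nLength + nLength > m_nBuffer )
        {
            void* p = std::realloc(m_pBuffer, m_nLength + nLength);
            if ( p == nullptr ) throw std::bad_alloc();
            m_pBuffer = static_cast<std::uint8_t*>(p);
            m_nBuffer = m_nLength + nLength;
        }
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    // Drop nLength bytes from the front by sliding the rest down
    void Remove(std::uint32_t nLength)
    {
        if ( nLength >= m_nLength ) { m_nLength = 0; return; }
        m_nLength -= nLength;
        std::memmove(m_pBuffer, m_pBuffer + nLength, m_nLength);
    }

    // Forget all the data without touching the memory block
    void Clear() { m_nLength = 0; }
};
```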

Letting a Function Write Memory

Sometimes, you won't be the one writing data into a CBuffer object. Instead, a Win32 API function will be doing it, and you might not even know in advance how many bytes it is going to write. This happens frequently in Windows programming, and the Print method does exactly this. The basic format of this interaction looks like this:

<source lang="c">
// Find out the required buffer size, in bytes, for the translated string
int nBytes = WideCharToMultiByte( ...
    NULL, // No output buffer given, we just want to know how long it needs to be
    0,
    ... );

// Make sure the buffer is big enough for this, making it larger if necessary
EnsureBuffer( (DWORD)nBytes );

// Convert the Unicode string into ASCII characters in the buffer
WideCharToMultiByte( // Writes 5 bytes "hello", does not write a null terminator after that
    ...
    (LPSTR)( m_pBuffer + m_nLength ), // Tell it to write at the end of the data in the buffer
    nBytes, // There is at least this much space there
    ... );

// Add the newly written bytes to the buffer's record of how many bytes it is holding
m_nLength += nBytes;
</source>

This pattern of code is very common in C++ Windows programs. It has 4 parts:

1. Call the API function with NULL and 0; it returns how much space it needs.
2. Allocate this much space.
3. Call the API function again with the memory pointer and size.
4. Find out how much memory the API call actually wrote.

Here, the Win32 API call that is doing the writing is WideCharToMultiByte. We call it twice, first just to find out how much memory it needs, and then with a buffer big enough for it to use. CBuffer makes steps 2 and 4 easier. To complete step 2, just call EnsureBuffer(size). The CBuffer object will grow to have enough space to accept that many more bytes. To complete step 4, add the number of bytes written to m_nLength.
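WideCharToMultiByte is Windows-only, so this sketch shows the same four steps with the standard C function snprintf, which also returns the length it needs when given a null pointer and a size of 0. One difference: snprintf always writes a terminating null, so step 2 asks for one extra byte and step 4 leaves that byte outside m_nLength. The SketchBuffer class and its PrintNumber method are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Make room for nLength more bytes after the data (no kilobyte rounding here)
    void EnsureBuffer(std::uint32_t nLength)
    {
        if ( m_nBuffer - m_nLength >= nLength ) return;
        void* p = std::realloc(m_pBuffer, m_nLength + nLength);
        if ( p == nullptr ) throw std::bad_alloc();
        m_pBuffer = static_cast<std::uint8_t*>(p);
        m_nBuffer = m_nLength + nLength;
    }

    // Hypothetical method: append nValue as decimal text using the four steps
    void PrintNumber(int nValue)
    {
        // Step 1: a null pointer and size 0 make snprintf return the length it needs
        int nBytes = std::snprintf(nullptr, 0, "%d", nValue);
        if ( nBytes < 0 ) return; // formatting failed
        // Step 2: allocate that much space, plus 1 for the null snprintf always writes
        EnsureBuffer(static_cast<std::uint32_t>(nBytes) + 1);
        // Step 3: call again, writing at the end of the data in the buffer
        int nWritten = std::snprintf(reinterpret_cast<char*>(m_pBuffer + m_nLength),
                                     static_cast<std::size_t>(nBytes) + 1, "%d", nValue);
        // Step 4: count the characters written; the null stays outside m_nLength
        if ( nWritten > 0 ) m_nLength += static_cast<std::uint32_t>(nWritten);
    }
};
```

Because the null written by one call sits just past m_nLength, the next call simply writes over it, so the buffer ends up holding the characters back to back.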

Why doesn't WideCharToMultiByte just allocate a block of memory and return it to us? It could: Win32 API functions run inside our own process, and a few, like FormatMessage, do offer to allocate the output buffer for the caller. But such a design is confusing and dangerous. A program might call the function without realizing that it returns memory, that freeing it is now our responsibility, and which allocator it has to be freed with. Using it carelessly would create a memory leak that would be very difficult to find amongst thousands of lines of code.

Print

The Print method is overloaded with two versions. One takes an LPCSTR, which is a long pointer to a constant string of ASCII text. The other takes an LPCWSTR, which is a long pointer to a constant string of wide Unicode characters. Both versions write the text into the buffer as ASCII; the Unicode one has to convert it to ASCII first.

If you call Print with the ASCII string literal "hello", it will write 5 bytes into the buffer. Even though the string has a 6th byte, a null terminating zero, it doesn't get written into the buffer.
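A sketch of the ASCII version of Print in a hypothetical SketchBuffer class with a simplified Add. strlen counts the characters up to, but not including, the null terminator, which is why "hello" adds 5 bytes. Treating a null pointer as empty text is our own choice here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Simplified Add: grows to exactly the size needed (no kilobyte rounding)
    void Add(const void* pData, std::uint32_t nLength)
    {
        if ( m_nLength + nLength > m_nBuffer )
        {
            void* p = std::realloc(m_pBuffer, m_nLength + nLength);
            if ( p == nullptr ) throw std::bad_alloc();
            m_pBuffer = static_cast<std::uint8_t*>(p);
            m_nBuffer = m_nLength + nLength;
        }
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    // ASCII Print: copy the characters, but not the null terminator
    void Print(const char* pszText)
    {
        if ( pszText == nullptr ) return;
        Add(pszText, static_cast<std::uint32_t>(std::strlen(pszText)));
    }
};
```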

AddBuffer

AddBuffer moves the memory from another CBuffer object into this one. It copies it in, and then removes it from the given buffer.
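A sketch of AddBuffer written in terms of Add and Remove, assuming a hypothetical SketchBuffer class. The assert against adding a buffer to itself is our own addition: realloc could move the block while Add is still copying from it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Simplified Add: grows to exactly the size needed (no kilobyte rounding)
    void Add(const void* pData, std::uint32_t nLength)
    {
        if ( m_nLength + nLength > m_nBuffer )
        {
            void* p = std::realloc(m_pBuffer, m_nLength + nLength);
            if ( p == nullptr ) throw std::bad_alloc();
            m_pBuffer = static_cast<std::uint8_t*>(p);
            m_nBuffer = m_nLength + nLength;
        }
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    void Remove(std::uint32_t nLength)
    {
        if ( nLength >= m_nLength ) { m_nLength = 0; return; }
        m_nLength -= nLength;
        std::memmove(m_pBuffer, m_pBuffer + nLength, m_nLength);
    }

    // Move all of other's data onto the end of this buffer
    void AddBuffer(SketchBuffer& other)
    {
        assert(&other != this);
        Add(other.m_pBuffer, other.m_nLength); // copy it in...
        other.Remove(other.m_nLength);         // ...then remove it from the source
    }
};
```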

AddReversed

AddReversed takes a memory pointer and size, and adds the bytes to this buffer, but in reverse order: the last byte in the given range is copied in first, and the byte right at the given pointer goes in last.
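A sketch of AddReversed in a hypothetical SketchBuffer class, with a simplified EnsureBuffer (no kilobyte rounding). Output byte i comes from input byte nLength - 1 - i.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Make room for nLength more bytes after the data (no kilobyte rounding here)
    void EnsureBuffer(std::uint32_t nLength)
    {
        if ( m_nBuffer - m_nLength >= nLength ) return;
        void* p = std::realloc(m_pBuffer, m_nLength + nLength);
        if ( p == nullptr ) throw std::bad_alloc();
        m_pBuffer = static_cast<std::uint8_t*>(p);
        m_nBuffer = m_nLength + nLength;
    }

    // Append nLength bytes from pData, last byte first
    void AddReversed(const void* pData, std::uint32_t nLength)
    {
        EnsureBuffer(nLength);
        const std::uint8_t* pIn = static_cast<const std::uint8_t*>(pData);
        std::uint8_t* pOut = m_pBuffer + m_nLength;
        for ( std::uint32_t i = 0; i < nLength; ++i )
            pOut[i] = pIn[nLength - 1 - i];
        m_nLength += nLength;
    }
};
```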

Prefix

Prefix uses Insert to put some ASCII text in at the start of the buffer.
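A sketch of Prefix as a thin wrapper around Insert at offset 0, in a hypothetical SketchBuffer class. To keep it short, Add is written here as an Insert at the end of the data, and EnsureBuffer skips the kilobyte rounding; both are simplifications, not Shareaza's code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    void EnsureBuffer(std::uint32_t nLength)
    {
        if ( m_nBuffer - m_nLength >= nLength ) return;
        void* p = std::realloc(m_pBuffer, m_nLength + nLength);
        if ( p == nullptr ) throw std::bad_alloc();
        m_pBuffer = static_cast<std::uint8_t*>(p);
        m_nBuffer = m_nLength + nLength;
    }

    void Insert(std::uint32_t nOffset, const void* pData, std::uint32_t nLength)
    {
        assert(nOffset <= m_nLength);
        EnsureBuffer(nLength);
        if ( m_nLength > nOffset )
            std::memmove(m_pBuffer + nOffset + nLength, m_pBuffer + nOffset, m_nLength - nOffset);
        if ( nLength ) std::memcpy(m_pBuffer + nOffset, pData, nLength);
        m_nLength += nLength;
    }

    void Add(const void* pData, std::uint32_t nLength) { Insert(m_nLength, pData, nLength); }

    // Put ASCII text (without its null terminator) at the start of the buffer
    void Prefix(const char* pszText)
    {
        if ( pszText == nullptr ) return;
        Insert(0, pszText, static_cast<std::uint32_t>(std::strlen(pszText)));
    }
};
```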

EnsureBuffer

The EnsureBuffer method takes a size. It makes sure the buffer has room for that many more bytes after the m_nLength bytes it already holds, growing the memory block if necessary.
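A sketch of EnsureBuffer that reuses the kilobyte rounding from Add, assuming a hypothetical SketchBuffer class. Note that it only grows the block; it never changes m_nLength, which is left for the caller to update after writing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

#define BLOCK_SIZE 1024
#define BLOCK_MASK 0xFFFFFC00

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Make sure at least nLength more bytes fit after the existing data
    void EnsureBuffer(std::uint32_t nLength)
    {
        if ( m_nBuffer - m_nLength >= nLength ) return; // already enough free space
        std::uint32_t nNew = ( m_nLength + nLength + BLOCK_SIZE - 1 ) & BLOCK_MASK;
        void* p = std::realloc(m_pBuffer, nNew);
        if ( p == nullptr ) throw std::bad_alloc();
        m_pBuffer = static_cast<std::uint8_t*>(p);
        m_nBuffer = nNew;
    }
};
```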

Text

ReadString takes a maximum number of bytes to examine at the start of the buffer. It reads these bytes as ASCII characters, copies them into a CString object, and returns it.

ReadLine scans the buffer for the bytes \r\n. It clips out the ASCII characters in front of the pair and returns them in a string. If bPeek is FALSE, the default, ReadLine also removes the text from the buffer. Shareaza is compiled for Unicode, but the text in the buffer is ASCII, so ReadLine uses MultiByteToWideChar to convert the ASCII text into wide characters for the CString object it returns. By default, it uses the CP_ACP code page.
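A portable sketch of the ReadLine logic in a hypothetical SketchBuffer class. It returns the line through a std::string and copies the bytes as-is instead of converting them with MultiByteToWideChar into a CString; the bool return value (false when no complete line has arrived yet) and the removal of the \r\n pair along with the text are our assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Simplified Add: grows to exactly the size needed (no kilobyte rounding)
    void Add(const void* pData, std::uint32_t nLength)
    {
        if ( m_nLength + nLength > m_nBuffer )
        {
            void* p = std::realloc(m_pBuffer, m_nLength + nLength);
            if ( p == nullptr ) throw std::bad_alloc();
            m_pBuffer = static_cast<std::uint8_t*>(p);
            m_nBuffer = m_nLength + nLength;
        }
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    void Remove(std::uint32_t nLength)
    {
        if ( nLength >= m_nLength ) { m_nLength = 0; return; }
        m_nLength -= nLength;
        std::memmove(m_pBuffer, m_pBuffer + nLength, m_nLength);
    }

    // Copy the text before the first "\r\n" into strLine. Unless bPeek is true,
    // also remove that text and the "\r\n" pair from the buffer.
    bool ReadLine(std::string& strLine, bool bPeek = false)
    {
        for ( std::uint32_t i = 0; i + 1 < m_nLength; ++i )
        {
            if ( m_pBuffer[i] == '\r' && m_pBuffer[i + 1] == '\n' )
            {
                // Byte-for-byte copy; no code page conversion in this sketch
                strLine.assign(reinterpret_cast<const char*>(m_pBuffer), i);
                if ( !bPeek ) Remove(i + 2);
                return true;
            }
        }
        return false; // no complete line in the buffer yet
    }
};
```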

StartsWith takes ASCII text. The caller usually supplies it as an ASCII text literal, like "hello", instead of _T("hello"), which would consist of wide characters for the Unicode compile. The method determines if the buffer begins with this ASCII text, and returns TRUE or FALSE. The buffer doesn't have to have a null terminator beyond the text bytes - only the characters are matched. If bRemove is true and the text is found, the method removes it from the buffer.
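A sketch of StartsWith in a hypothetical SketchBuffer class. It compares only strlen(pszText) bytes with memcmp, so the buffer needs no null terminator, matching the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Simplified Add: grows to exactly the size needed (no kilobyte rounding)
    void Add(const void* pData, std::uint32_t nLength)
    {
        if ( m_nLength + nLength > m_nBuffer )
        {
            void* p = std::realloc(m_pBuffer, m_nLength + nLength);
            if ( p == nullptr ) throw std::bad_alloc();
            m_pBuffer = static_cast<std::uint8_t*>(p);
            m_nBuffer = m_nLength + nLength;
        }
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    void Remove(std::uint32_t nLength)
    {
        if ( nLength >= m_nLength ) { m_nLength = 0; return; }
        m_nLength -= nLength;
        std::memmove(m_pBuffer, m_pBuffer + nLength, m_nLength);
    }

    // True if the data begins with the ASCII text; optionally remove that text
    bool StartsWith(const char* pszText, bool bRemove = false)
    {
        std::size_t nText = std::strlen(pszText);
        if ( nText > m_nLength ) return false;
        if ( nText && std::memcmp(m_pBuffer, pszText, nText) != 0 ) return false;
        if ( bRemove ) Remove(static_cast<std::uint32_t>(nText));
        return true;
    }
};
```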

Sockets, Compression, and Refactoring

The CBuffer methods Receive and Send each take a handle to a socket. Receive calls the Windows Sockets 2 API recv to pull all the data from the socket into the buffer. Send calls send to send the contents of the buffer into the socket.
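A sketch of Receive and Send in a hypothetical SketchBuffer class. It uses POSIX sockets instead of Windows Sockets 2 so it can run outside Windows; recv and send take the same arguments in both APIs, but MSG_DONTWAIT (return at once instead of blocking) is a POSIX flag, whereas a Winsock program would put the socket itself in non-blocking mode. A real implementation would also check errno to tell "no more data" apart from a genuine error.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    std::uint8_t* m_pBuffer = nullptr;
    std::uint32_t m_nLength = 0;
    std::uint32_t m_nBuffer = 0;

    SketchBuffer() = default;
    ~SketchBuffer() { std::free(m_pBuffer); }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Simplified Add: grows to exactly the size needed (no kilobyte rounding)
    void Add(const void* pData, std::uint32_t nLength)
    {
        if ( m_nLength + nLength > m_nBuffer )
        {
            void* p = std::realloc(m_pBuffer, m_nLength + nLength);
            if ( p == nullptr ) throw std::bad_alloc();
            m_pBuffer = static_cast<std::uint8_t*>(p);
            m_nBuffer = m_nLength + nLength;
        }
        if ( nLength ) std::memcpy(m_pBuffer + m_nLength, pData, nLength);
        m_nLength += nLength;
    }

    void Remove(std::uint32_t nLength)
    {
        if ( nLength >= m_nLength ) { m_nLength = 0; return; }
        m_nLength -= nLength;
        std::memmove(m_pBuffer, m_pBuffer + nLength, m_nLength);
    }

    // Pull everything currently waiting on the socket into the buffer
    std::uint32_t Receive(int hSocket)
    {
        std::uint32_t nTotal = 0;
        std::uint8_t pChunk[4096];
        for ( ;; )
        {
            ssize_t nRead = recv(hSocket, pChunk, sizeof(pChunk), MSG_DONTWAIT);
            if ( nRead <= 0 ) break; // 0: peer closed; -1: nothing waiting, or an error
            Add(pChunk, static_cast<std::uint32_t>(nRead));
            nTotal += static_cast<std::uint32_t>(nRead);
        }
        return nTotal;
    }

    // Send as much of the buffer as the socket accepts, removing what was sent
    std::uint32_t Send(int hSocket)
    {
        std::uint32_t nTotal = 0;
        while ( m_nLength > 0 )
        {
            ssize_t nSent = send(hSocket, m_pBuffer, m_nLength, MSG_DONTWAIT);
            if ( nSent <= 0 ) break; // socket full, or an error
            Remove(static_cast<std::uint32_t>(nSent));
            nTotal += static_cast<std::uint32_t>(nSent);
        }
        return nTotal;
    }
};
```

A connected pair of local sockets from socketpair is a convenient way to try it: Send on one end, then Receive on the other.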

The CBuffer methods Deflate and Inflate use the zlib library to compress and decompress the data in the buffer. CNeighbour also uses zlib, but for stream compression, where data is compressed piece by piece as it flows over a connection. Deflate and Inflate don't do stream compression: they process all of the data at once, which is possible when the program already has the whole block in memory. Ungzip removes the gzip header from the data, and then decompresses it the same way.

Looking at these socket and compression methods, I'm not sure if this is the right design. CBuffer is about memory, and so is everything on a computer. CBuffer is so fundamental that every part of Shareaza could become a CBuffer method. Instead of pulling sockets and compression into CBuffer, I think these features should be in their own classes with names like CSocket and CCompress. This way, the program would treat CBuffer like CString: as a simple, fundamental data type that can be passed from method to method. Overall, the current design of Shareaza has a strong CBuffer that draws other parts of the program into it. I think a good refactored design would have a weak CBuffer, letting other parts of the program stay together.

ReverseBuffer

The method ReverseBuffer reverses the order of all the bytes in the given memory. ReverseBuffer is declared static, which means you don't need a CBuffer object to call it. It works just like a C function: you call it through the class name, like CBuffer::ReverseBuffer( ... ).
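A sketch of a static byte-reversing method in a hypothetical SketchBuffer class. The in-place signature (one pointer and a length) is our assumption; the point is that a static member function has no object and no this pointer, so it is called through the class name and works on whatever memory it is given.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CBuffer, for illustration only
class SketchBuffer
{
public:
    // static: belongs to the class rather than to an object, so it cannot
    // touch m_pBuffer; the caller passes the memory to reverse explicitly
    static void ReverseBuffer(void* pMemory, std::uint32_t nLength)
    {
        std::uint8_t* p = static_cast<std::uint8_t*>(pMemory);
        std::reverse(p, p + nLength);
    }
};
```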

DIME

CBuffer has two methods, WriteDIME and ReadDIME, that deal with the DIME standard for Web services. Just as MIME lets text e-mail contain binary attachments, DIME sends binary attachments through Web services. The specification Sending Files, Attachments, and SOAP Messages Via Direct Internet Message Encapsulation describes the binary header these methods read and write.
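As an illustration of the kind of binary header involved, here is a sketch of writing one DIME record, based on our reading of the record layout in the DIME draft specification: a 12-byte header holding a 5-bit VERSION (1), the MB, ME and CF flag bits, a 4-bit TYPE_T, and big-endian OPTIONS_LENGTH, ID_LENGTH, TYPE_LENGTH and DATA_LENGTH fields, followed by the ID, TYPE and DATA fields, each zero-padded to a multiple of 4 bytes. It is not Shareaza's WriteDIME; the function and parameter names are hypothetical, and the field details should be checked against the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append nValue as nBytes bytes, most significant first (network byte order)
static void PutBigEndian(std::vector<std::uint8_t>& out, std::uint32_t nValue, int nBytes)
{
    for ( int i = nBytes - 1; i >= 0; --i )
        out.push_back(static_cast<std::uint8_t>(nValue >> (8 * i)));
}

// Append a field, then zero bytes until its length is a multiple of 4
static void PutPadded(std::vector<std::uint8_t>& out, const void* pData, std::size_t nLength)
{
    const std::uint8_t* p = static_cast<const std::uint8_t*>(pData);
    out.insert(out.end(), p, p + nLength);
    for ( std::size_t n = nLength; n % 4 != 0; ++n ) out.push_back(0);
}

// Hypothetical writer for one DIME record with no OPTIONS field.
// bBegin/bEnd set the MB/ME flags; bUriType selects TYPE_T 0x2 (absolute URI)
// instead of 0x1 (media type such as "text/xml"). CF (chunking) is always 0.
void WriteDIMERecord(std::vector<std::uint8_t>& out, bool bBegin, bool bEnd, bool bUriType,
                     const std::string& strID, const std::string& strType,
                     const void* pBody, std::uint32_t nBody)
{
    assert(strID.size() <= 0xFFFF && strType.size() <= 0xFFFF);
    // Byte 0: VERSION = 1 in the top 5 bits, then the MB, ME and CF bits
    out.push_back(static_cast<std::uint8_t>(0x08 | (bBegin ? 0x04 : 0) | (bEnd ? 0x02 : 0)));
    // Byte 1: TYPE_T in the high 4 bits, reserved low 4 bits left at 0
    out.push_back(static_cast<std::uint8_t>((bUriType ? 0x2 : 0x1) << 4));
    PutBigEndian(out, 0, 2);                                          // OPTIONS_LENGTH
    PutBigEndian(out, static_cast<std::uint32_t>(strID.size()), 2);   // ID_LENGTH
    PutBigEndian(out, static_cast<std::uint32_t>(strType.size()), 2); // TYPE_LENGTH
    PutBigEndian(out, nBody, 4);                                      // DATA_LENGTH
    PutPadded(out, strID.data(), strID.size());
    PutPadded(out, strType.data(), strType.size());
    PutPadded(out, pBody, nBody);
}
```

For a single-record message with ID "id1", type "text/xml" and body "hello", this produces 12 header bytes plus 4, 8 and 8 bytes of padded fields.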