Simple Packet Design
This is a design for simple packets in a peer-to-peer network. Its goals are:
- Efficiency: Get messages across using the fewest bytes possible.
- Simplicity: Make it easy for a program to parse the information.
- Extensibility: Have a built-in way to change things and add features that doesn't wreck everything.
Efficiency must be taken to the extreme. A developer reviewing this specification should think, "There is no possible way to do this in fewer bytes." Simplicity will let packet parsing code be small, fast, uniform, and free of bugs and security-related issues. A reviewing developer should think, "This will be really easy to parse." Extensibility ensures that even if this design meets the first two criteria, they won't be lost the first time a new feature is added or something needs to be changed. The reviewer's thought here must be, "I could build my thing on top of this."
Some of the ideas of the design are:
- Instead of binary complexity, how about just a few short lines of text?
- There is one single method of parsing.
- Labels don't belong on the wire.
- Keep text as text and binary as binary.
- Never encode or compress anything.
- Never tell a computer something it will ignore.
- Packets can't be extended. To change something, define a new one.
These will become clear in the explanation below.
Text Lines and Binary Blocks
Packets are often documented using several lines of text to represent their binary contents. Why not just make this the packet? It's smaller for the wire and simpler for the programmer. So, in this design, a packet is made up of lines. A line is a string of single-byte characters that ends with a zero byte, \0. The program can parse lines very easily. Also, no characters need to be escaped.
<source lang="c"> this is a line\0 </source>
Intermixed with lines are blocks of bytes. Below is a representation of 8 bytes. This data can be anything.
<source lang="c"> BBBBBBBB </source>
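The two building blocks above are all a parser needs. Here is a minimal sketch in Python, assuming the whole packet is already in memory as a `bytes` object; the helper names `read_line` and `read_block` are made up for illustration and are not part of the design.

```python
def read_line(buf, pos):
    """Return (line, next_pos); the line excludes its terminating zero byte."""
    end = buf.index(b"\x00", pos)  # raises ValueError if no terminator is found
    return buf[pos:end], end + 1


def read_block(buf, pos, length):
    """Return (block, next_pos) for exactly `length` raw bytes."""
    if pos + length > len(buf):
        raise ValueError("block runs past the end of the data")
    return buf[pos:pos + length], pos + length


# One line followed by an 8-byte block. The block may contain zero bytes,
# because it is read by length rather than by terminator.
data = b"this is a line\x00" + bytes(range(8))
line, pos = read_line(data, 0)
block, pos = read_block(data, pos, 8)
```

A line is found by scanning for the zero byte, and a block is found by counting, so the two never interfere with each other.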
Packets and the Packet Definition Document
Each packet has a short name like p, p2, or m. The packet starts with this name on a line, and then has more lines and binary blocks. There is no header, binary or otherwise. A packet looks like this:
<source lang="c">
p\0
BBBBBB
a file name.mp3\0
128\0
BBBBBBBBBBBBBBBB
</source>
Months ago, developers wrote a packet definition document. They put it on a Web site somewhere, and everyone read it while programming their clients. The document tells how many lines a p packet has, what the first, second, third, and fourth mean, where binary blocks appear, how long they are, and what they mean.
This lets the packet consist of values only. They are not labeled. Naming the values is not necessary because the packet definition document contains this information instead.
Labels belong in the packet definition document, not on the wire. The document only gets sent across the Internet when a developer programming a client browses to it. Network communication happens constantly as millions of clients exchange billions of packets.
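One way to picture the packet definition document is as a table that a developer types into the program once. The sketch below assumes such a table for the example `p` packet; the field names (`block_a`, `file_name`, `number`, `block_b`) are hypothetical placeholders, since the meaning of each value would come from the real document.

```python
# Hypothetical in-code copy of the packet definition document.
# Each entry is ("line", label) or ("block", label, length_in_bytes).
DEFINITIONS = {
    b"p": [
        ("block", "block_a", 6),
        ("line", "file_name"),
        ("line", "number"),
        ("block", "block_b", 16),
    ],
}


def parse_packet(buf):
    """Parse one packet from the start of buf; return (name, values, next_pos)."""
    end = buf.index(b"\x00")
    name, pos = buf[:end], end + 1
    fields = DEFINITIONS.get(name)
    if fields is None:
        raise ValueError("unknown packet name")
    values = {}
    for field in fields:
        if field[0] == "line":
            end = buf.index(b"\x00", pos)
            values[field[1]] = buf[pos:end]
            pos = end + 1
        else:
            length = field[2]
            if pos + length > len(buf):
                raise ValueError("packet is truncated")
            values[field[1]] = buf[pos:pos + length]
            pos += length
    return name, values, pos


packet = b"p\x00" + bytes(6) + b"a file name.mp3\x00128\x00" + bytes(range(16))
name, values, pos = parse_packet(packet)
```

The labels live only in `DEFINITIONS`. Nothing in `packet` names its own fields.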
Variable Lengths
When the number of lines or bytes is not known in advance, the length is given as a number written in ASCII numerals and placed alone on a line. Here is an example with 3 colors:
<source lang="c">
3
red
blue
green
</source>
The zero bytes that end each line aren't shown in this and later examples. If we had 4 colors to tell another computer about, this part of the packet would look like this:
<source lang="c">
4
red
blue
green
purple
</source>
The program knows from the packet definition document that this line indicates the number of lines afterwards, and reads them accordingly.
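Reading a counted list of lines could look like the following sketch. The function name `read_counted_lines` is an assumption for illustration, and the check that the count is made of ASCII digits is a precaution added here rather than something the design spells out.

```python
def read_counted_lines(buf, pos=0):
    """Read a count line, then that many lines; return (lines, next_pos)."""
    end = buf.index(b"\x00", pos)
    count_text = buf[pos:end]
    if not count_text.isdigit():  # only ASCII digits are accepted as a count
        raise ValueError("count line is not a number")
    pos = end + 1
    lines = []
    for _ in range(int(count_text)):
        end = buf.index(b"\x00", pos)
        lines.append(buf[pos:end])
        pos = end + 1
    return lines, pos


three = b"3\x00red\x00blue\x00green\x00"
four = b"4\x00red\x00blue\x00green\x00purple\x00"
colors3, _ = read_counted_lines(three)
colors4, _ = read_counted_lines(four)
```

The same code handles 3 colors or 4, because the count is read from the wire rather than fixed in the program.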
This method also works for bytes. Right in the middle of all this text, it's easy to transfer any length of binary data without any text encoding.
<source lang="c">
56
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
</source>
The meaning of a count is defined by the packet definition document, and does not have to be a number of lines or a number of bytes. Imagine a computer wants to express 3 IP addresses and port numbers, each of which takes 6 bytes:
<source lang="c">
3
BBBBBB
BBBBBB
BBBBBB
</source>
Here, there is a \0 after the 3, because it is a line, but no extra bytes after that. The IP addresses and port numbers occupy a contiguous block of 18 bytes. The packet definition document indicates that, in this case, 3 means that three 6-byte-long IP address and port number pairs follow.
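A reader for this count-then-block layout might look like the sketch below. It assumes each 6-byte record is a 4-byte IPv4 address followed by a 2-byte port in network (big-endian) byte order; the design does not state the byte order, so that part is a guess, and `read_peers` is a made-up name.

```python
import ipaddress
import struct


def read_peers(buf, pos=0):
    """Read a count line, then count * 6 bytes of address and port pairs."""
    end = buf.index(b"\x00", pos)
    count_text = buf[pos:end]
    if not count_text.isdigit():
        raise ValueError("count line is not a number")
    count = int(count_text)
    pos = end + 1
    if pos + 6 * count > len(buf):
        raise ValueError("address block is truncated")
    peers = []
    for _ in range(count):
        # "!4sH": 4 raw bytes, then an unsigned 16-bit big-endian port (assumed)
        raw_ip, port = struct.unpack_from("!4sH", buf, pos)
        peers.append((str(ipaddress.IPv4Address(raw_ip)), port))
        pos += 6
    return peers, pos


wire = b"3\x00" + bytes([1, 2, 3, 4, 0, 5,
                         6, 7, 8, 9, 0, 10,
                         11, 12, 13, 14, 0, 15])
peers, next_pos = read_peers(wire)
```

Only the count is a line. The 18 bytes that follow are read by length, so zero bytes inside them cause no trouble.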
A number can count sets of lines. This example has 2 sets of three lines. The meaning of the three lines is written in the packet definition document.
<source lang="c">
2
BBBBBBBBBBBBBBBB
image1.jpg
101
BBBBBBBBBBBBBBBB
image2.jpg
202
</source>
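A counted set like the one above can be read in a loop. This is only a sketch: it assumes each set is a 16-byte block, a name line, and a number line, and the names `read_sets` and `read_line` are invented for the example.

```python
def read_line(buf, pos):
    """Return (line, next_pos) for the zero-terminated line at pos."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1


def read_sets(buf, pos=0):
    """Read a count line, then that many (block, name, number) sets."""
    count_text, pos = read_line(buf, pos)
    if not count_text.isdigit():
        raise ValueError("count line is not a number")
    sets = []
    for _ in range(int(count_text)):
        if pos + 16 > len(buf):
            raise ValueError("set is truncated")
        block, pos = buf[pos:pos + 16], pos + 16
        name, pos = read_line(buf, pos)
        number_text, pos = read_line(buf, pos)
        if not number_text.isdigit():
            raise ValueError("number line is not a number")
        sets.append((block, name, int(number_text)))
    return sets, pos


wire = (b"2\x00"
        + b"\xaa" * 16 + b"image1.jpg\x00101\x00"
        + b"\xbb" * 16 + b"image2.jpg\x00202\x00")
sets, next_pos = read_sets(wire)
```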
Combinations of these methods can make the packet format as complex as it needs to be to express its message.
Providing More Information
Within a packet, there is no way to provide additional or optional information that some clients can use while others will safely ignore. Packets are rigidly defined. But, extensibility is as easy as defining a new packet. Imagine the p packet is a GUID, an IP address and port number, and a name.
<source lang="c">
p
BBBBBBBBBBBBBBBB
BBBBBB
my name
</source>
There is no way to change this. It would break existing clients. So, to add a new line, just define the packet p2.
<source lang="c">
p2
BBBBBBBBBBBBBBBB
BBBBBB
my name
new information
</source>
In the handshake, clients tell each other which packets they understand. A client could implement both p and p2 to be able to talk to updated and legacy neighbours. A standards body ratifies packet names and formats, and publishes the packet definition document.
Every part of the bitstream is used. If a client can't understand p2, the text "new information" is not transmitted across the wire. Bandwidth is too precious to send data that a remote computer might not be able to read.
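The choice between p and p2 can be sketched as follows. The handshake itself is not specified here, so this example simply assumes it has already produced a set of packet names the remote client understands; `build_for_peer` and the other function names are hypothetical.

```python
def _check(guid, address, *lines):
    """Validate field sizes and make sure no line contains a zero byte."""
    if len(guid) != 16 or len(address) != 6:
        raise ValueError("guid must be 16 bytes and address 6 bytes")
    if any(b"\x00" in line for line in lines):
        raise ValueError("a line cannot contain a zero byte")


def build_p(guid, address, name):
    _check(guid, address, name)
    return b"p\x00" + guid + address + name + b"\x00"


def build_p2(guid, address, name, extra):
    _check(guid, address, name, extra)
    return b"p2\x00" + guid + address + name + b"\x00" + extra + b"\x00"


def build_for_peer(peer_understands, guid, address, name, extra):
    """Send p2 only to peers that listed it; fall back to p otherwise."""
    if b"p2" in peer_understands:
        return build_p2(guid, address, name, extra)
    if b"p" in peer_understands:
        return build_p(guid, address, name)
    raise ValueError("peer understands neither p nor p2")
```

A peer that only listed p never receives the extra line, so no bytes are spent on data it would ignore.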
Rules
That is a complete description of the packet format. To keep to this standard and achieve its goals, however, there are a lot of mistakes to avoid and things that are against the rules.
Take Everything to the Margin. Imagine a computer wants to tell its neighbour that it has 2 colors, red and blue. It might be tempting to write this in a single line:
<source lang="c"> colors: red, blue </source>
This is simple to read, but is actually very complicated and exception-prone for code to parse. First, the program must look for a colon, and then later split a string on any number of commas. The values red and blue have spaces around them that need to be trimmed. If a color needed to contain a space, colon, or comma, that character would have to be encoded and decoded. Also, colors is a label, and should be in the packet definition document, not on the wire. So, according to the design in this spec, this information would be formatted on multiple lines:
<source lang="c">
2
red
blue
</source>
The packet definition document tells us that we are reading colors here. The first line, 2, tells us that there are two of them next. Each is on a separate line, making it easy to parse and keeping encoding unnecessary.
No Data Labels. Data labels are in the packet definition document, and never sent across the wire. In the example above, colors is a label. It appears in the bad example, and is absent from the good one.
If a computer wants to answer 3 questions yes, no, or unknown, followed by a size, here is the wrong way to do it:
<source lang="c">
Question 1: Yes
Question 2: Unknown
Question 3: No
Size: 123
</source>
Instead, use lines that have the answers but not the questions. A blank line will not change how the packet is parsed. If a size, name, or answer is unknown, a blank line is a good way to express this. It takes up just 1 byte on the wire.
<source lang="c">
y

n
123
</source>
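Reading the answers back might look like this sketch, which maps a blank line to `None`. It assumes this part of the packet holds only lines (three answers and a size), so splitting on the zero byte is safe; a packet with binary blocks would need a position-based reader instead.

```python
ANSWERS = {b"y": True, b"n": False, b"": None}  # blank line means unknown


def parse_answers(buf):
    """Parse three answer lines and one size line; return (answers, size)."""
    fields = buf.split(b"\x00")
    if len(fields) < 5:  # four terminated lines leave a trailing empty piece
        raise ValueError("expected four lines")
    answers = []
    for field in fields[:3]:
        if field not in ANSWERS:
            raise ValueError("answer must be y, n, or blank")
        answers.append(ANSWERS[field])
    size_text = fields[3]
    if size_text and not size_text.isdigit():
        raise ValueError("size line is not a number")
    size = int(size_text) if size_text else None  # a blank size is also unknown
    return answers, size


wire = b"y\x00\x00n\x00123\x00"
answers, size = parse_answers(wire)
```

The unknown second answer costs exactly one byte: the zero byte that ends the empty line.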
It's also possible to pack 8 yes, no, or unknown values into a 2-byte block. Each bit in the first byte is set when the corresponding question has an answer, and the same bit in the second byte holds that answer.
<source lang="c"> BB </source>
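The two-byte packing can be sketched as below. The bit order is an assumption made for the example (question 0 uses the least significant bit); a real packet definition document would have to fix it.

```python
def pack_answers(answers):
    """Pack 8 values (True, False, or None for unknown) into 2 bytes."""
    if len(answers) != 8:
        raise ValueError("exactly 8 answers are required")
    known = 0   # first byte: bit set where an answer exists
    values = 0  # second byte: bit set where the answer is yes
    for i, answer in enumerate(answers):
        if answer is not None:
            known |= 1 << i
            if answer:
                values |= 1 << i
    return bytes([known, values])


def unpack_answers(block):
    """Reverse pack_answers; return a list of 8 True/False/None values."""
    if len(block) != 2:
        raise ValueError("block must be exactly 2 bytes")
    known, values = block[0], block[1]
    return [
        bool((values >> i) & 1) if (known >> i) & 1 else None
        for i in range(8)
    ]


answers = [True, None, False, None, None, None, None, True]
block = pack_answers(answers)
```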
Send Numbers as Text. In the program, we have an int with the value 22. It takes up 4 bytes of space. We might be tempted to send it across the wire as an int:
<source lang="c"> BBBB </source>
But it would be better to convert it into ASCII and send it on a line:
<source lang="c"> 22\0 </source>
As data, 4 bytes are used. As text, only 3 are used. Also, as data, the number can't be bigger than an int. As text, it has no such limitation.
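The comparison can be checked directly. In this sketch, `encode_number` and `decode_number` are illustrative names, and the 4-byte form is produced with Python's `struct` module as a stand-in for a C int.

```python
import struct


def encode_number(n):
    """Write a non-negative integer as ASCII digits on a line."""
    if n < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    return str(n).encode("ascii") + b"\x00"


def decode_number(buf, pos=0):
    """Read a number line; return (value, next_pos)."""
    end = buf.index(b"\x00", pos)
    digits = buf[pos:end]
    if not digits.isdigit():
        raise ValueError("line is not a number")
    return int(digits), end + 1


as_int = struct.pack("!i", 22)   # 4 bytes, like a 32-bit int
as_line = encode_number(22)      # b"22" plus the zero byte
big = encode_number(2 ** 40)     # does not fit in a 32-bit int at all
```

For 22 the line is one byte shorter than the int. The saving shrinks as the number gains digits, but the line never runs into a fixed upper limit.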
Send a Character in a Line. It might be tempting to put a char into the datastream:
<source lang="c"> c </source>
Don't do this. The receiving computer will have to cast it to char to read it, and casting is bad. Also, it's not very extensible. We could specify t in its place, but not c2.
Making it a line uses just 1 more byte and avoids all of these problems:
<source lang="c"> c\0 </source>
No Enumeration Bytes. It might seem efficient to define byte 0x00 to mean something, 0x01 to mean something else, and 0x02 to be the third possibility in an enumeration data type. Then, we'd write the byte to the wire:
<source lang="c"> B </source>
Use a line instead. It is easier to read and parse, just one byte longer, and far more extensible.
<source lang="c"> a\0 </source>
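On the receiving side, an enumeration sent as a line can be looked up in a table. The codes and meanings below are invented for illustration; the point is that a later code such as `a2` can be added without disturbing the existing ones.

```python
# Hypothetical enumeration values, as a packet definition document might list them.
CHOICES = {
    b"a": "accept",
    b"r": "reject",
    b"a2": "accept-v2",  # added later; older codes keep their meaning
}


def read_choice(buf, pos=0):
    """Read one enumeration line; return (meaning, next_pos)."""
    end = buf.index(b"\x00", pos)
    code = buf[pos:end]
    if code not in CHOICES:
        raise ValueError("unknown enumeration value")
    return CHOICES[code], end + 1


meaning, next_pos = read_choice(b"a\x00")
```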
No Binary Types or Binary Structures. No Casting. Don't send the bytes of a C int, a C char, or a C struct across the wire. Read out all the contents, convert each to a line or a block of bytes, and send them. This way, the program on the other end will be able to use the standard parser to read each part. Sent as a binary type or structure, the program would have to cast the bytes against that type or structure and start dereferencing pointers into it. This breaks the margin rule above.
It also shows a preference for legacy platforms. Memory, pointers and structures might seem convenient in C and C++, but are not native to modern languages and platforms like C# and Java.
It also represents a security risk in unmanaged code. Malformed data could take advantage of a mistake in code like this and gain control of the remote computer.
No Text Encoding. In a line, the only character that cannot appear is the null terminator, \0. Nothing else will interfere with the way lines are parsed. This means a line can hold a lot, including:
- Anything that can be in a C-style null-terminated string
- A set of HTTP headers, with \r\n byte pairs sprinkled throughout
- An entire XML document
This makes encoding completely unnecessary. Don't convert spaces into %20. Don't worry about \r\n or other special characters.
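A short round trip shows why no escaping is needed. The helper names are made up, and the header and XML strings are arbitrary sample data.

```python
def write_lines(values):
    """Join values into lines; the zero byte is the only forbidden content."""
    out = bytearray()
    for value in values:
        if b"\x00" in value:
            raise ValueError("a line cannot contain a zero byte")
        out += value + b"\x00"
    return bytes(out)


def read_lines(buf, count, pos=0):
    """Read `count` lines starting at pos; return (lines, next_pos)."""
    lines = []
    for _ in range(count):
        end = buf.index(b"\x00", pos)
        lines.append(buf[pos:end])
        pos = end + 1
    return lines, pos


headers = b"Host: example.com\r\nAccept: */*\r\n\r\n"
xml = b'<?xml version="1.0"?><file name="a b.mp3" size="128"/>'
wire = write_lines([headers, xml])
lines, next_pos = read_lines(wire, 2)
```

Colons, commas, spaces, quotes, and `\r\n` pairs all pass through untouched, because the parser looks only for the zero byte.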
No Byte Encoding. When transmitting bytes, transmit them as bytes. Don't convert them into ASCII characters and send the text across the wire. The bytes are smaller, and converting them to and from text is extra work for the programmer and computer on both sides. Here is the wrong way to transmit a GUID:
<source lang="c"> guid: d51dff817f895598ff0065537c09d503\r\n </source>
Here is the right way:
<source lang="c"> BBBBBBBBBBBBBBBB </source>
If something can be expressed as bytes or text, express it as bytes. We might be tempted to express 3 IP addresses in a line like this:
<source lang="c"> More-Computers: 1.2.3.4:5, 6.7.8.9:10, 11.12.13.14:15\r\n </source>
But, doing it according to the rules of this design will make it smaller on the wire and much simpler for code to parse:
<source lang="c">
3
BBBBBB
BBBBBB
BBBBBB
</source>
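The size difference is easy to measure. This sketch builds both forms of the GUID and of the address list; it assumes IPv4 addresses and big-endian ports for the 6-byte records, and `encode_peers` is an illustrative name.

```python
import ipaddress
import struct


def encode_peers(peers):
    """Write a count line, then 6 bytes per (host, port) pair."""
    out = bytearray(str(len(peers)).encode("ascii") + b"\x00")
    for host, port in peers:
        # 4 address bytes plus a 2-byte big-endian port (byte order assumed)
        out += ipaddress.IPv4Address(host).packed + struct.pack("!H", port)
    return bytes(out)


guid_text = b"guid: d51dff817f895598ff0065537c09d503\r\n"
guid_block = bytes.fromhex("d51dff817f895598ff0065537c09d503")

peers = [("1.2.3.4", 5), ("6.7.8.9", 10), ("11.12.13.14", 15)]
peers_text = b"More-Computers: 1.2.3.4:5, 6.7.8.9:10, 11.12.13.14:15\r\n"
peers_block = encode_peers(peers)

sizes = {
    "guid": (len(guid_text), len(guid_block)),
    "peers": (len(peers_text), len(peers_block)),
}
```

The GUID block is 16 bytes and the address list is 20 bytes including its count line, both well under their text forms. This particular GUID also contains a zero byte, which a block carries without any special handling.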
No Additional Compression. From the handshake forward, the entire data stream is probably zlib-compressed. Therefore, it is needless extra work to compress parts of packets. A line in a packet might be a huge string of XML metadata. Don't compress it into a binary block. The zlib compression on the wire will take care of it.
No Big Lengths. When a variable number of lines or bytes follow, a line tells how many are next. These lengths must always describe what is immediately next, and never express a total length of something bigger. A packet designer might be tempted to use the second line to express the entire packet length in bytes:
<source lang="c">
p
563
a line of information
BBBBBB
some more info
3
red
green
blue
</source>
A program does not need this packet length total to read the packet. It can follow along from the packet definition document, parsing it line by line. So, the total length is unnecessary.
The problem with the length arises when it is different from the lines. The length is either a mistake, or trying to hint that there is some extended information beyond the packet that the program might know how to read. Code starts totalling the known packet size and subtracting it from the given size. It makes a big mess of moving pointers forwards and backwards to scan the packet from the outside edges towards the center. This cannot be allowed.
No extended information is allowed outside the packet definition, and by not allowing total packet sizes, it's not even possible to hide some. To extend a packet, name a new one and keep data in separate lines on the margin.