Developers.camper.HowFileFragmentsAreHandled

From Shareaza Wiki
Revision as of 20:07, 20 June 2009 by Kevogod (talk | contribs) (1 revision)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

File Fragment Handling

Shareaza uses the concept of fragments to express and use the fact that files that are transfered over a p2p network are not sent and received in one piece but in several independent parts. Therefore a file, while being transfered, will appear as a collection of fragments and only if all those fragments cover the entire file then this file is complete. Furthermore, different networks and hashing algorithms impose different constraints on what parts of a file can be transfered in one piece and what parts can be subject to piece verification.

An implementation shall be:

  • Efficient - File fragment handling is needed intensively in time-critical parts of the program. So high execution speed is paramount
  • Extensible - Future implementations may require additional information be stored in order to allow more sophisticated decissions to be made. For example, a download strategy using rarest-piece-first requires that information about piece distribution is accessible without effort
  • Flexible - A flexible design will allow the implementation to be used in different contexts without modification.
  • Portable - A future release of Shareaza might be available on different platforms using different compilers, therefore the implementation shall be 100% standard conformant, sections that cannot - or cannot yet - be made portable shall be separated from the main body in order to reduce the effort needed to port.

Here is a brief desciption of the current implementation. More information can be found in the asscociated header files.

All classes and functions are located in the namespace FF (short hand for FileFragments - this might have to be changed later), they are accessible after including a single header "FileFragments.hpp".

A file fragment is characterized by the following attributes:

  • begin, end - these denote the begin and end respectively of the range the file fragment represents, the underlying integer type to represent them is a parameter of the class template, which is useful with respect to flexibility. See below.
  • payload - This template parameter represents the additional information a fragment is required to carry. It is required to be of class type. If no additional information is to be stored, an empty class (a class which doesnt declare any data or methods) is used. File fragment inherits privately (instead of layering through membership) from that class in order to make use of the empty base class optimization. The payload class must fulfill the STL container requirements.

For all valid fragments end >= begin holds true, in addition end != begin is always true if the associated flag in the payload traits class is set accordingly. Trying to construct an invalid fragment causes a bad_fragment exception to be thrown (currently disabled in favour of creating an error msg in the debug log).

The header declares a specialisation of the file fragment template - SimpleFileFragment - which is the only kind of fragment currently in use (begin and end are unsigned 64bit integers, Payload is empty).

Currently, file fragment are used in two different contexts:

  • Message queues that are sent or received over the network. Currently, this is implemented only for SimpleFileFragment - using that queue is straightforward, fragments are added using pushBack and removed with popFront or erase (erase erases all fragments in the queue that are contained entirely in the parameter, all other fragments are shortened if they overlap with the parameter partially).
  • List-like structures that hold information about all fragments of a particular file without respect to its use. Unlike in the queue, all fragments in a list are ordered and disjoint. all operations on a list are guaranteed to maintain that invariant. Lists can be parameterized by the fragment they are to hold and the underlying container type to be used - in order to meet the complexity guarantees, usable containers are such that they are equivalent to standard sets with respect to speed. If not all the complexity guarantees have to be satisfied, other container types may be used (for example a vector can be used if a list is constant). With sets as the underlying container type, the list guarantees logarithmic execution for the following operations (all of which were done in linear time before):
    • insertion (constant time with insertion hint)
    • deletion (constant time with iterators)
    • find

In order to maintain the container invariants, upon insertion a list identifies all present fragments that can be merged with the new fragment and passes that range to a special function in Payload traits, since the resulting fragment(s) may depend on the payload (for empty payloads any merging results in a single fragment, that may not be true in other cases).

All operations on a list make the weak exception safety guarantee: in the presence of an exception the list will be in a valid if unpredictable state, in addition some operations are guaranteed to not throw (in particular, swap never throws).

Since a list is used to map on a particular file with known size, there is a limit up to which a list accepts fragments to be inserted or erased. this limit has to be supplied at construction time (or use swap later, this eliminates the need to decide what to do if the limit changes and fragments are present that do not satisfy that new limit; such a decision should be left to the user). When trying to insert or erase a fragment that does not satisfy that limit (end <= limit), an exception is thrown (currently disabled in favor of an error message in the debug log).

In order to use a particular file fragment with a list, the user must provide a specialisation of PayloadTraits which defines two methods needed to participate in list operation. One is used to determine whether two fragments, that define adjacent ranges (end1 == begin2), are able to be merged; the other produces a list of fragments that are the result of merging a fragment list with a particular fragment - see the implementation for empty fragments for details.

List provides most guarantees of a standard container (the greatest difference is the weaker exception safety guarantee). It can be used with standard non-mutation algorithms. Therefore its interface has been designed to be close to that of a standard container - unfortunately that leads to duplicates, so that interface may have to be dropped in the future.