Algorithms » History » Revision 26
Revision 25 (Greg Burri, 08/27/2009 04:26 PM) → Revision 26/35 (Greg Burri, 08/27/2009 04:27 PM)
h1. Algorithms
h2. Word indexing
Each shared file and directory are indexing by its name which are splitting in words. A search will be based on this index.
See the associated prototype here for some performance measures : [[Word index prototype]].
h3. Word splitting
Theses steps are valid independently of the language.
# Each file is split in word, there is a set of characters which will use as delimiter between the word like [' ', '#', '.', '?',..].
# Some character are replaced by a another, for example : ['é', 'ë',..] => 'e'. The goal is to remove all accent.
# All characters are converted to lower case.
This process is also valid for the user input when searching.
h3. Structure
A simple tree structure is used like this :
<pre>
struct Node {
QChar letter;
QList<Node*> children;
QList<T> itemList;
};
</pre>
h3. Searching into the tree
The match of a word can be partial from the beginning, for example _train_ will match _training_.
# Match each character of the searched word against the letter of the children of the root node recursively
## If the match failed the result is empty
## Else The result is all items into the sub tree
In the worst case the number of iteration is _N*36_ for a latin alphabet (a-z, 0-9). Where _N_ is the number of character of the searched word.
h3. Searching among other peers
For a functional description see here : [[Functional definition#The-search-window]]
# The words are sent to each peers (see the message _Find_ here : [[Protocol core-core]])
# Each peer will do a search for each word into its index
# The results for each search will be merged according the schema below
!aybabtu_search.png!
_This schema depicts how the results are sorted from one peer. Each peer result are then merged._
h2. Peer ID
Each peer owns a peer id which is unique and generated during the first start. This ID is used to identify a peer, it's better than the previous usage of peer IP, considering this situation :
* _A_ put in queue a file entry _f_ from _B_, _B_ doesn't know the hashes of this file entry.
* _B_ change his IP address.
* _A_ want to download _f_, it can ask _B_ for the hashes even _B_'s IP changed.
h2. Core threads
There are three kind of threads in the core in addition to the main thread :
* [[Protocol_core-core#Downloading-thread|Downloading thread]] : @DownloadManager::ChunkDownloader@
* Uploading thread : @UploadManager::Uploader@
* [[Algorithms#Updating-the-file-cache|Updating file cache thread]] : @FileManager::FileUpdater@
h2. Updating the file cache
Here is the algorithm for the thread (@FileManager::FileUpdater@) which will periodically update the file cache and persist it.
There is a prototype for the watcher here : [[Watcher Prototype]]
<pre>
D : The set of shared directories
T : Time during which the hashes are computed (for example 30s)
F : A set containing file with unknown hashes, initially empty
W : the directory watcher
// First synchronize (at start)
For each d in D (recursively) :
- Add d to W
- Synchronize physical folders and files with d content
- Add in F the files which don't have hashes
Loop :
t : Time.now
For each f in F :
- Compute the unknown hash of files f
- Remove f from F
If (Time.now - t) > T : break
- Wait for changes for a period of (if F is empty then INFINITE else 0)
- When a modification occurs synchronize the file/folder
- Add each new file in F
- Persist the entire cache in a file (only every ~30min)
</pre>
h2. Downloading
See here : [[Protocol_core-core#Downloading-threads]]