Algorithms » History » Revision 33

Revision 32 (Greg Burri, 09/30/2009 08:43 AM) → Revision 33/35 (Greg Burri, 03/04/2011 02:52 PM)

h1. Algorithms 

 h2. Word indexing 

 Each shared file and directory is indexed are indexing by its name which are splitting in words. A search will be based on this index. 

 See the associated prototype here for some performance measures : [[Word index prototype]]. 

 h3. Word splitting 

 Theses steps are valid independently of the language. 

 # Some character are replaced by a another, for example : ['é', 'ë',..] => 'e'. The goal is to remove all accent. 
 # All characters are converted to lower case. 
 # Each file is split in word, there is a set of characters which will use as delimiter between the word like [' ', '#', '.', '?',..]. 

 This process is also valid for the user input when searching. 

 h3. Structure  

 A simple tree structure is used like this : 

 struct Node { 
    QChar letter; 
    QList<Node*> children; 
    QList<T> itemList; 

 h3. Searching into the tree 

 The match of a word can be partial from the beginning, for example _train_ will match _training_. 

 # Match each character of the searched word against the letter of the children of the root node recursively 
 ## If the match failed the result is empty 
 ## Else, the result contains all items from the sub tree 

 In the worst case the number of iteration is _N*36_ for a latin alphabet (a-z, 0-9). Where _N_ is the number of character of the searched word. 

 h3. Searching among other peers 

 For a functional description see here : [[Functional definition#The-search-window]] 

 # The words are sent to each peers (see the message _Find_ here : [[Protocol core-core]]) 
 # Each peer will do a search for each word into its index 
 # The results for each search will be merged according the schema below 

 _This schema depicts how the results are sorted from one peer. Each peer result are then merged._ 

 h2. Peer ID 

 Each peer owns a peer id which is unique and generated during the first start. This ID is used to identify a peer, it's better than the previous usage of peer IP, considering this situation : 

 * _A_ put in queue a file entry _f_ from _B_, _B_ doesn't know the hashes of this file entry. 
 * _B_ change his IP address. 
 * _A_ want to download _f_, it can ask _B_ for the hashes even _B_'s IP changed. 

 h2. Core threads 

 There are three kind of threads in the core in addition to the main thread : 
 * [[Protocol_core-core#Downloading-thread|Downloading thread]] : @DownloadManager::ChunkDownloader@ 
 * Uploading thread : @UploadManager::Uploader@ 
 * [[Algorithms#Updating-the-file-cache|Updating file cache thread]] : @FileManager::FileUpdater@ 

 These threads exist only for treating IO in a parallel manner. The goal is not the computation nor robustness. 

 h2. Updating the file cache 

 Here is the algorithm for the thread (@FileManager::FileUpdater@) which will periodically update the file cache and persist it. 

 There is a prototype for the watcher here : [[Watcher prototype]] 

 D : The set of shared directories 
 T : Time during which the hashes are computed (for example 30s) 
 F : A set containing file with unknown hashes, initially empty 
 W : the directory watcher 

 // First synchronize (at start) 
 For each d in D (recursively) : 
    - Add d to W 
    - Synchronize physical folders and files with d content 
    - Add in F the files which don't have hashes 

 Loop :    
    t : 
    For each f in F : 
       - Compute the unknown hash of files f 
       - Remove f from F 
       If ( - t) > T : break 
    - Wait for changes for a period of (if F is empty then INFINITE else 0) 
       - When a modification occurs synchronize the file/folder 
       - Add each new file in F 
    - Persist the entire cache in a file (only every ~30min) 

 h2. Downloading 

 See here : [[Protocol_core-core#Downloading-threads]]