Project

General

Profile

Algorithms » History » Version 34

Greg Burri, 03/04/2011 02:52 PM

1 1 Greg Burri
h1. Algorithms
2
3 20 Greg Burri
h2. Word indexing
4 1 Greg Burri
5 34 Greg Burri
Each shared file and directory is indexed by its name which is split in words. A search will be based on this index.
6 24 Greg Burri
7
See the associated prototype here for some performance measures : [[Word index prototype]].
8
9
h3. Word splitting
10
11 20 Greg Burri
Theses steps are valid independently of the language.
12
13 1 Greg Burri
# Some character are replaced by a another, for example : ['é', 'ë',..] => 'e'. The goal is to remove all accent.
14
# All characters are converted to lower case.
15 29 Greg Burri
# Each file is split in word, there is a set of characters which will use as delimiter between the word like [' ', '#', '.', '?',..].
16 1 Greg Burri
17 26 Greg Burri
This process is also valid for the user input when searching.
18 20 Greg Burri
19 24 Greg Burri
h3. Structure 
20 1 Greg Burri
21 24 Greg Burri
A simple tree structure is used like this :
22
23
<pre>
24
struct Node {
25
   QChar letter;
26
   QList<Node*> children;
27
   QList<T> itemList;
28
};
29
</pre>
30
31 25 Greg Burri
h3. Searching into the tree
32 1 Greg Burri
33 25 Greg Burri
The match of a word can be partial from the beginning, for example _train_ will match _training_.
34 1 Greg Burri
35 25 Greg Burri
# Match each character of the searched word against the letter of the children of the root node recursively
36
## If the match failed the result is empty
37 27 Greg Burri
## Else, the result contains all items from the sub tree
38 1 Greg Burri
39 25 Greg Burri
In the worst case the number of iteration is _N*36_ for a latin alphabet (a-z, 0-9). Where _N_ is the number of character of the searched word.
40
41
42
h3. Searching among other peers
43
44
For a functional description see here : [[Functional definition#The-search-window]]
45 1 Greg Burri
46 24 Greg Burri
# The words are sent to each peers (see the message _Find_ here : [[Protocol core-core]])
47
# Each peer will do a search for each word into its index
48
# The results for each search will be merged according the schema below
49
50 1 Greg Burri
!aybabtu_search.png!
51
_This schema depicts how the results are sorted from one peer. Each peer result are then merged._
52
53 24 Greg Burri
54
55 5 Greg Burri
h2. Peer ID
56
57 7 Greg Burri
Each peer owns a peer id which is unique and generated during the first start. This ID is used to identify a peer, it's better than the previous usage of peer IP, considering this situation :
58 5 Greg Burri
59
* _A_ put in queue a file entry _f_ from _B_, _B_ doesn't know the hashes of this file entry.
60 1 Greg Burri
* _B_ change his IP address.
61
* _A_ want to download _f_, it can ask _B_ for the hashes even _B_'s IP changed.
62 24 Greg Burri
63
64 5 Greg Burri
65 8 Greg Burri
h2. Core threads
66
67 9 Greg Burri
There are three kind of threads in the core in addition to the main thread :
68
* [[Protocol_core-core#Downloading-thread|Downloading thread]] : @DownloadManager::ChunkDownloader@
69 8 Greg Burri
* Uploading thread : @UploadManager::Uploader@
70 11 Greg Burri
* [[Algorithms#Updating-the-file-cache|Updating file cache thread]] : @FileManager::FileUpdater@
71 5 Greg Burri
72 32 Greg Burri
These threads exist only for treating IO in a parallel manner. The goal is not the computation nor robustness.
73 30 Greg Burri
74
75
76 22 Greg Burri
h2. Updating the file cache
77 10 Greg Burri
78
Here is the algorithm for the thread (@FileManager::FileUpdater@) which will periodically update the file cache and persist it.
79
80 28 Greg Burri
There is a prototype for the watcher here : [[Watcher prototype]]
81 23 Greg Burri
82 10 Greg Burri
<pre>
83 1 Greg Burri
D : The set of shared directories
84 22 Greg Burri
T : Time during which the hashes are computed (for example 30s)
85
F : A set containing file with unknown hashes, initially empty
86
W : the directory watcher
87 13 Greg Burri
88
// First synchronize (at start)
89
For each d in D (recursively) :
90 22 Greg Burri
   - Add d to W
91 14 Greg Burri
   - Synchronize physical folders and files with d content
92 22 Greg Burri
   - Add in F the files which don't have hashes
93 13 Greg Burri
94 10 Greg Burri
Loop :   
95 1 Greg Burri
   t : Time.now
96 13 Greg Burri
   For each f in F :
97 1 Greg Burri
      - Compute the unknown hash of files f
98 10 Greg Burri
      - Remove f from F
99
      If (Time.now - t) > T : break
100 22 Greg Burri
   - Wait for changes for a period of (if F is empty then INFINITE else 0)
101
      - When a modification occurs synchronize the file/folder
102 13 Greg Burri
      - Add each new file in F
103 12 Greg Burri
   - Persist the entire cache in a file (only every ~30min)
104 10 Greg Burri
</pre>
105
106 5 Greg Burri
107 4 Greg Burri
h2. Downloading
108
109
See here : [[Protocol_core-core#Downloading-threads]]