Project

General

Profile

Algorithms » History » Version 28

Greg Burri, 08/27/2009 04:32 PM

1 1 Greg Burri
h1. Algorithms
2
3 20 Greg Burri
h2. Word indexing
4 1 Greg Burri
5 24 Greg Burri
Each shared file and directory are indexing by its name which are splitting in words. A search will be based on this index.
6
7
See the associated prototype here for some performance measures : [[Word index prototype]].
8
9
h3. Word splitting
10
11 20 Greg Burri
Theses steps are valid independently of the language.
12
13
# Each file is split in word, there is a set of characters which will use as delimiter between the word like [' ', '#', '.', '?',..].
14 1 Greg Burri
# Some character are replaced by a another, for example : ['é', 'ë',..] => 'e'. The goal is to remove all accent.
15 24 Greg Burri
# All characters are converted to lower case.
16 1 Greg Burri
17 26 Greg Burri
This process is also valid for the user input when searching.
18 20 Greg Burri
19 24 Greg Burri
h3. Structure 
20 1 Greg Burri
21 24 Greg Burri
A simple tree structure is used like this :
22
23
<pre>
24
struct Node {
25
   QChar letter;
26
   QList<Node*> children;
27
   QList<T> itemList;
28
};
29
</pre>
30
31 25 Greg Burri
h3. Searching into the tree
32 1 Greg Burri
33 25 Greg Burri
The match of a word can be partial from the beginning, for example _train_ will match _training_.
34 1 Greg Burri
35 25 Greg Burri
# Match each character of the searched word against the letter of the children of the root node recursively
36
## If the match failed the result is empty
37 27 Greg Burri
## Else, the result contains all items from the sub tree
38 1 Greg Burri
39 25 Greg Burri
In the worst case the number of iteration is _N*36_ for a latin alphabet (a-z, 0-9). Where _N_ is the number of character of the searched word.
40
41
42
h3. Searching among other peers
43
44
For a functional description see here : [[Functional definition#The-search-window]]
45 1 Greg Burri
46 24 Greg Burri
# The words are sent to each peers (see the message _Find_ here : [[Protocol core-core]])
47
# Each peer will do a search for each word into its index
48
# The results for each search will be merged according the schema below
49
50 1 Greg Burri
!aybabtu_search.png!
51
_This schema depicts how the results are sorted from one peer. Each peer result are then merged._
52
53 24 Greg Burri
54
55 5 Greg Burri
h2. Peer ID
56
57 7 Greg Burri
Each peer owns a peer id which is unique and generated during the first start. This ID is used to identify a peer, it's better than the previous usage of peer IP, considering this situation :
58 5 Greg Burri
59
* _A_ put in queue a file entry _f_ from _B_, _B_ doesn't know the hashes of this file entry.
60 1 Greg Burri
* _B_ change his IP address.
61
* _A_ want to download _f_, it can ask _B_ for the hashes even _B_'s IP changed.
62 24 Greg Burri
63
64 5 Greg Burri
65 8 Greg Burri
h2. Core threads
66
67 9 Greg Burri
There are three kind of threads in the core in addition to the main thread :
68
* [[Protocol_core-core#Downloading-thread|Downloading thread]] : @DownloadManager::ChunkDownloader@
69 8 Greg Burri
* Uploading thread : @UploadManager::Uploader@
70 11 Greg Burri
* [[Algorithms#Updating-the-file-cache|Updating file cache thread]] : @FileManager::FileUpdater@
71 5 Greg Burri
72 22 Greg Burri
h2. Updating the file cache
73 10 Greg Burri
74
Here is the algorithm for the thread (@FileManager::FileUpdater@) which will periodically update the file cache and persist it.
75
76 28 Greg Burri
There is a prototype for the watcher here : [[Watcher prototype]]
77 23 Greg Burri
78 10 Greg Burri
<pre>
79 1 Greg Burri
D : The set of shared directories
80 22 Greg Burri
T : Time during which the hashes are computed (for example 30s)
81
F : A set containing file with unknown hashes, initially empty
82
W : the directory watcher
83 13 Greg Burri
84
// First synchronize (at start)
85
For each d in D (recursively) :
86 22 Greg Burri
   - Add d to W
87 14 Greg Burri
   - Synchronize physical folders and files with d content
88 22 Greg Burri
   - Add in F the files which don't have hashes
89 13 Greg Burri
90 10 Greg Burri
Loop :   
91 1 Greg Burri
   t : Time.now
92 13 Greg Burri
   For each f in F :
93 1 Greg Burri
      - Compute the unknown hash of files f
94 10 Greg Burri
      - Remove f from F
95
      If (Time.now - t) > T : break
96 22 Greg Burri
   - Wait for changes for a period of (if F is empty then INFINITE else 0)
97
      - When a modification occurs synchronize the file/folder
98 13 Greg Burri
      - Add each new file in F
99 12 Greg Burri
   - Persist the entire cache in a file (only every ~30min)
100 10 Greg Burri
</pre>
101
102 5 Greg Burri
103 4 Greg Burri
h2. Downloading
104
105
See here : [[Protocol_core-core#Downloading-threads]]