Project

General

Profile

Algorithms » History » Version 24

Greg Burri, 08/27/2009 04:15 PM

1 1 Greg Burri
h1. Algorithms
2
3 20 Greg Burri
h2. Word indexing
4 1 Greg Burri
5 24 Greg Burri
Each shared file and directory are indexing by its name which are splitting in words. A search will be based on this index.
6
7
See the associated prototype here for some performance measures : [[Word index prototype]].
8
9
h3. Word splitting
10
11 20 Greg Burri
Theses steps are valid independently of the language.
12
13
# Each file is split in word, there is a set of characters which will use as delimiter between the word like [' ', '#', '.', '?',..].
14 1 Greg Burri
# Some character are replaced by a another, for example : ['é', 'ë',..] => 'e'. The goal is to remove all accent.
15 24 Greg Burri
# All characters are converted to lower case.
16 1 Greg Burri
17 24 Greg Burri
This process is also valid for the input when searching.
18 20 Greg Burri
19 24 Greg Burri
h3. Structure 
20 1 Greg Burri
21 24 Greg Burri
A simple tree structure is used like this :
22
23
<pre>
24
struct Node {
25
   QChar letter;
26
   QList<Node*> children;
27
   QList<T> itemList;
28
};
29
</pre>
30
31
32
h3. Searching
33
34 3 Greg Burri
For a functional description see here : [[Functional definition#The-search-window]]
35 15 Greg Burri
36 1 Greg Burri
The match of a word can be partial from the beginning, for example _train_ will match _training_.
37
38 24 Greg Burri
# The words are sent to each peers (see the message _Find_ here : [[Protocol core-core]])
39
# Each peer will do a search for each word into its index
40
# The results for each search will be merged according the schema below
41
42 1 Greg Burri
!aybabtu_search.png!
43
_This schema depicts how the results are sorted from one peer. Each peer result are then merged._
44
45 24 Greg Burri
46
47 5 Greg Burri
h2. Peer ID
48
49 7 Greg Burri
Each peer owns a peer id which is unique and generated during the first start. This ID is used to identify a peer, it's better than the previous usage of peer IP, considering this situation :
50 5 Greg Burri
51
* _A_ put in queue a file entry _f_ from _B_, _B_ doesn't know the hashes of this file entry.
52 1 Greg Burri
* _B_ change his IP address.
53
* _A_ want to download _f_, it can ask _B_ for the hashes even _B_'s IP changed.
54 24 Greg Burri
55
56 5 Greg Burri
57 8 Greg Burri
h2. Core threads
58
59 9 Greg Burri
There are three kind of threads in the core in addition to the main thread :
60
* [[Protocol_core-core#Downloading-thread|Downloading thread]] : @DownloadManager::ChunkDownloader@
61 8 Greg Burri
* Uploading thread : @UploadManager::Uploader@
62 11 Greg Burri
* [[Algorithms#Updating-the-file-cache|Updating file cache thread]] : @FileManager::FileUpdater@
63 5 Greg Burri
64 22 Greg Burri
h2. Updating the file cache
65 10 Greg Burri
66
Here is the algorithm for the thread (@FileManager::FileUpdater@) which will periodically update the file cache and persist it.
67
68 23 Greg Burri
There is a prototype for the watcher here : [[Watcher Prototype]]
69
70 10 Greg Burri
<pre>
71 1 Greg Burri
D : The set of shared directories
72 22 Greg Burri
T : Time during which the hashes are computed (for example 30s)
73
F : A set containing file with unknown hashes, initially empty
74
W : the directory watcher
75 13 Greg Burri
76
// First synchronize (at start)
77
For each d in D (recursively) :
78 22 Greg Burri
   - Add d to W
79 14 Greg Burri
   - Synchronize physical folders and files with d content
80 22 Greg Burri
   - Add in F the files which don't have hashes
81 13 Greg Burri
82 10 Greg Burri
Loop :   
83 1 Greg Burri
   t : Time.now
84 13 Greg Burri
   For each f in F :
85 1 Greg Burri
      - Compute the unknown hash of files f
86 10 Greg Burri
      - Remove f from F
87
      If (Time.now - t) > T : break
88 22 Greg Burri
   - Wait for changes for a period of (if F is empty then INFINITE else 0)
89
      - When a modification occurs synchronize the file/folder
90 13 Greg Burri
      - Add each new file in F
91 12 Greg Burri
   - Persist the entire cache in a file (only every ~30min)
92 10 Greg Burri
</pre>
93
94 5 Greg Burri
95 4 Greg Burri
h2. Downloading
96
97
See here : [[Protocol_core-core#Downloading-threads]]