Algorithms » History » Revision 23

« Previous | Revision 23/35 (diff) | Next »
Greg Burri, 08/11/2009 11:46 PM


Word indexing

Theses steps are valid independently of the language.

  1. Each file is split in word, there is a set of characters which will use as delimiter between the word like [' ', '#', '.', '?',..].
  2. Some character are replaced by a another, for example : ['é', 'ë',..] => 'e'. The goal is to remove all accent.


For a functional description see here : Functional definition

The match of a word can be partial from the beginning, for example train will match training.

This schema depicts how the results are sorted from one peer. Each peer result are then merged.

Peer ID

Each peer owns a peer id which is unique and generated during the first start. This ID is used to identify a peer, it's better than the previous usage of peer IP, considering this situation :

  • A put in queue a file entry f from B, B doesn't know the hashes of this file entry.
  • B change his IP address.
  • A want to download f, it can ask B for the hashes even B's IP changed.

Core threads

There are three kind of threads in the core in addition to the main thread :

Updating the file cache

Here is the algorithm for the thread (FileManager::FileUpdater) which will periodically update the file cache and persist it.

There is a prototype for the watcher here : Watcher Prototype

D : The set of shared directories
T : Time during which the hashes are computed (for example 30s)
F : A set containing file with unknown hashes, initially empty
W : the directory watcher

// First synchronize (at start)
For each d in D (recursively) :
   - Add d to W
   - Synchronize physical folders and files with d content
   - Add in F the files which don't have hashes

Loop :   
   t :
   For each f in F :
      - Compute the unknown hash of files f
      - Remove f from F
      If ( - t) > T : break
   - Wait for changes for a period of (if F is empty then INFINITE else 0)
      - When a modification occurs synchronize the file/folder
      - Add each new file in F
   - Persist the entire cache in a file (only every ~30min)


See here : Protocol_core-core

Updated by Greg Burri almost 13 years ago · 23 revisions