Word index prototype » History » Version 6

Greg Burri, 09/10/2009 11:00 AM

h1. Word index prototype

See [[Algorithms#Word-indexing]].

h2. Measure

Each file and folder name is split into words and indexed by these words.

The stored item is an integer that simulates a pointer to a more complex structure, so only the memory of the index itself (including the item pointer) is counted. In real usage, the item would contain the path to the file or folder, its size, and various other information.
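
To make the setup concrete, here is a minimal sketch of indexing names by their words with integer items. It is not the prototype's code: the splitting rule (lowercase, split on anything that is not a letter or digit), the names @split_words@ and @WordIndex@, and the plain dictionary used as the index are all assumptions made for illustration. The structure actually measured is the one described on the linked Algorithms page.

```python
import re
from collections import defaultdict


def split_words(name):
    """Split a file or folder name into lowercase words (assumed rule)."""
    return [w for w in re.split(r"[^0-9a-z]+", name.lower()) if w]


class WordIndex:
    """Maps each word to the set of integer items whose name contains it."""

    def __init__(self):
        self._items_by_word = defaultdict(set)

    def add(self, name, item):
        # One entry per word, so a name with more words costs more memory.
        for word in split_words(name):
            self._items_by_word[word].add(item)

    def search(self, word):
        # .get avoids creating an empty entry for unknown words.
        return self._items_by_word.get(word.lower(), set())


index = WordIndex()
names = ["01 - Intro.mp3", "Holiday photos", "intro_notes.txt"]
for item, name in enumerate(names):
    # The integer stands in for a pointer to the real file/folder entry.
    index.add(name, item)
```

With these sample names, @index.search("intro")@ returns the items 0 and 2. Because every word of a name gets its own entry, names with more words take more index memory per item.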

h3. Case 1

* 8'531 files/folders (including some MP3 files)
* Time to index: ~1 s (8'000 items/s)
* Average size per indexed item: 464 bytes
* Total size in memory: ~3.7 MB
* Search time: < 1 ms

h3. Case 2

* 309'269 files/folders (various files)
* Time to index: ~30 s (10'000 items/s)
* Average size per indexed item: 140 bytes (the file names in case 1 are probably longer and thus contain more words)
* Total size in memory: ~42 MB
* Search time: < 1 ms
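
The totals and rates in both cases can be cross-checked from the item counts, the average sizes and the indexing times. The sketch below assumes that "MB" means 2^20 bytes, which the page does not state. The function names are made up for this example.

```python
MIB = 1024 * 1024  # assumption: "MB" on this page means 2**20 bytes


def total_mib(items, bytes_per_item):
    """Total index size implied by an item count and an average item size."""
    return items * bytes_per_item / MIB


def items_per_second(items, seconds):
    """Indexing throughput implied by an item count and an indexing time."""
    return items / seconds


case1_size = total_mib(8531, 464)         # compare with the reported ~3.7 MB
case2_size = total_mib(309269, 140)       # compare with the reported ~42 MB
case1_rate = items_per_second(8531, 1)    # compare with the reported 8'000 items/s
case2_rate = items_per_second(309269, 30) # compare with the reported 10'000 items/s
```

Both products land close to the reported totals, so the figures are consistent with each other up to rounding.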

h2. Conclusion

This algorithm is very time-efficient for both indexing and searching, but it uses a lot of memory. For now it will be used unchanged, but some space optimizations may be made in the future.