Revision 4 (Greg Burri, 09/10/2009 10:54 AM) → Revision 5/6 (Greg Burri, 09/10/2009 10:56 AM)

h1. Word index prototype 

 See [[Algorithms#Word-indexing]] for a description of the algorithm. 

 h2. Measure 

 The name of each file and folder is split into words, and the item is indexed under each of those words. 
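
 The idea can be illustrated with a minimal sketch. This is not the prototype's code: the splitting rule (lowercase, split on non-alphanumeric characters), the exact-word lookup, and the names @split_words@ and @WordIndex@ are all assumptions made for illustration. The structure described in [[Algorithms#Word-indexing]] may differ.

```python
import re
from collections import defaultdict


def split_words(name):
    """Lowercase a name and split it on runs of non-alphanumeric characters.

    Assumed splitting rule; the prototype may split differently.
    """
    return [w for w in re.split(r"[\W_]+", name.lower()) if w]


class WordIndex:
    """Hypothetical index mapping each word to the set of items containing it."""

    def __init__(self):
        self._items_by_word = defaultdict(set)

    def add(self, path):
        # Only the last path component (the file or folder name) is indexed.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        for word in split_words(name):
            self._items_by_word[word].add(path)

    def search(self, query):
        """Return the items whose name contains every word of the query."""
        words = split_words(query)
        if not words:
            return set()
        # Copy the first set so the intersection does not mutate the index.
        result = set(self._items_by_word.get(words[0], ()))
        for word in words[1:]:
            result &= self._items_by_word.get(word, set())
        return result


index = WordIndex()
index.add("/music/Pink Floyd - Time.mp3")
index.add("/music/Pink Floyd")
index.add("/docs/time_sheet.txt")
print(index.search("pink floyd"))
```

 A lookup costs one dictionary access per query word plus a set intersection, which fits with search times that stay small as the number of items grows. The price is that every item is referenced once per word it contains.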

 h3. Case 1 

 * 8'531 files/folders (some mp3) 
 * Indexing time: ~1 s (~8'000 items/s) 
 * Average size per indexed item: 464 bytes 
 * Total size in memory: ~3.7 MB 
 * Search time: < 1 ms 

 h3. Case 2 

 * 309'269 files/folders (various files) 
 * Indexing time: ~30 s (~10'000 items/s) 
 * Average size per indexed item: 140 bytes (the filenames in case 1 are probably longer and thus contain more words) 
 * Total size in memory: ~42 MB 
 * Search time: < 1 ms 
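
 As a consistency check, the totals in both cases are roughly the item count multiplied by the average size per item. The short calculation below assumes that "MB" means 2^20 bytes, which the measurements do not state; the function names are hypothetical.

```python
def total_size_mib(item_count, bytes_per_item):
    """Estimated index size in MiB (2**20 bytes), assuming a uniform average size."""
    return item_count * bytes_per_item / (1024 * 1024)


def items_per_second(item_count, seconds):
    """Average indexing throughput."""
    return item_count / seconds


case_1_size = total_size_mib(8531, 464)
case_2_size = total_size_mib(309269, 140)
case_2_rate = items_per_second(309269, 30)

print(round(case_1_size, 1), round(case_2_size, 1), round(case_2_rate))
```

 The results, about 3.8 and 41.3, are close to the measured ~3.7 MB and ~42 MB, and the case 2 throughput comes out slightly above 10'000 items/s.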


 h2. Conclusion 

 This algorithm is very fast for both indexing and searching, but it uses a lot of memory (140 to 464 bytes per indexed item in the cases above). For the moment it will be used unchanged; some space optimization may be done in the future.