Bloom filter
2008 September 27
This is my implementation of the Bloom filter in Scheme. This code will work with MzScheme and Spark.
Any data that has a string representation can be used as keys. Another salient feature is the ease of defining hash functions. The user can represent hash functions as pairs of integers. An internal factory procedure will convert these into actual hash functions. The user is free to provide as many hash functions as he likes. Given below is the full source of the library:
;; Bloom filter structure and public procedures. (define-struct bloom-filter-s (bitmap hfuncs)) (define (make-bloom-filter hash-functions) (make-bloom-filter-s (make-hash-table) hash-functions)) (define (bloom-filter-add! self key) (let ((hashes (make-hashes key (bloom-filter-s-hfuncs self))) (bitmap (bloom-filter-s-bitmap self))) (let loop () (if (not (null? hashes)) (begin (hash-table-put! bitmap (car hashes) 1) (set! hashes (cdr hashes)) (loop)))))) (define (bloom-filter-contains? self key) (let ((hashes (make-hashes key (bloom-filter-s-hfuncs self))) (bitmap (bloom-filter-s-bitmap self)) (contains #f)) (let loop () (if (and (not (null? hashes)) (not contains)) (begin (if (= (hash-table-get bitmap (car hashes) 0) 1) (set! contains #t)) (set! hashes (cdr hashes)) (loop)))) contains)) (define (make-hash-functions seeds-salts) (let ((funcs (list))) (let loop () (if (not (null? seeds-salts)) (begin (set! funcs (append funcs (list (make-hash-function (car (car seeds-salts)) (car (cdr (car seeds-salts))))))) (set! seeds-salts (cdr seeds-salts)) (loop)))) funcs)) ;; :~ ;; Helpers (define (make-hash-function seed salt) (let ((hash seed) (func null)) (set! func (lambda (s) (if (null? s) hash (begin (if (string? s) (begin (set! hash seed) (set! s (string->list s)))) (set! hash (+ (* salt (char->integer (car s))) (func (cdr s)))) hash)))) func)) (define (make-hashes key hash-funcs) (let ((hashes (list))) (let loop () (if (not (null? hash-funcs)) (begin (set! hashes (append hashes (list ((car hash-funcs) key)))) (set! hash-funcs (cdr hash-funcs)) (loop)))) hashes)) ;; :~
Here is a sample iteraction with a Bloom filter object:
;; To create a new bloom-filter, pass in a list of hash-functions. ;; Each hash-function is represented by a pair of it's ;; seed and salt values. (define bloom (make-bloom-filter (make-hash-functions (list '(7 31) '(3 41) '(1 51))))) ;; Our bloom filter works with strings. ;; Here, we map a few strings to the bloom-filter. (bloom-filter-add! bloom "Alonzo Church") (bloom-filter-add! bloom "Fred Brooks") (bloom-filter-add! bloom "Alan Kay") (bloom-filter-add! bloom "Dennis Ritchie") (bloom-filter-add! bloom "Donald Knuth") ;; We search if a string is a member of the current set. (printf "~a~n" (bloom-filter-contains? bloom "Fred Brooks")) ;; #t (printf "~a~n" (bloom-filter-contains? bloom "nemo")) ;; #f (printf "~a~n" (bloom-filter-contains? bloom "Alan Kay")) ;; #t (printf "~a~n" (bloom-filter-contains? bloom "alan kay")) ;; #f