Here is an implementation in Racket of the Pearson hash exercise from programming praxis. It's based on a paper by Peter K. Pearson from 1990 [1].
To compute the Pearson hash of a message, each byte of the message is subsequently added to the accumulated hash modulo 256 and looked up in a substitution table.
The initial hash value is set to zero.
The substitution table is a random permutation of 8-bit numbers.
It is also possible to use XOR
instead of addition modulo 256.
Here is a procedure for generating random substitution tables using Racket's shuffle
:
(define (generate-substitution-table) (define original-table (range 256)) (list->vector (shuffle original-table))) (define (print-table tbl) (define row 0) (for ([val tbl] [i (vector-length tbl)]) (printf "~a " (~a val #:width 3 #:align 'right)) (when (zero? (modulo (+ i 1) 16)) (set! row (+ row 1)) (printf "\n")))) (random-seed 0) (define substitution-table (generate-substitution-table)) (print-table substitution-table)
80 245 227 73 137 130 182 255 122 18 200 204 5 150 133 125 14 123 30 209 16 93 139 110 244 111 213 102 21 135 26 8 214 91 13 87 36 235 226 67 246 145 48 188 208 219 66 180 25 131 233 42 193 90 172 239 29 52 101 184 170 55 241 7 134 194 77 196 43 177 232 4 234 20 140 112 2 173 12 37 147 81 142 178 89 53 32 46 38 141 109 169 84 41 35 86 156 126 39 155 70 253 222 63 237 47 3 162 97 199 212 151 197 19 132 176 252 157 242 113 94 247 159 190 189 107 83 58 114 161 223 92 40 211 74 215 0 71 236 119 99 54 85 202 191 124 1 56 149 165 186 105 34 243 121 206 33 250 217 49 103 220 183 27 118 195 72 62 22 9 166 128 205 181 117 45 216 6 192 76 231 136 79 31 164 221 163 129 185 228 98 68 11 144 50 249 59 160 61 60 116 17 23 168 153 57 143 95 88 198 24 51 146 64 138 65 225 69 152 108 218 78 154 201 187 75 240 148 82 203 96 207 171 179 254 127 248 174 115 44 106 210 251 15 100 229 238 120 167 175 224 10 104 230 28 158
This algorithm will compute a 8-bit hash for any message. To generate 16-bit hashes, the same message is hashed again, but this time, 1 is added to the first byte modulo 256. The second hash byte is appended to the first one. This can be extended to compute hashes longer than one byte. Though, after 256 bytes, the hash bytes will start repeating.
(define (pearson-hash-n-bytes message tbl fn n) (define result (make-vector n)) (for ([i n]) (vector-set! result i (for/fold ([sum i]) ([byte message]) (vector-ref tbl (fn sum byte))))) result) (define (+mod256 a b) (modulo (+ a b) 256))
Let's compute the hash for some messages.
(define message #"Programming Praxis") (pearson-hash-n-bytes message substitution-table bitwise-xor 1)
#(11)
And here is computation of Pearson hashes side by side using two procedures: addition modulo 256 and XOR
.
(for ([n-bytes (range 1 11)]) (printf "~a: ~a ~a~n" (~a #:min-width 2 #:align 'right n-bytes) (~a #:min-width 40 (pearson-hash-n-bytes message substitution-table +mod256 n-bytes)) (pearson-hash-n-bytes message substitution-table bitwise-xor n-bytes)))
1: #(20) #(11) 2: #(20 101) #(11 95) 3: #(20 101 29) #(11 95 132) 4: #(20 101 29 175) #(11 95 132 159) 5: #(20 101 29 175 14) #(11 95 132 159 35) 6: #(20 101 29 175 14 75) #(11 95 132 159 35 14) 7: #(20 101 29 175 14 75 170) #(11 95 132 159 35 14 155) 8: #(20 101 29 175 14 75 170 6) #(11 95 132 159 35 14 155 219) 9: #(20 101 29 175 14 75 170 6 179) #(11 95 132 159 35 14 155 219 86) 10: #(20 101 29 175 14 75 170 6 179 67) #(11 95 132 159 35 14 155 219 86 85)
A sequence of numbers is ideal for representing a hash value, but is nicer to read in string form. Here is a procedure for converting sequences of numbers into strings of hex numbers.
(define (hash-vector->string hash-vec) (for/fold ([result ""]) ([byte hash-vec]) (~a result (~r byte #:base 16 #:min-width 2 #:pad-string "0")))) (printf "~a\n" (hash-vector->string #(1 2 3))) (printf "~a\n" (hash-vector->string (range 0 32)))
010203 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
References
[1] | Peter K. Pearson. Fast hashing of variable-length text strings. Commun. ACM, 33(6):677–680, June 1990. [ DOI | http ] |