On a key collision, a broken implementation will likely attribute
measurements to the wrong key, so it is better to use non-zero values:
the misattribution then shows up as a wrong average.
When all measurements are zero, the averages are also zero even
when attributed to the wrong keys.
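
The reasoning above can be checked with a little arithmetic. The following is a minimal sketch, not code from any submission; the class `ZeroMaskDemo`, the keys `id1`/`id2`, and the sample values are hypothetical and chosen only for illustration.

```java
final class ZeroMaskDemo {
    // Arithmetic mean of the given measurements.
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // All-zero input: id1's mean is 0.0 whether or not id2's value
        // is wrongly mixed in, so the bug goes unnoticed.
        System.out.println(mean(0.0) == mean(0.0, 0.0)); // true

        // Non-zero input: mixing id2's 2.0 into id1 moves the mean
        // from 1.0 to 1.5, so a diff against the expected output fails.
        System.out.println(mean(1.0) == mean(1.0, 2.0)); // false
    }
}
```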
Updates #91
* Added tests for endian-calculations (had these in a different class, perhaps handy for others to see as well)
Inlined the hash function; runs locally in 2.4 s now, hopefully fixes the endian issues
Added equals to support any city name up to 1024 in length, don't rely on hash
* For clarity I've updated the code so endian doesn't change the hashes, easier to debug.
* Fixing bug in array check
Simple is faster
* Also spotted the diff, not just the big exception
Fixed buffer limit issue
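
The equals check mentioned above guards against distinct names that share a hash. A minimal sketch of the idea, assuming keys are held as byte arrays; the class `KeyEqualsSketch` and the method `sameKey` are hypothetical and are not taken from the submission. `"Aa"` and `"BB"` are used because they are known to produce the same `String.hashCode()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyEqualsSketch {
    // Two entries refer to the same station only if the hashes match
    // AND the name bytes are identical; the hash alone is just a fast filter.
    static boolean sameKey(int hashA, byte[] nameA, int hashB, byte[] nameB) {
        return hashA == hashB && Arrays.equals(nameA, nameB);
    }

    public static void main(String[] args) {
        byte[] a = "Aa".getBytes(StandardCharsets.UTF_8);
        byte[] b = "BB".getBytes(StandardCharsets.UTF_8);

        // Same hash, different names.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // The byte comparison keeps them apart.
        System.out.println(sameKey("Aa".hashCode(), a, "BB".hashCode(), b)); // false
    }
}
```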
Input created via
```sh
bash -c 'for i in {1..10000} ; do echo "id$i;0.0" ; done' >./src/test/resources/samples/measurements-10000-unique-keys.txt
```
and the expected output via the baseline implementation.
Keys are short and very similar, which increases the chance of
collisions and makes them good for testing.
Fixes #91
The script tests all implementations and prints PASS or FAIL status.
In case of failure it also prints implementation output to stderr.
This will be handy for adding new test samples.
Show test statuses and omit failing output:
```sh
$ ./test_all.sh 2>/dev/null
PASS artsiomkorzun
PASS baseline
PASS bjhara
PASS criccomini
FAIL ddimtirov
FAIL ebarlas
PASS filiphr
FAIL itaske
PASS khmarbaise
FAIL kuduwa-keshavram
FAIL lawrey
PASS padreati
FAIL palmr
PASS richardstartin
FAIL royvanrijn
FAIL seijikun
PASS spullara
PASS truelive
```
Show only passing implementations:
```sh
$ ./test_all.sh 2>/dev/null | grep PASS | cut -d' ' -f2
artsiomkorzun
baseline
bjhara
criccomini
filiphr
khmarbaise
padreati
richardstartin
spullara
truelive
```
For #61
* Use open-addressing scheme to deal with hash table collisions. Reduce concurrency from 16 to 8. Use bit mask rather than mod operator to confine hash code to table range.
* Properly handle file partitions that reside entirely within a line.
* Reorder statements in doProcessBuffer.
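
The open-addressing and bit-mask changes listed above could look roughly like the following. This is a minimal sketch using linear probing and a fixed power-of-two capacity; the class `OpenAddressingSketch`, its fields, and the capacity of 16 are assumptions for illustration, not the code from this commit.

```java
final class OpenAddressingSketch {
    // Capacity must be a power of two so that (hash & MASK) yields a
    // non-negative index in range, replacing the mod operator.
    private static final int CAPACITY = 1 << 4;
    private static final int MASK = CAPACITY - 1;

    private final String[] keys = new String[CAPACITY];
    private final double[] sums = new double[CAPACITY];
    private final int[] counts = new int[CAPACITY];

    // Returns the slot holding `key`, or the first empty slot on its
    // probe path, or -1 if every slot is taken by other keys.
    private int slotFor(String key) {
        int i = key.hashCode() & MASK;
        for (int probes = 0; probes < CAPACITY; probes++) {
            if (keys[i] == null || keys[i].equals(key)) {
                return i;
            }
            i = (i + 1) & MASK; // linear probing, wrapping around the table
        }
        return -1;
    }

    void add(String key, double value) {
        int i = slotFor(key);
        if (i < 0) {
            throw new IllegalStateException("table full; this sketch does not resize");
        }
        keys[i] = key;
        sums[i] += value;
        counts[i]++;
    }

    // Mean for `key`, or NaN if the key was never added.
    double mean(String key) {
        int i = slotFor(key);
        return (i < 0 || keys[i] == null) ? Double.NaN : sums[i] / counts[i];
    }

    public static void main(String[] args) {
        OpenAddressingSketch table = new OpenAddressingSketch();
        // "Aa" and "BB" share a hashCode, so they start at the same slot.
        table.add("Aa", 1.0);
        table.add("BB", 3.0);
        table.add("Aa", 2.0);
        System.out.println(table.mean("Aa")); // 1.5
        System.out.println(table.mean("BB")); // 3.0
    }
}
```

Because the colliding key moves to the next free slot and is compared with `equals`, both keys keep separate sums even though their hashes are identical.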
Adds test samples that can be used for unit tests or to verify
implementations via:
```bash
for sample in src/test/resources/samples/*.txt
do
echo "Validating $sample"
rm -f measurements.txt
ln -s "$sample" measurements.txt
diff <(./calculate_average.sh) "${sample%.txt}.out"
done
rm measurements.txt
```
For #61
Removing two solutions from the leaderboard temporarily, as they violated the "must work with all station names" rule and are prone to hash collisions for names not in the current data set. Will bring them back once these submissions have been updated.
Added SWAR (SIMD Within A Register) code to increase ByteBuffer processing throughput
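
For readers unfamiliar with SWAR, here is a minimal sketch of the general idea: testing all eight bytes of a `long` for the `;` separator at once, using the well-known "has zero byte" bit trick. It is not the code from this commit; the class `SwarSketch` and the helpers `firstSemicolon` and `wordOf` are hypothetical, and the word is assumed to be loaded in little-endian order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class SwarSketch {
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' in every byte
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Index (0..7) of the first ';' in a little-endian word, or -1 if none.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                 // bytes equal to ';' become 0x00
        long match = (x - ONES) & ~x & HIGH_BITS;   // high bit set where a byte was 0x00
        // Borrows only travel towards higher bytes, so the lowest flag is exact.
        return match == 0 ? -1 : Long.numberOfTrailingZeros(match) >>> 3;
    }

    // Illustration helper: first 8 ASCII bytes of `s` as a little-endian long.
    static long wordOf(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    public static void main(String[] args) {
        System.out.println(firstSemicolon(wordOf("id1;0.0\n"))); // 3
        System.out.println(firstSemicolon(wordOf("abcdefgh")));  // -1
    }
}
```

The point of the technique is that one subtraction and two bitwise operations replace up to eight byte-by-byte comparisons.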
Delaying the creation of the String by comparing hash, segmenting like spullara, improved EOL finding
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>