* Initial version.
* Make PGO feature optional off-by-default. Needs PGO_MODE environment
variable to be set. Add -O3 -march=native tuning flags for better
performance.
* Adjust script to be more quiet.
* Adjust max city length. Fix an issue when accumulating results.
* Tune thomaswue submission.
mmap the entire file, use Unsafe directly instead of ByteBuffer, avoid byte[] copies.
These tricks give a ~30% speedup, over an already fast implementation.
* Optimize parsing of numbers based on specific given constraints.
* Fix for segment calculation for case of very small input.
* Minor shell script fixes.
* Separate out build step into file additional_build_step_thomaswue.sh,
simplify run script and remove PGO option for now.
* Minor corrections to the run script.
---------
Co-authored-by: Alfonso² Peterssen <alfonso.peterssen@oracle.com>
* A solution with Actor Model concurrency and MappedByteBuffer
* fix test cases
* revert back the file name to original
* cache String hashCode calculation via composing with Key object
* fix wrong key caching and eleminate duplicate String creation between actors
* update possible char count in a line, fix calculate_average.sh
* increase possible line length to 256 bytes, much safer to cover 100 chars I hope
---------
Co-authored-by: Yavuz Tas <yavuz.tas@ing.com>
* artpar's attempt
* artpar's attempt
* remove int -> Integer conversions, custom parsing for measurements
* remove allocations by caching station names
also remove Integer and use int instead to remove valueOf calls
* Fix result mismatch errors
* parse int instead of double
* reduce time spent reading the mapped buffer
* cleanup unused memory
* less is faster ? vector addition doesn't look worth it
* backout from virtual threads as well
* Fix breaking tests
Somewhat mixed collection of multiple ideas, mostly based initially
on using the new JDK Vector API for extracting offsets of newlines
and semicolons.
Runs locally in just under 11 seconds on 1B rows of input on a
2020 M1 Macbook Air.
* isolgpus: submission 1
* isolgpus: fix min value bug (breaks if a negative temperature never appears)
* isolgpus: remove unused collector
* isolgpus: fix split on chunk bug
* isolgpus: change name equality algo to a cheaper check.
* isolgpus: fix chunking state to cope with last byte of last chunk
* isolgpus: hash as we go, instead of at the end
* isolgpus: adjust thread count to core count
* isolgpus: change cores to 8 statically
---------
Co-authored-by: Jamie Stansfield <jalstansfield@gmail.com>
* First performance tweaks
* further tweaks
* collect into a treemap
* Tweak JVM options
* Inline rounding into collector
* reduce some operations
* oops, add missing braces
* tweak JVM options
* small fixes
* add min and max to processing
* fix min
* remove compact strings
* replace sumWithCompensation with naive sum implementation
* use UseShenandoahGC
* integrate mmap
* integrate mmap
* Fix messed up array logic
* Set jdk version
* Use Integer calculation instead of double, add unit-test
* Bring back StationIdent optimization
Originally, StationIdent was using byte[] to store names, so the extra
String allocation could be avoided. However, that produced incorrect
sorting.
Sorting is now moved to the result merging step. Here, names are
converted to Strings.
* Implement readStationName with SIMD 256bit
* Rebase and cleanup test code, now that it's in the project
* Fix seijikun formatting
* Fix test failure in specific jobCnt edge-cases
* Also switch to graalvm
In case of key collision broken implementation will likely attribute
measurements to the wrong key and therefore it is better to have
non-zero value to end up with a wrong average value.
When all measurements are zero then averages are also zero even
when attributed to the wrong keys.
Updates #91
* Added tests for endian-calculations (had these in a different class, perhaps handy for others to see as well)
Inlined the hash function, runs locally in 2.4sec now, hopefully endian issues fix
Added equals to support any city name up to 1024 in length, don't rely on hash
* For clarity I've updated the code so endian doesn't change the hashes, easier to debug.
* Fixing bug in array check
Simple is faster
* Also spotted the diff, not just the big exception
Fixed buffer limit issue
Input created via
```sh
bash -c 'for i in {1..10000} ; do echo "id$i;0.0" ; done' >./src/test/resources/samples/measurements-10000-unique-keys.txt
```
and output via baseline implementation.
Keys are short and very similar which improves chances for collision
and hence are good for testing.
Fixes#91
* Use open-addressing scheme to deal with hash table collisions. Reduce concurrency from 16 to 8. Use bit mask rather than mod operator to confine hash code to table range.
* Properly handle file partitions that reside entirely within a line.
* Reorder statements in doProcessBuffer.
Adds test samples that can be used for unit tests or to verify
implementations via:
```bash
for sample in $(ls src/test/resources/samples/*.txt)
do
echo "Validating $sample"
rm -f measurements.txt
ln -s $sample measurements.txt
diff <(./calculate_average.sh) ${sample%.txt}.out
done
rm measurements.txt
```
For #61
Added SWAR (SIMD Within A Register) code to increase bytebuffer processing/throughput
Delaying the creation of the String by comparing hash, segmenting like spullara, improved EOL finding
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* Initial implementation using Shenandoah GC and parallel iteration
* Use memory mapped files
* Iterate the buffer once and use BigDecimal parsing instead of Double.parseDouble
* Add information about Graal
* Add sdk use to calculate script
* Memory mapped file, single-pass parsing, custom hash map, fixed thread pool
The threading was a hasty addition and needs work
* Used arraylist instead of treemap to reduce a little overhead
We only need it sorted for output, so only construct a treemap for output
* Attempt to speed up double conversion
* Cap core count for low-core systems
* Fix wrong exponent
* Accumulate measurement value in double, seems marginally faster
Benchmark Mode Cnt Score Error Units
DoubleParsingBenchmark.ourToDouble thrpt 10 569.771 ± 7.065 ops/us
DoubleParsingBenchmark.ourToDoubleAccumulateInToDouble thrpt 10 648.026 ± 7.741 ops/us
DoubleParsingBenchmark.ourToDoubleDivideInsteadOfMultiply thrpt 10 570.412 ± 9.329 ops/us
DoubleParsingBenchmark.ourToDoubleNegative thrpt 10 512.618 ± 8.580 ops/us
DoubleParsingBenchmark.ourToDoubleNegativeAccumulateInToDouble thrpt 10 565.043 ± 18.137 ops/us
DoubleParsingBenchmark.ourToDoubleNegativeDivideInsteadOfMultiply thrpt 10 511.228 ± 13.967 ops/us
DoubleParsingBenchmark.stringToDouble thrpt 10 52.310 ± 1.351 ops/us
DoubleParsingBenchmark.stringToDoubleNegative thrpt 10 50.785 ± 1.252 ops/us