Runs with standard JDK 21.
On my computers (i5 13500, 20 cores, 32GB ram) my best run is (file fully in page cache):
49.78user 0.69system 0:02.81elapsed 1795%CPU
A bit older version of the code on Mac pro M1 32 GB:
real 0m2.867s
user 0m23.956s
sys 0m1.329s
As I wrote in comments in the code, I have a few different roundings that the reference implementation. I have seend that there is an issue about that, but no specific rule yet.
Main points:
- use MemorySegment, it's faster than ByteBuffer
- split the work in a lot of chunks and distribute to a thread pool
- fast measurement parser by using a lot of domain knowledge
- very low allocation
- visit each byte only once
Things I tried that were in fact pessimizations:
- use some internal JDK code to vectorize the hashCode computation
- use a MemorySegment to represent the keys instead of byte[], to avoid
copying
Hope I won't have a bad surprise when running on the target server 😱
* start
* slower
* still bad
* finally faster than baseline :)
* starting to go fast
* improve
* we ball
* fix race condition an newline
* change threadpool
* ~18sec on my machine
* single thread memory mapped file reader, pool of processors
* cleanup of inner classes of MetricProcessor
* doubles are parsed without external functions, strings are lazily created from byte arrays
* remove load() MappedByteBuffer in memory
* fixed handling of newline
* fix a bug & correct locale used
* MappedByteBuffer size set to 1MB
* fixed rounding
* Do not use ArrayBlockingQueue.offer since it drops elements when queue is full
* MappedByteBuffer size = 32 MB
* Adding kgeri's solution
* parallelizing CalculatorAverage_kgeri
* fixing aggregation bugs, chunk size calc for small files
* removed GC logging
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* fix for when there's no newline at end of input
* fix for when the final record ends on the chunk boundary
---------
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* improve double reading by eleminating string parsing in between, make calculations over on integer instead of double, parse into double at the end only once
* more improvements, sharing a single StringBuilder to build all toStrings, minor performance gain.
* micro optimizations on reading temperature
* a small skip for redundant traverses, micro optmization
* micro optimization, eleminate some if cases, saves 0.5 seconds more
* micro optimization, calculate key hash ahead eleminates more more loop, saves 0.5 seconds more :)
* optimize key equals and handling the case when a region is larger than max integer size
---------
Co-authored-by: Yavuz Tas <yavuz.tas@ing.com>
* Initial version with multiple ideas
* Added virtual thread implementation based on certain task size
* Removed evaluate file
* Fixed test issues
* Added a custom input split
* first try
* format
* Update calculate_average_imrafaelmerino.sh
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* Update src/main/java/dev/morling/onebrc/CalculateAverage_imrafaelmerino.java
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
---------
Co-authored-by: Rafael Merino García <imrafaelmerino@gmail.com>
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* Implementation of 1brc - felix19350
* Added license header
* Fixed failing tests
* Replaced parsing of doubles with a custom parser and integer arithmetic
---------
Co-authored-by: Bruno Felix <bruno.felix@klarna.com>
* Initial version.
* Make PGO feature optional off-by-default. Needs PGO_MODE environment
variable to be set. Add -O3 -march=native tuning flags for better
performance.
* Adjust script to be more quiet.
* Adjust max city length. Fix an issue when accumulating results.
* Tune thomaswue submission.
mmap the entire file, use Unsafe directly instead of ByteBuffer, avoid byte[] copies.
These tricks give a ~30% speedup, over an already fast implementation.
* Optimize parsing of numbers based on specific given constraints.
* Fix for segment calculation for case of very small input.
* Minor shell script fixes.
* Separate out build step into file additional_build_step_thomaswue.sh,
simplify run script and remove PGO option for now.
* Minor corrections to the run script.
---------
Co-authored-by: Alfonso² Peterssen <alfonso.peterssen@oracle.com>