Playground/1brc

Commit Graph


Author

SHA1

Message

Date

Gunnar Morling

fa1ca65bfd

Hyperfine: Script re-org

2024-01-09 21:26:29 +01:00

Olivier Bourgain

143132e8df

My implementation is in dev.morling.onebrc.CalculateAverage_obourgain and can be run with the provided script calculate_average_obourgain.sh (#75)

Runs with standard JDK 21.

On my computer (i5 13500, 20 cores, 32GB ram) my best run is (file fully in page cache):
49.78user 0.69system 0:02.81elapsed 1795%CPU

A slightly older version of the code on a Mac pro M1 with 32 GB:
real	0m2.867s
user	0m23.956s
sys	0m1.329s

As I wrote in comments in the code, a few of my results are rounded differently than in the reference implementation. I have seen that there is an issue about that, but no specific rule yet.

Main points:
- use MemorySegment, it's faster than ByteBuffer
- split the work into many chunks and distribute them to a thread pool
- fast measurement parser that relies on a lot of domain knowledge
- very low allocation
- visit each byte only once
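
To illustrate two of the points above (newline-aligned chunking and a parser that leans on domain knowledge), here is a minimal sketch. It is not the code from this commit. It assumes the input is a plain byte[] rather than a MemorySegment, that every line has the form `name;value\n`, and that every value has an optional minus sign, one or two integer digits, a dot, and exactly one fractional digit. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MeasurementSketch {

    /** Splits [0, data.length) into ranges that each end right after a '\n'. */
    public static List<int[]> splitAtNewlines(byte[] data, int targetChunks) {
        List<int[]> ranges = new ArrayList<>();
        int approx = Math.max(1, data.length / targetChunks);
        int start = 0;
        while (start < data.length) {
            int end = Math.min(data.length, start + approx);
            // Extend the chunk to the end of its last line so no record straddles two chunks.
            while (end < data.length && data[end - 1] != '\n') {
                end++;
            }
            ranges.add(new int[] { start, end });
            start = end;
        }
        return ranges;
    }

    /**
     * Parses the value starting at pos (the byte after ';') and returns it in tenths.
     * Assumed format: -?d.d or -?dd.d, so no general number parsing is needed.
     */
    public static int parseTenths(byte[] data, int pos) {
        boolean negative = data[pos] == '-';
        if (negative) {
            pos++;
        }
        int value = data[pos++] - '0';
        if (data[pos] != '.') {
            // Second integer digit is present.
            value = value * 10 + (data[pos++] - '0');
        }
        pos++; // skip '.'
        value = value * 10 + (data[pos] - '0');
        return negative ? -value : value;
    }

    /** Sums all values (in tenths) in [start, end); the range must end after a '\n'. */
    public static long sumTenths(byte[] data, int start, int end) {
        long sum = 0;
        int pos = start;
        while (pos < end) {
            while (data[pos] != ';') {
                pos++; // skip the station name
            }
            sum += parseTenths(data, pos + 1);
            while (data[pos] != '\n') {
                pos++; // move to the end of the line
            }
            pos++;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = "Hamburg;12.3\nOslo;-5.0\nLima;0.7\n".getBytes(StandardCharsets.US_ASCII);
        long total = 0;
        // In a real run each range would be handed to a thread pool; here they run in sequence.
        for (int[] range : splitAtNewlines(data, 2)) {
            total += sumTenths(data, range[0], range[1]);
        }
        System.out.println("total tenths = " + total);
    }
}
```

Working in integer tenths avoids floating-point parsing in the hot loop. Unlike the "visit each byte only once" goal stated above, this sketch re-reads the value bytes while skipping to the end of the line, which keeps it short at the cost of some speed.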

Things I tried that were in fact pessimizations:
- use some internal JDK code to vectorize the hashCode computation
- use a MemorySegment to represent the keys instead of byte[], to avoid
  copying

Hope I won't have a bad surprise when running on the target server 😱

2024-01-07 20:15:53 +01:00

2 Commits