* Combine <8 and 8-16 cases into one case.
* Adopt mask-based approach for the <16 length city fast path (idea of Van Phu Do).
* Slightly improved code layout.
* Update perf number.
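
A minimal sketch of what a mask-based comparison for city names shorter than 16 bytes could look like. This is an illustration, not the code from this commit: the class `MaskedNameCompare`, its methods, and the assumption that the input is padded so 16 bytes are always readable are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: compare names of fewer than 16 bytes as two masked 64-bit words.
public class MaskedNameCompare {
    // MASKS[n] keeps the low n bytes of a little-endian long (n = 0..8).
    private static final long[] MASKS = new long[9];
    static {
        for (int n = 1; n <= 8; n++) {
            MASKS[n] = (n == 8) ? -1L : (1L << (n * 8)) - 1;
        }
    }

    // Loads 16 bytes at offset and zeroes every byte past the name length.
    // Assumes len < 16 and that at least 16 bytes are readable from offset.
    static long[] maskedWords(ByteBuffer buf, int offset, int len) {
        long w0 = buf.getLong(offset);
        long w1 = buf.getLong(offset + 8);
        int n0 = Math.min(len, 8);     // name bytes in the first word
        int n1 = Math.max(len - 8, 0); // name bytes in the second word
        return new long[] { w0 & MASKS[n0], w1 & MASKS[n1] };
    }

    // One code path covers both the <8 and the 8-16 byte cases.
    static boolean sameName(ByteBuffer a, int offA, int lenA, ByteBuffer b, int offB, int lenB) {
        if (lenA != lenB) {
            return false;
        }
        long[] x = maskedWords(a, offA, lenA);
        long[] y = maskedWords(b, offB, lenB);
        return x[0] == y[0] && x[1] == y[1];
    }

    // Test helper: wraps a line in a little-endian buffer with 16 bytes of zero padding.
    static ByteBuffer padded(String line) {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(bytes.length + 16).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(bytes);
        return buf;
    }
}
```

Masking lets the bytes that follow the name (the separator and the temperature) be ignored without a byte-by-byte loop or a separate branch per length class.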
- use smaller regions (higher region count) so workers that finish early spend less time idle
- remove some configuration-related work during initialization, which might save a few tens of milliseconds
- reorder the temperature parsing instructions to benefit more from ILP (hopefully)
* decrease instruction level parallelism
It turns out doing two things at once was too much; perf annotate showed register spilling.
* more trickery with latency hiding
* work-stealing, lookup tables, credits
* do not assume gender
* Disable The GC
Sometimes cuts up to 1 second
of runtime on my machine.
* Remove Confusing Byte-Order Parameter
Bytes have no Byte-Order ;)
* Provide More Memory to Run the 10K set
* Fix Comparison Function
* Justin's implementation
* Rename justin to Judekeyser
* Back to previous implementation of vectors
* Reading names as sequences of integers
* Fixing tests
* Scale down the number of NIO workers
---------
Co-authored-by: Justin Dekeyser <justin.dekeyser@Justins-MacBook-Pro.local>
Instead of writing the result line by line, use random.choices to pick multiple stations at once and write large batches to disk. Also, instead of round(), format with :.1f, which is probably quicker at large scale because it avoids a math function call.
* added code
* Fixed pointers bugs
* removed my own benchmark
* added comment on how I handle hash collisions
* executed mvnw clean verify
* made scripts executable & fixed rounding issues
* Fixed way of dealing with hash collisions
* changed method name sameNameBytes to isSameNameBytes
* changed script from sh to bash
* fixed chunking bug
* Fixed bug in chunking when file size is too small
* added Runtime.getRuntime().availableProcessors
* added improvements to string copying and to calculating the next Map index in case of collision & improved string comparison
* Some clean up, fine tuning, removing non-supported options, added credit
section and additional comments.
* Put license header year back to 2023 to pass checks.
* Remove static linking (as it requires some more setup on the target
machine).
- split big regions into smaller shared tasks, so workers that complete their own tasks can pick up remaining ones instead of leaving their cores idle
- reduce number of executed instructions in the hot path
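
A minimal sketch of the shared-task idea, assuming an atomic counter as the task dispenser. The class `SharedTaskQueue`, the method `processAll`, and the byte-counting stand-in for real parsing are hypothetical and not taken from this commit.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: workers pull small tasks from a shared counter until none remain.
public class SharedTaskQueue {
    // Splits [0, fileSize) into tasks of taskSize bytes and returns the total bytes processed.
    static long processAll(long fileSize, long taskSize, int workers) {
        int taskCount = (int) ((fileSize + taskSize - 1) / taskSize);
        AtomicInteger next = new AtomicInteger();
        AtomicLong processed = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                int task;
                // A worker that finishes early simply claims the next unclaimed task.
                while ((task = next.getAndIncrement()) < taskCount) {
                    long start = task * taskSize;
                    long end = Math.min(start + taskSize, fileSize);
                    processed.addAndGet(end - start); // stand-in for parsing the region
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return processed.get();
    }
}
```

A real solution would also align task boundaries to line breaks, which this sketch leaves out.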
/**
* Solution based on thomaswue solution, commit:
* commit d0a28599c2
* Author: Thomas Wuerthinger
* Date: Sun Jan 21 20:13:48 2024 +0100
*
* The goal here was to try to improve the runtime of his 10k
* solution of: 00:04.516
*
* With Thomas's latest changes, his time is probably much better
* already, and maybe even 1st place for the 10k too.
* See: https://github.com/gunnarmorling/1brc/pull/606
*
* But as I was already coding something, I'll submit just to
* see if it will be faster than his *previous* 10k time of
* 00:04.516
*
* Changes:
* It's a similar idea to my previous solution: if you split
* the chunks evenly, some threads might finish much faster and
* stay idle, so:
* 1) Create more chunks than threads, so the ones that finish first
* can do something;
* 2) Decrease chunk sizes as we get closer to the end of the file.
*/
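
The two changes listed in the comment above can be sketched as follows. This is only an illustration under assumed parameters: the class `ShrinkingChunks`, the method `boundaries`, and the rule "take 1/(2 * threads) of what remains, but at least `minChunk`" are hypothetical and not necessarily what the submission does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: more chunks than threads, with sizes shrinking toward the end of the file.
public class ShrinkingChunks {
    // Returns chunk boundaries [0, b1, b2, ..., fileSize].
    static List<Long> boundaries(long fileSize, int threads, long minChunk) {
        if (threads <= 0 || minChunk <= 0) {
            throw new IllegalArgumentException("threads and minChunk must be positive");
        }
        List<Long> result = new ArrayList<>();
        long pos = 0;
        result.add(pos);
        while (pos < fileSize) {
            long remaining = fileSize - pos;
            // Each chunk is a fraction of the remaining bytes, so later chunks get smaller.
            long size = Math.min(remaining, Math.max(minChunk, remaining / (2L * threads)));
            pos += size;
            result.add(pos);
        }
        return result;
    }
}
```

Small chunks near the end mean that the last threads still working hold little work, so the others are not idle for long. Aligning each boundary to the next newline is omitted here.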
* CalculateAverage_pdrakatos
* Rename to be valid with rules
* Changes on scripts execution
* Fixing bugs causing scripts not to be executed
* Changes to the prepare script to make it compatible
* Fixes to pass all tests
* Increase direct memory allocation buffer
* Fixing memory problem that causes heap space exception
* Fresh solution to optimize performance of the execution