- Eliminate redundant object creations in between
- Custom HashMap on purpose - Inspired by @spullara
- More performant temperature parsing - Inspired by @yemreinci
- JVM tweaks, decreased heap memory, and remove AlwaysPreTouch
Co-authored-by: Yavuz Tas <yavuz.tas@ing.com>
* Create clones
* Cleanup code and add memory mapping to read file
* Fix chunks reading logic to fit linebreak
* Remove unused
* Sequential
* Multi thread process chunks
* Add new line in output
* Remove unnecessary operation with map & reducer memory
* Reduce mem usage by using only 1 map
* formatting
* Remove unnecessary length check
* Remove trycatch
* Optimize double parsing
* check full hash before comparing
* implement merykitty suggestions to simplify temperature masking; required refactoring to little-endian
* standalone script for offline Perfect Hash seed searching
* stop using an oversized hash table
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
Prepare scripts that use `#!/bin/sh` and
`source "$HOME/.sdkman/bin/sdkman-init.sh"`
fail on systems where `sh` is not `bash`,
e.g. on Ubuntu it is `dash` which has not `source`.
* Script to Create a New Fork
* validate fork name
* use interactive prompt to prevent accidentally overwriting existing fork
* recommend create_fork.sh in the README
* remove mention of additional_builds_steps (obsolete)
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
* divide the reading of the file by parts
* fix format
* add number of core partition
* fix format
* implement strToDouble
* fix strtodouble
* add locale, fix read file, tests pass
* delete unnecessary method clean
* First Version
First draft; stole chunking but it's bad
Forgot my changes
No regex building
Clean & optim
I was not benchmarking myself T_T
Faaaster
First Version
* Update calculate_average_samuelyvon.sh
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* Add prepare script
* Fix rounding
* Fix format
* Fixing casing
* Formats of sorts?
* Rename
---------
Co-authored-by: Gunnar Morling <gunnar.morling@googlemail.com>
* calculate_average_mtopolnik
* short hash (just first 8 bytes of name)
* Remove unneeded checks
* Remove archiving classes
* 2x larger hashtable
* Add "set" to setters
* Simplify parsing temperature, remove newline search
* Reduce the size of the name slot
* Store name length and use to detect collision
* Reduce memory loads in parseTemperature
* Use short for min/max
* Extract constant for semicolon
* Fix script header
* Explicit bash shell in shebang
* Inline usage of broadcast semicolon
* Try vectorization
* Remove vectorization
* Go Unsafe
* Use SWAR temperature parsing by merykitty
* Inline some things
* Remove commented-out MemorySegment usage
* Inline namesMem.asSlice() invocation
* Try out JVM JIT flags
* Implement strcmp
* Remove unused instance variables
* Optimize hashing
* Put station name into hashtable
* Reorder method
* Remove usage of MemorySegment.getUtf8String
Replace with UNSAFE.copyMemory() and new String()
* Fix hashing bug
* Remove outdated comments
* Fix informative constants
* Use broadcastByte() more
* Improve method naming
* More hashing
* Revert more hashing
* Add commented-out code to hash 16 bytes
* Slight cleanup
* Align hashtable at cacheline boundary
* Add Graal Native image
* Revert Graal Native image
This reverts commit d916a42326d89bd1a841bbbecfae185adb8679d7.
* Simplify shell script (no SDK selection)
* Move a constant, zero out hashtable on start
* Better name comparison
* Add prepare_mtopolnik.sh
* Cleaner idiom in name comparison
* AND instead of MOD for hashtable indexing
* Improve word masking code
* Fix formatting
* Reduce memory loads
* Remove endianness checks
* Avoid hash == 0 problem
* Fix subtle bug
* MergeSort of parellel results
* Touch up perf
* Touch up perf
* Remove -Xmx256m
* Extract result printing method
* Print allocation details on OOME
* Single mmap
* Use global allocation arena
* initial commit
* - use loop
- use mutable object to store results
* get rid of regex
* Do not allocate measurement objects
* MMap + custom double parsing ~ 1:30 (down from ~ 2:05)
* HashMap for accumulation and only sort at the end - 1:05
* MMap the whole file
* Use graal
* no GC
* Store results in an array list to avoid double map lookup
* Adjust max buf size
* Manual parsing number to long
* Add --enable-preview
* remove buffer size check (has no effect on performance)
* fix min & max initialization
* do not check for \r
* Revert "do not check for \r"
This reverts commit 9da1f574bf6261ea49c353488d3b4673cad3ce6e.
* Optimise parsing. Now completes in 31 sec down from ~43
* trying to parse numbers faster
* use open address hash table instead of the standard HashMap
* formatting
* Rename the script to match github username (change underscores to slashes)
Enable transparent huge pages, seems to improve by ~2 sec
* Revert "formatting"
This reverts commit 4e90797d2a729ed7385c9000c85cc7e87d935f96.
* Revert "use open address hash table instead of the standard HashMap"
This reverts commit c784b55f61e48f548b2623e5c8958c9b283cae14.
* add prepare_roman-r-m.sh
* SWAR tricks to find semicolon (-2 seconds ro run time)
* remove time call
* fix test
* Parallel version (~6.5 seconds)
* Add multithreaded variant to generate measurements
* Add removing existing measurements.txt file in case exists (for usability reasons)
Fix bug for number of lines generated
* Fix also for less than assumed chunk size (10M entries) per thread
#### Check List:
- [x] Tests pass (`./test.sh MeanderingProgrammer` shows no differences between expected and actual outputs)
- [x] All formatting changes by the build are committed
- [x] Your launch script is named `calculate_average_MeanderingProgrammer.sh` (make sure to match casing of your GH user name) and is executable
- [x] Output matches that of `calculate_average_baseline.sh`
* Execution time: `00:04.668`
* Execution time of reference implementation: `02:40.597`
* System: Apple M2 Max, 12 cores, 64 GB