* refactor: replace xtrace with a "print_and_execute" function (sketch below)
* nit: stylize error messages
* replace out_expected.txt with measurements_1B.out
* print
* prevent errors on cleanup
* run tests and check warmup run output before running the benchmark
* move "git diff" pretty diff output to test.sh
* Ensure "set -e" is re-enabled if we followed a "continue" branch (sketch below)
* add timeouts to test.sh invocations (sketch below)
* use diff with tocsv.sh to show differences on a failed test (sketch below)
* add --quiet mode to test.sh
* move the prepare_$fork.sh invocation to right below the hyperfine call, since test.sh also invokes it
* Revert "add --quiet mode to test.sh"
This reverts commit 13e9fb7f395c1bd64a62528b8349803bc1366941.
* use tee to capture test output to a temp file and print its contents on failure (sketch below)
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
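A minimal sketch of the print_and_execute helper named in the first commit; the real function in evaluate.sh may differ. It echoes each command before running it, giving xtrace-style visibility without enabling `set -x` globally:

```bash
# Print the command (prefixed like xtrace's "+"), then run it.
function print_and_execute() {
    echo "+ $*" >&2
    "$@"
}

print_and_execute java --version
```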
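For the "set -e" commit: when a fallible step is wrapped in `set +e` and the loop can bail out with `continue`, errexit has to be restored before the branch, or the continue path leaves it disabled for the rest of the script. A hedged sketch of the pattern, with a hypothetical fork list:

```bash
set -e
forks=(fork_a fork_b)    # hypothetical fork names

for fork in "${forks[@]}"; do
    set +e               # allow test.sh to fail without aborting the script
    ./test.sh "$fork"
    test_status=$?
    set -e               # re-enable errexit BEFORE any continue can skip it

    if [ "$test_status" -ne 0 ]; then
        echo "Skipping $fork: tests failed"
        continue
    fi
done
```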
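The timeout commit presumably wraps each test.sh call in coreutils' `timeout` so a hung fork cannot stall the whole evaluation; the 300-second limit is illustrative, not necessarily the value chosen, and `$fork` is the loop variable from the surrounding script:

```bash
# Kill test.sh if it runs longer than 5 minutes (exit status 124 on timeout).
timeout 300 ./test.sh "$fork"
```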
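One plausible reading of the diff commit, using process substitution to compare expected and actual outputs after normalizing both through tocsv.sh; measurements_1B.out is the expected-output file from the earlier commit, while actual_output.txt is a hypothetical name:

```bash
# Normalize both sides to CSV before diffing, so formatting noise is ignored.
if ! diff <(./tocsv.sh < measurements_1B.out) <(./tocsv.sh < actual_output.txt); then
    echo "FAILED: output does not match expected results"
fi
```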
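And a sketch of the tee-based capture from the last commit in this group: stream test output live while saving a copy, then dump the copy only when the run fails. `pipefail` matters here so the `if` sees test.sh's status rather than tee's:

```bash
set -o pipefail
test_output=$(mktemp)
if ! ./test.sh "$fork" 2>&1 | tee "$test_output"; then
    echo "Test failed for $fork; captured output:"
    cat "$test_output"
fi
rm -f "$test_output"
```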
* reset the JDK to the default (21.0.1-open) when no prepare script is provided (sketch below)
* leaderboard improvements: sorting and content
* run sdk install once at the beginning of the script for all the SDKs detected in any of the evaluated prepare scripts (sketch below)
* remove unnecessary code and tweak doc comments
* one more nit
* Don't print ranking values when only 1 fork is being evaluated
* It's been a few hours, so I now have some more rate limit :)
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
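Presumed shape of the JDK reset: when a fork ships no prepare script, switch SDKMAN! back to the default 21.0.1-open so the previous fork's JDK choice does not leak into the next run. The prepare_$fork.sh naming is taken from the other commits:

```bash
# sdk is a shell function, so SDKMAN! must be sourced first.
source "$HOME/.sdkman/bin/sdkman-init.sh"

if [ ! -f "./prepare_$fork.sh" ]; then
    sdk use java 21.0.1-open   # default JDK named in the commit
fi
```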
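A rough sketch of the up-front install pass: scan every prepare script for "sdk use java &lt;version&gt;" lines and install each distinct version once before any timing starts. The grep/awk extraction assumes that exact phrasing inside the prepare scripts:

```bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

grep -h 'sdk use java' prepare_*.sh 2>/dev/null |
    awk '{ print $4 }' | sort -u |
    while read -r version; do
        sdk install java "$version" || true   # no-op if already installed
    done
```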
* create new version of evaluate.sh using hyperfine + jq (sketch below)
* output the raw times for each command
* nit: s/command/fork/
* update evaluate2.sh for new fork file structure
* review changes
* use numactl on Linux (sketch below)
* use 1 warmup run
* verify output
* leaderboard
* do not exit early on hyperfine error
* check if SMT and turbo boost are disabled (sketch below)
* fix bug
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
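The likely core of the new evaluate script: let hyperfine handle warmup and repetition, export the results as JSON, and read the numbers back with jq. The flags are real hyperfine/jq options; the calculate_average_$fork.sh name follows the repo's convention, but the exact invocation is a guess:

```bash
hyperfine --warmup 1 --export-json "timing_$fork.json" "./calculate_average_$fork.sh"

jq -r '.results[0].mean'    "timing_$fork.json"   # mean runtime in seconds
jq -r '.results[0].times[]' "timing_$fork.json"   # raw per-run times
```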
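For the numactl commit, a possible Linux-only guard that pins the benchmark to a fixed set of physical cores for steadier timings; the core range is illustrative:

```bash
if [ "$(uname -s)" = "Linux" ]; then
    numactl --physcpubind=0-7 "./calculate_average_$fork.sh"
else
    "./calculate_average_$fork.sh"
fi
```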
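One way to implement the SMT / turbo-boost check on Linux; both sysfs paths are real but driver dependent (the no_turbo knob exists only under intel_pstate), so treat this as a sketch rather than the exact check committed:

```bash
if [ -f /sys/devices/system/cpu/smt/control ] &&
   [ "$(cat /sys/devices/system/cpu/smt/control)" != "off" ]; then
    echo "WARNING: SMT is enabled; timings may be noisy"
fi

if [ -f /sys/devices/system/cpu/intel_pstate/no_turbo ] &&
   [ "$(cat /sys/devices/system/cpu/intel_pstate/no_turbo)" != "1" ]; then
    echo "WARNING: turbo boost is enabled; timings may be noisy"
fi
```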