on automatic closing of ByteBuffers. Previously, a straggler could hold
up closing the ByteBuffers.
Also:
- Improve tracing code
- Parametrize additional options to aid in tuning
The result of our previous PR was surprising: parallelizing the munmap()
calls did not yield anywhere near the performance gain I expected. My
local machine showed a 10% gain, while the testing machine showed only a
2% gain. I am still not sure why; my two best theories are:
1) Variance due to stragglers (which this change addresses)
2) On the testing machine, munmap() is either much faster or much slower
relative to the other instructions than it is on our local machine. I
don't know which. We'll have to use adaptive tuning, but that belongs
in a separate change.