Improve performance of recursive IO locks by adding a fast path for
the single-threaded case. To reduce the number of memory accesses for
locking/unlocking, only increment the recursion counter if the lock
is already taken.
On Neoverse V1, a microbenchmark with many small freads improved by
2.9x. Multithreaded performance improved by 2%.