The old technique dates from a time when we had to reduce stack
pressure by moving 64K buffers elsewhere. It was implemented using a
static global buffer, guarded by a muto. However, that lock may
serialize threads unnecessarily.
Use a Windows heap buffer per invocation instead. HeapAlloc/HeapFree
are fast and scale well in multithreaded scenarios, so callers no
longer contend on a shared lock.
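
A minimal sketch of the per-invocation pattern, for illustration only.
The names scratch_alloc, scratch_free, SCRATCH_SIZE and copy_via_scratch
are hypothetical and do not come from the actual change. The malloc
fallback is an assumption added only so the sketch builds off Windows.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* Allocate the scratch buffer from the default process heap. */
static void *scratch_alloc(size_t size)
{
    return HeapAlloc(GetProcessHeap(), 0, size);
}

static void scratch_free(void *p)
{
    if (p)
        HeapFree(GetProcessHeap(), 0, p);
}
#else
#include <stdlib.h>

/* Assumption: plain malloc/free stand in for the Windows heap here. */
static void *scratch_alloc(size_t size)
{
    return malloc(size);
}

static void scratch_free(void *p)
{
    free(p);
}
#endif

#define SCRATCH_SIZE (64 * 1024) /* 64K, as in the old static buffer */

/*
 * Hypothetical worker: stages src in a private 64K buffer and copies
 * it to dst. Each call owns its buffer, so no lock is needed and
 * nothing large lives on the stack. Returns 0 on success, -1 on
 * allocation failure or if src does not fit.
 */
int copy_via_scratch(const char *src, char *dst, size_t dst_size)
{
    size_t len = strlen(src);
    char *buf;

    if (len >= SCRATCH_SIZE || len >= dst_size)
        return -1;

    buf = scratch_alloc(SCRATCH_SIZE);
    if (!buf)
        return -1;

    memcpy(buf, src, len + 1); /* work in the private buffer */
    memcpy(dst, buf, len + 1); /* hand the result back */

    scratch_free(buf);         /* released before returning */
    return 0;
}
```

Unlike the static buffer, every exit path after a successful
allocation has to free the buffer, and allocation failure becomes a
case the caller must handle.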