Hi Paul,

Great, that's about 14% higher throughput and 3.5% lower latency!

Looking in more detail, it appears AArch64 may be the only target which inlines fmin/fmax with -O2, so the default is now off unless math_private.h overrides it.

Cheers,
Wilco