[PATCH v2] Add __pure2 to __locale_ctype_ptr(_l)
Craig Howland
howland@LGSInnovations.com
Tue Nov 7 16:24:00 GMT 2017
On 11/07/2017 11:12 AM, Wilco Dijkstra wrote:
> Corinna Vinschen wrote:
>> Wilco Dijkstra wrote:
>>> And it works with -O2 if you split off the p++ in the increment part of the for.
>> No, it doesn't. I retried with your style of for loop, but there's
>> simply no difference for me. -O2, -O3, pure/ not-pure, with f++ split
>> off or not, it's always taking the same time on average.
> That's odd - maybe pure2 doesn't get correctly defined in your environment. I get this
> using your unchanged benchmark with -O3 - it clearly lifts the call:
>
> ldrb w19, [x20]
> add x20, x20, 1
> cbz w19, .L3
> stp x22, x23, [sp, 40]
> bl __locale_ctype_ptr
> adrp x23, .LC0
> mov x22, x0
> add x23, x23, :lo12:.LC0
> .p2align 3
> .L4:
> add x19, x22, x19, uxtb
> ldrb w0, [x19, 1]
> tbnz x0, 4, .L20
> ldrb w19, [x20], 1
> cbnz w19, .L4
>
> What is the disassembly of your version?
>
>>> No this is certainly not architecture dependent. The ctype implementation used to
>>> be fast, but it is slow now - changes made to ctype last year caused it.
>> I was talking about the above observation. The changes to the locale
>> stuff were necessary to support POSIX.1-2008 locale objects. If you
>> think the implementation has flaws, please provide patches.
> The ctype implementation certainly can be improved further. However adding
> pure2 fixes the major slowdown and has similar performance as GLIBC again,
> so that's the most important fix for now.
>
> Wilco
All of this might be moot, since pure2 appears not to be valid for the general
__locale_ctype_ptr case (even if it would be safe for the specific test cases
being discussed). Would you please comment on those concerns?
(https://sourceware.org/ml/newlib/2017/msg01055.html in case it did not make it
to you originally.)
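
To make the concern concrete, here is a minimal sketch, assuming __pure2
expands to GCC's __attribute__((const)), which lets the compiler call the
function once and reuse the result. Everything in it (table_a, table_b,
current_table, ctype_ptr, switch_table, reset_table, count_refetch,
count_hoisted) is a hypothetical stand-in, not newlib code. The hoisting is
written out by hand so the result does not depend on compiler or flags.

```c
#include <stddef.h>

/* Hypothetical stand-ins for two locales' ctype tables. */
static const unsigned char table_a[256] = { ['A'] = 1 };
static const unsigned char table_b[256] = { ['B'] = 1 };

/* Mutable global state, playing the role of the current locale. */
static const unsigned char *current_table = table_a;

/* The result depends on current_table, so declaring this function
   __attribute__((const)) would promise more than it can deliver. */
static const unsigned char *ctype_ptr(void) { return current_table; }

static void switch_table(void) { current_table = table_b; }
static void reset_table(void)  { current_table = table_a; }

/* Re-reads the table pointer on every iteration, so it sees the switch. */
int count_refetch(const char *s, size_t switch_at)
{
    int n = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i == switch_at)
            switch_table();
        n += ctype_ptr()[(unsigned char)s[i]];
    }
    return n;
}

/* Reads the table pointer once before the loop, which is the
   transformation a compiler is allowed to make for a const function.
   After the switch it keeps using the stale table. */
int count_hoisted(const char *s, size_t switch_at)
{
    const unsigned char *p = ctype_ptr();
    int n = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i == switch_at)
            switch_table();
        n += p[(unsigned char)s[i]];
    }
    return n;
}
```

With the input "AABB" and the switch at index 2, the two functions disagree
once the table changes inside the loop. The benchmark under discussion never
changes locale mid-loop, which would be why it is unaffected.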
Craig