This is the mail archive of the mailing list for the libc-ports project.
Re: [patch, mips] Improved memset for MIPS
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Steve Ellcey <sellcey at mips dot com>, "Carlos O'Donell" <carlos at systemhalted dot org>
- Cc: Andrew Pinski <pinskia at gmail dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>
- Date: Thu, 12 Dec 2013 23:40:15 -0500
- Subject: Re: [patch, mips] Improved memset for MIPS
- References: <93a232b5-9d0b-4a27-bbb5-16e3ae7c4b89 at BAMAIL02 dot ba dot imgtec dot org> <Pine dot LNX dot 4 dot 64 dot 1309061430150 dot 5886 at digraph dot polyomino dot org dot uk> <1378483039 dot 5770 dot 302 dot camel at ubuntu-sellcey> <Pine dot LNX dot 4 dot 64 dot 1309061603380 dot 8532 at digraph dot polyomino dot org dot uk> <1378486241 dot 5770 dot 327 dot camel at ubuntu-sellcey> <Pine dot LNX dot 4 dot 64 dot 1309061653280 dot 8532 at digraph dot polyomino dot org dot uk> <1379526035 dot 5770 dot 414 dot camel at ubuntu-sellcey> <Pine dot LNX dot 4 dot 64 dot 1309201643100 dot 3814 at digraph dot polyomino dot org dot uk> <1379698355 dot 5770 dot 466 dot camel at ubuntu-sellcey> <CA+=Sn1=87nKm1ShivDn5dJ29dNg5zYgQ58uSfWb18+mXh3-spA at mail dot gmail dot com> <CAE2sS1iqz-GVB8hZVbZL7D4hr6Xs09ofPtaU2WQ6wzeBAjcf8w at mail dot gmail dot com> <1386893669 dot 2764 dot 30 dot camel at ubuntu-sellcey>
On 12/12/2013 07:14 PM, Steve Ellcey wrote:
> On Thu, 2013-12-12 at 19:01 -0500, Carlos O'Donell wrote:
>>> I noticed this patch causes some performance regressions on Octeon due
>>> to having 128 byte cache lines.
>>> Changing PREFETCH_CHUNK/PREFETCH_FOR_STORE to assume 128 byte cache
>>> line gives us the performance back and improves over the original code
>>> at least 15%.
>>> That is:
>>> # define PREFETCH_CHUNK 128
>>> # define PREFETCH_FOR_STORE(chunk, reg) \
>>> pref PREFETCH_STORE_HINT, (chunk)*128(reg);
>> Submit a patch for that?
>> We have microbenchmarks now, but the next hardest
>> part is going to be archiving data by device so that
>> the community can help track performance and point
>> out regressions like this.
> Unless the change is under some kind of ifdef for Octeon changing this
> will probably slow down other MIPS chips. Most of them have 32 byte
> cache lines.
Absolutely. I'm not suggesting he simply change the values; Andrew would have
to add enough framework for Octeon to be enabled with its own optimal
implementation.
For example, could you compile an alternate version with 128-byte cache
line support and select it via IFUNC based on AT_HWCAP?
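
To make the selection idea concrete, here is a rough sketch in plain C. It is
not a real IFUNC resolver: it uses a lazily initialised function pointer as a
portable stand-in for what the dynamic linker would do once at relocation
time. Everything named in it is an assumption for illustration. In particular
HYPOTHETICAL_HWCAP_CACHELINE_128 is a made-up bit (I am not aware of an
existing AT_HWCAP bit that identifies Octeon or its cache line size), and
memset_line32 / memset_line128 are stand-ins for two builds of the assembly
memset with PREFETCH_CHUNK set to 32 and 128.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (glibc 2.16 and later) */

/* HYPOTHETICAL: placeholder for a capability bit the kernel would have to
   export; no such bit is claimed to exist for MIPS today.  */
#define HYPOTHETICAL_HWCAP_CACHELINE_128 (1UL << 30)

typedef void *(*memset_fn) (void *, int, size_t);

/* Records which variant ran last, purely so the sketch is observable.  */
static int last_variant_line_size;

/* Stand-in for the memset built with 32-byte prefetch chunks.  */
static void *
memset_line32 (void *s, int c, size_t n)
{
  last_variant_line_size = 32;
  return memset (s, c, n);
}

/* Stand-in for the memset built with 128-byte prefetch chunks.  */
static void *
memset_line128 (void *s, int c, size_t n)
{
  last_variant_line_size = 128;
  return memset (s, c, n);
}

/* Pure selection logic, the part an IFUNC resolver would contain.  */
static memset_fn
select_memset (unsigned long hwcap)
{
  return (hwcap & HYPOTHETICAL_HWCAP_CACHELINE_128)
	 ? memset_line128 : memset_line32;
}

static memset_fn chosen_memset;

/* Dispatching entry point: picks a variant on first use, then reuses it.
   Not thread-safe as written; a real IFUNC avoids this by resolving
   before any caller can run.  */
void *
my_memset (void *s, int c, size_t n)
{
  if (chosen_memset == NULL)
    chosen_memset = select_memset (getauxval (AT_HWCAP));
  return chosen_memset (s, c, n);
}
```

Callers would only ever use my_memset; chips that do not report the
(hypothetical) bit keep the 32-byte variant, which addresses Steve's concern
about slowing down other MIPS parts.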