This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc: unaligned memcpy and DMA


On Tue, Jan 06, 2015 at 06:44:50PM -0200, Adhemerval Zanella wrote:
> On 06-01-2015 18:35, Ondřej Bílka wrote:
> > On Tue, Jan 06, 2015 at 05:12:01PM -0200, Adhemerval Zanella wrote:
> >> On 06-01-2015 16:53, Ondřej Bílka wrote:
> >>> Main question is why there is no power8 memcpy using unaligned loads yet?
> >>>
> >>> Memcpy is called about a hundred times more often than strcpy (and there
> >>> are no strncpy calls) on my computer, so the possible gains are bigger, and
> >>> with an optimized memcpy a generic strncpy will be faster as well.
> >> Mainly because powerpc still triggers kernel traps when issuing VMX/VSX instructions
> >> on non-cacheable memory. That's why I pushed 87868c2418fb74357757e3b739ce5b76b17a8929
> >> by the way.
> >>
> >> Although it is not really an issue for 99% of cases, where memory will be cacheable,
> >> some code (especially libdrm and xorg) uses memcpy (and possibly memset) on DMA-mapped
> >> memory.  And that's why memcpy/memset for POWER8 are still using aligned accesses.
> > That looks like overkill. A better way would be to add a variable that
> > detects whether the application can do it.
> >
> > Probably the simplest way would be to add a variable in the vDSO that the
> > kernel sets to 1 when it handles the trap.
> >
> > Otherwise it would be more complicated, as we would need to set it when the
> > application allocates noncacheable memory. Is mmap the only way to do that?
> >
> My understanding is that DMA memory is allocated only through mmap plus specific flags.
> However, I don't see how a vDSO variable would help us in this case: any process
> can mmap a DMA area and it will have a mix of pages with and without cacheable
> states.
>
It could, but 99% of applications don't, and those will benefit from unaligned memcpy.
 
> The correct way in my understanding would be to use a specialized memcpy on
> non-cacheable memory (either by environment flags or by using an app-specific one).
> But this is another issue.
>
Why add manual detection of something that could be checked
automatically?

The overhead of automatic detection versus manual is a few cycles per call.
If we add more profiling infrastructure, it's possible to eliminate that
overhead: we read from a file whether the application used noncacheable
access the last time it was run and select the appropriate ifunc.
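
To make the "few cycles per call" concrete, here is a rough sketch of the
per-call check. Everything in it is hypothetical: noncacheable_trap_seen
stands in for the proposed vDSO variable (no such variable exists today),
and the two copy routines are placeholders for the aligned-only and the
unaligned POWER8 implementations. __builtin_expect assumes GCC or clang.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed vDSO variable: the kernel would
   set it to 1 the first time it traps an access to non-cacheable memory.  */
volatile int noncacheable_trap_seen = 0;

/* Counters that only exist to show which path a call took.  */
unsigned long fast_calls = 0;
unsigned long safe_calls = 0;

/* Placeholder for the existing aligned-only copy; a plain byte loop here.  */
void *copy_aligned_only(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    safe_calls++;
    return dst;
}

/* Placeholder for a copy using unaligned vector loads; plain memcpy here.  */
void *copy_unaligned(void *dst, const void *src, size_t n)
{
    fast_calls++;
    return memcpy(dst, src, n);
}

/* The automatic check: one load and one predicted branch per call.  */
void *checked_memcpy(void *dst, const void *src, size_t n)
{
    if (__builtin_expect(noncacheable_trap_seen, 0))
        return copy_aligned_only(dst, src, n);
    return copy_unaligned(dst, src, n);
}
```

A process that never touches noncacheable memory always takes the second
return, so the cost it pays is the flag load plus a branch that should be
well predicted. Once the flag is set, every later call falls back to the
aligned-only path.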
 
> Anyway, I need to evaluate which kind of gain it would yield for unaligned cases.

