PATCH: Use QA16WORD for punpcklxxx in x86 Intel mode disassembler
Sun Jul 29 20:56:00 GMT 2007
On Sun, Jul 29, 2007 at 08:29:14AM -0700, H.J. Lu wrote:
> On Sat, Jul 28, 2007 at 05:32:55PM -0700, Christian Ludloff wrote:
> > H.J.,
> > > > > Did you try loading punpcklxxx from 8-byte aligned, not 16-byte aligned
> > > > > memory?
> > > >
> > > > What for?
> > > >
> > > > (That merely causes the expected GP(0), due to lack of 128-bit alignment.)
> > >
> > > That is one difference between m64 and m128.
> > Sure. And that 128-bit alignment requirement will start to disappear later
> > this year, when AMD's Barcelona introduces misaligned SSE support.
> > > That is why punpcklxxx has m128, not m64.
> > Not really.
> > PUNPCKL merely got lumped into the m128 category for convenience; however,
> > it was immediately implemented differently from the rest of SSE2 -- the P4
> > only accesses the low 64 bits, not the full 128 bits.
> > The 128-bit alignment requirement/check is but a relic of the days when a
> > processor had to split 128-bit accesses into two 64-bit chunks (and didn't
> > want to cope with further splitting to handle misalignment).
> We can't say punpcklxxx takes m64 as long as some processors only
> work with 16-byte alignment. "XMMWORD PTR" isn't right either since
> only 8 bytes are loaded. I invented QA16WORD. Any comments?
Scrap it. I checked with our HW people. Some Intel processors
fetch 16 bytes from memory for punpcklxxx.
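A minimal Python sketch of the two properties argued over in this thread. First, the register result of PUNPCKLBW (the byte member of punpcklxxx) depends only on the low 64 bits of the source operand. Second, the legacy SSE m128 operand must still sit on a 16-byte boundary. The helper names `punpcklbw` and `m128_aligned` are hypothetical. The model only covers the architectural result. It says nothing about how many bytes a given processor actually fetches, which, per the reply above, can be the full 16 on some Intel processors.

```python
def punpcklbw(dst: bytes, src: bytes) -> bytes:
    """Hypothetical model of the PUNPCKLBW xmm, xmm/m128 register result.

    Interleaves the low 8 bytes of dst with the low 8 bytes of src.
    Bytes 8..15 of src never influence the result, so only 64 bits of
    the 128-bit operand carry useful data.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("XMM operands are 16 bytes")
    out = bytearray(16)
    out[0::2] = dst[:8]  # even result bytes come from the destination
    out[1::2] = src[:8]  # odd result bytes come from the source
    return bytes(out)


def m128_aligned(addr: int) -> bool:
    """Hypothetical check for the legacy (non-VEX) SSE m128 rule.

    A misaligned operand raises #GP(0) on processors without
    misaligned-SSE support.
    """
    return addr % 16 == 0


if __name__ == "__main__":
    dst = bytes(range(16))
    src = bytes(range(0x80, 0x90))
    print(punpcklbw(dst, src).hex(" "))
    # 00 80 01 81 02 82 03 83 04 84 05 85 06 86 07 87

    # Changing the high 8 bytes of src does not change the result.
    src_junk_high = src[:8] + b"\xff" * 8
    print(punpcklbw(dst, src) == punpcklbw(dst, src_junk_high))  # True

    # 8-byte aligned is not enough for an m128 operand.
    print(m128_aligned(0x1000), m128_aligned(0x1008))  # True False
```

The gap between the two helpers is what this thread is about. The result uses 8 bytes, but the alignment rule (and, on some processors, the fetch) covers 16. That is why neither "QWORD PTR" nor "XMMWORD PTR" fully describes the operand.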