Memmove causing program crashes, giving SIGTRAP in GDB(?)
KENNON J CONRAD
kennonconrad@comcast.net
Sat Feb 28 02:30:30 GMT 2026
Hi Brian,
I just wanted to add that the stash-and-store idea you suggested, which is also used in memmove, has a very nice impact
on the generated assembly code.
With the old code that does this for the last 0 to 7 words:
while (candidate_ptr > score_ptr) {
*candidate_ptr = *(candidate_ptr - 1);
candidate_ptr--;
}
the assembly code shows this from the point where the move starts:
.L24:
movdqu -16(%rax), %xmm1
subq $16, %rax
movups %xmm1, 2(%rax)
cmpq %rdx, %rax
jnb .L24
movq %r10, %rax
subq %r9, %rax
subq $16, %rax
notq %rax
andq $-16, %rax
addq %r10, %rax
cmpq %rax, %r9
jnb .L28
movq %rax, %rcx
movq %rax, %rdx
movq %r9, 48(%rsp)
subq %r9, %rcx
subq $1, %rcx
shrq %rcx
leaq 2(%rcx,%rcx), %r8
negq %rcx
subq %r8, %rdx
leaq (%rax,%rcx,2), %rcx
call memmove
movq 48(%rsp), %r9
jmp .L28
But with stash and store:
*(uint64_t *)&candidates_index[new_score_rank + 1] = first_four;
*(uint64_t *)&candidates_index[new_score_rank + 5] = next_four;
the assembly code from the point where the move starts is this:
.L24:
movdqu -16(%r9), %xmm1
subq $16, %r9
movups %xmm1, 2(%r9)
cmpq %rax, %r9
jnb .L24
movups %xmm0, 2(%rdi,%rdx)
jmp .L26
There are a couple of extra assembly instructions to stash into xmm0 before the move, but this is a big reduction in
assembly code size for the backward memory move. It's not as fast as memmove would be if DF weren't getting corrupted,
but it's much better than the old code, and it completely avoids the risk of DF corruption during rep movsq in memmove
for backward moves of size >= 8! I also like that there is no need to worry about whether rep movsb or rep movsw could
be vulnerable to DF corruption.
Best Regards,
Kennon
> On 02/27/2026 11:49 AM PST Brian Inglis via Cygwin <cygwin@cygwin.com> wrote:
>
>
> Hi Kennon,
>
> Some perf reports and analysis imply that backward moves (with overlap?) are no
> faster than straight rep movsb on some CPUs, so it may be better to just
> simplify to that, unless you want to stash the final element(s) to be moved out
> of the way in register(s), and use multiple registers in unrolled wide moves for
> the aligned portion?
>