[PATCH] arm/strlen-thumb2-Os.S: Correct assembly syntax for ldrb instruction

Keith Packard keithp@keithp.com
Fri May 15 15:19:36 GMT 2020


Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:

> IIRC the .w was deliberate to keep the alignment right for some
> subsequent instructions.

I'm not sure what you mean by 'alignment' here -- would the assembler
actually insert no-ops to ensure that the instruction were aligned
somehow? I can't find any mention of this additional meaning of the '.w'
qualifier and find that unlikely as it would increase code size?

It sounds like gas and llvm have a different interpretation of the '.w'
qualifier for this instruction. The ARMv7-M Architecture Reference
Manual (ARM DDI 0403E.d (ID070218), says the '.W' means:

.W      Meaning wide, specifies that the assembler must select a 32-bit
        encoding for the instruction. If this is not possible, an
        assembler error is produced.

LDRB offers three encodings, called T1, T2 and T3 in the specs. T1 and
T2 have equivalent flexibility: loading indirect from a register with an
optional offset (with T1 being 16 bit and T2 being 32 bit). Selecting
the 32-bit T2 form extends the offset from 5 to 12 bits, so I imagine
that an environment that couldn't replace instructions at link time
might use the T2 form to make room for larger relocation values in case
the offset weren't known at assembly time?

However, only the 32-bit T3 encoding offers the post-indexed mode
required by the code here, and for that, there is no need to clarify
which to select using the .N or .W qualifiers. And, when reading the
description of the three encodings, only T2 includes the '.W' qualifier
in its syntax, although T1 doesn't include '.N', which kinda indicates
that the qualifiers should always be accepted, even if they aren't
necessary.

Hrm. For this instruction, perhaps the intent in the specification is to
use the .W to force a T2 encoding *over a T3 encoding*. The T3 encoding
also supports register-indexed mode, but with only an 8-bit offset, so
using .W might indicate that even if the offset *could* fit in the 8-bit
T3 form, that the assembler should instead select the T2 encoding. I
dunno.

> LLVM's assembler needs fixing if it doesn't accept '.w'.

Yes, it seems like that would be a good idea; having the '.w' in this
case is harmless as only the T3 encoding can possibly work. There is
already a discussion about this in the llvm world; I don't know if or
when that will result in a fix being applied.

I dug through the gas source and couldn't find any place where the '.w'
qualifier would affect the output in this case though, so removing it
should be harmless for gas.

I'm just responding to a bug report from a user trying to use
clang to build the library, and for that, fixing the source code to work
with both gas and clang in ways that don't appear to affect the output
of gas at all seemed like a reasonable option to me.

-- 
-keith
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/newlib/attachments/20200515/be787bc4/attachment.sig>


More information about the Newlib mailing list