This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
- From: Jan Beulich <jbeulich at suse dot com>
- To: Hongtao Liu <crazylht at gmail dot com>, "H. J. Lu" <hjl dot tools at gmail dot com>
- Cc: binutils at sourceware dot org
- Date: Fri, 28 Feb 2020 12:36:01 +0100
- Subject: Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
- References: <CAMZc-bz8rOG6NUdnbQsb7O3igOUr+6qwFVXijF5ARiqX2HVHdw@mail.gmail.com> <CAMZc-bwwczB0x4vLRmGMCLz5HvUsNvfKvB=+=qqtt5i4iX8eoA@mail.gmail.com>
On 28.02.2020 09:57, Hongtao Liu wrote:
> On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>
>> Hi:
>> This is subsequent patch for JCC Code Erratum.
>> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
>>
>> According to intel SDM manual, not all compare flag-modifying
>> instructions are marcro-fusible with subsequent jcc instruction. For
>> those not, -mbranches-within-32B-boundaries need't align them as
>> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
>> restrictions which separate macro-fusible instruction from not
>>
>> Restriction 1:
>> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
>>
>> cmp m, imm
>> add m, imm
>> sub m, imm
>> test m, imm
>> and m, imm
>> inc m
>> dec m
>>
>> it is unfusible with any jcc instructions.
>>
>> Restriction 2:
>> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
>> ---------------------------------------------------------------------
>> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
>> | ------ | ----------- | ------- | -------- |
>> | Jo | N | N | Y |
>> | Jno | N | N | Y |
>> | Jc/Jb | Y | N | Y |
>> | Jae/Jnb | Y | N | Y |
>> | Je/Jz | Y | Y | Y |
>> | Jne/Jnz | Y | Y | Y |
>> | Jna/Jbe | Y | N | Y |
>> | Ja/Jnbe | Y | N | Y |
>> | Js | N | N | Y |
>> | Jns | N | N | Y |
>> | Jp/Jpe | N | N | Y |
>> | Jnp/Jpo | N | N | Y |
>> | Jl/Jnge | Y | Y | Y |
>> | Jge/Jnl | Y | Y | Y |
>> | Jle/Jng | Y | Y | Y |
>> | Jg/Jnle | Y | Y | Y |
Quoting a Haswell table (also in the code) for code to deal with
a Skylake erratum is, without any further comments, not very
helpful.
>+enum mf_jcc_kind
>+ {
>+ mf_jcc_jo = 0, /* base opcode 0x70 */
>+ mf_jcc_jno, /* base opcode 0x71 */
>+ mf_jcc_jc, /* base opcode 0x72 */
>+ mf_jcc_jae, /* base opcode 0x73 */
>+ mf_jcc_je, /* base opcode 0x74 */
>+ mf_jcc_jne, /* base opcode 0x75 */
>+ mf_jcc_jna, /* base opcode 0x76 */
>+ mf_jcc_ja, /* base opcode 0x77 */
>+ mf_jcc_js, /* base opcode 0x78 */
>+ mf_jcc_jns, /* base opcode 0x79 */
>+ mf_jcc_jp, /* base opcode 0x7a */
>+ mf_jcc_jnp, /* base opcode 0x7b */
>+ mf_jcc_jl, /* base opcode 0x7c */
>+ mf_jcc_jge, /* base opcode 0x7d */
>+ mf_jcc_jle, /* base opcode 0x7e */
>+ mf_jcc_jg /* base opcode 0x7f */
>+ };
Did you consider making this an insn template attribute instead?
>+enum mf_cmp_kind
>+ {
>+ mf_cmp_test, /* test */
>+ mf_cmp_cmp, /* cmp */
>+ mf_cmp_and, /* and */
>+ mf_cmp_alu, /* add/sub */
>+ mf_cmp_incdec /* inc/dec */
>+ };
Why 5 enumerators when only three are needed (CMP going
together with ADD/SUB and AND going together with TEST)?
>@@ -8573,6 +8636,7 @@ output_insn (void)
> offsetT insn_start_off;
> fragS *fragP = NULL;
> enum align_branch_kind branch = align_branch_none;
>+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
Is this initializer needed? It looks pretty arbitrary to pick
JO here.
>@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> /* Return the next jcc frag after BRANCH_PADDING. */
>
> static fragS *
>-i386_next_jcc_frag (fragS *fragP)
>+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
Besides "cmp" in the name again not really fitting all variants
(afaict), it also looks suspicious with ...
>@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> */
> cmp_fragP = i386_next_non_empty_frag (next_fragP);
> pad_fragP = i386_next_non_empty_frag (cmp_fragP);
>- branch_fragP = i386_next_jcc_frag (pad_fragP);
>+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
... next_fragP, not cmp_fragP being passed here.
>--- a/gas/config/tc-i386.h
>+++ b/gas/config/tc-i386.h
>@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> unsigned char prefix_length;
> unsigned char default_prefix;
> unsigned char cmp_size;
>+ unsigned int mf_type : 4;
Even if not going with a (2-bit) insn attribute, this doesn't
need to be wider than 3 bits, as J<cc> and JN<cc> always fall
into the same group, and hence don't need separate tracking
(i.e. when recording you could ignore the low opcode bit).
Jan