This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries


On 28.02.2020 09:57, Hongtao Liu wrote:
> On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>
>> Hi:
>>     This is subsequent patch for JCC Code Erratum.
>>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
>>
>>     According to intel SDM manual, not all compare flag-modifying
>> instructions are marcro-fusible with subsequent jcc instruction. For
>> those not, -mbranches-within-32B-boundaries need't align them as
>> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
>> restrictions which separate macro-fusible instruction from not
>>
>>     Restriction 1:
>>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
>>
>>        cmp m, imm
>>        add m, imm
>>        sub m, imm
>>        test m, imm
>>        and m, imm
>>        inc m
>>        dec m
>>
>>     it is unfusible with any jcc instructions.
>>
>>     Restriction 2:
>>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
>>     ---------------------------------------------------------------------
>>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
>>     | ------  | ----------- | ------- | -------- |
>>     |   Jo    |      N      |    N    |     Y    |
>>     |   Jno   |      N      |    N    |     Y    |
>>     |  Jc/Jb  |      Y      |    N    |     Y    |
>>     | Jae/Jnb |      Y      |    N    |     Y    |
>>     |  Je/Jz  |      Y      |    Y    |     Y    |
>>     | Jne/Jnz |      Y      |    Y    |     Y    |
>>     | Jna/Jbe |      Y      |    N    |     Y    |
>>     | Ja/Jnbe |      Y      |    N    |     Y    |
>>     |   Js    |      N      |    N    |     Y    |
>>     |   Jns   |      N      |    N    |     Y    |
>>     |  Jp/Jpe |      N      |    N    |     Y    |
>>     | Jnp/Jpo |      N      |    N    |     Y    |
>>     | Jl/Jnge |      Y      |    Y    |     Y    |
>>     | Jge/Jnl |      Y      |    Y    |     Y    |
>>     | Jle/Jng |      Y      |    Y    |     Y    |
>>     | Jg/Jnle |      Y      |    Y    |     Y    |

Quoting a Haswell table (also in the code) for code to deal with
a Skylake erratum is, without any further comments, not very
helpful.

>+enum mf_jcc_kind
>+  {
>+    mf_jcc_jo = 0,  /* base opcode 0x70  */
>+    mf_jcc_jno,     /* base opcode 0x71  */
>+    mf_jcc_jc,      /* base opcode 0x72  */
>+    mf_jcc_jae,     /* base opcode 0x73  */
>+    mf_jcc_je,      /* base opcode 0x74  */
>+    mf_jcc_jne,     /* base opcode 0x75  */
>+    mf_jcc_jna,     /* base opcode 0x76  */
>+    mf_jcc_ja,      /* base opcode 0x77  */
>+    mf_jcc_js,      /* base opcode 0x78  */
>+    mf_jcc_jns,     /* base opcode 0x79  */
>+    mf_jcc_jp,      /* base opcode 0x7a  */
>+    mf_jcc_jnp,     /* base opcode 0x7b  */
>+    mf_jcc_jl,      /* base opcode 0x7c  */
>+    mf_jcc_jge,     /* base opcode 0x7d  */
>+    mf_jcc_jle,     /* base opcode 0x7e  */
>+    mf_jcc_jg       /* base opcode 0x7f  */
>+  };

Did you consider making this an insn template attribute instead?

>+enum mf_cmp_kind
>+  {
>+    mf_cmp_test,  /* test */
>+    mf_cmp_cmp,  /* cmp */
>+    mf_cmp_and,  /* and */
>+    mf_cmp_alu,  /* add/sub */
>+    mf_cmp_incdec  /* inc/dec */
>+  };

Why 5 enumerators when only three are needed (CMP going
together with ADD/SUB and AND going together with TEST)?

>@@ -8573,6 +8636,7 @@ output_insn (void)
>   offsetT insn_start_off;
>   fragS *fragP = NULL;
>   enum align_branch_kind branch = align_branch_none;
>+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;

Is this initializer needed? It looks pretty arbitrary to pick
JO here.

>@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> /* Return the next jcc frag after BRANCH_PADDING.  */
> 
> static fragS *
>-i386_next_jcc_frag (fragS *fragP)
>+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)

Besides "cmp" in the name again not really fitting all variants
(afaict), it also looks suspicious with ...

>@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> 	       */
> 	    cmp_fragP = i386_next_non_empty_frag (next_fragP);
> 	    pad_fragP = i386_next_non_empty_frag (cmp_fragP);
>-	    branch_fragP = i386_next_jcc_frag (pad_fragP);
>+	    branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);

... next_fragP, not cmp_fragP being passed here.

>--- a/gas/config/tc-i386.h
>+++ b/gas/config/tc-i386.h
>@@ -273,6 +273,7 @@ struct i386_tc_frag_data
>   unsigned char prefix_length;
>   unsigned char default_prefix;
>   unsigned char cmp_size;
>+  unsigned int mf_type : 4;

Even if not going with a (2-bit) insn attribute, this doesn't
need to be wider than 3 bits, as J<cc> and JN<cc> always fall
into the same group, and hence don't need separate tracking
(i.e. when recording you could ignore the low opcode bit).

Jan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]