This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries


On Mon, Mar 2, 2020 at 11:13 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 28.02.2020 09:57, Hongtao Liu wrote:
> > > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >>
> > >> Hi:
> > >>     This is subsequent patch for JCC Code Erratum.
> > >>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> > >>
> > >>     According to intel SDM manual, not all compare flag-modifying
> > >> instructions are marcro-fusible with subsequent jcc instruction. For
> > >> those not, -mbranches-within-32B-boundaries need't align them as
> > >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> > >> restrictions which separate macro-fusible instruction from not
> > >>
> > >>     Restriction 1:
> > >>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> > >>
> > >>        cmp m, imm
> > >>        add m, imm
> > >>        sub m, imm
> > >>        test m, imm
> > >>        and m, imm
> > >>        inc m
> > >>        dec m
> > >>
> > >>     it is unfusible with any jcc instructions.
> > >>
> > >>     Restriction 2:
> > >>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> > >>     ---------------------------------------------------------------------
> > >>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
> > >>     | ------  | ----------- | ------- | -------- |
> > >>     |   Jo    |      N      |    N    |     Y    |
> > >>     |   Jno   |      N      |    N    |     Y    |
> > >>     |  Jc/Jb  |      Y      |    N    |     Y    |
> > >>     | Jae/Jnb |      Y      |    N    |     Y    |
> > >>     |  Je/Jz  |      Y      |    Y    |     Y    |
> > >>     | Jne/Jnz |      Y      |    Y    |     Y    |
> > >>     | Jna/Jbe |      Y      |    N    |     Y    |
> > >>     | Ja/Jnbe |      Y      |    N    |     Y    |
> > >>     |   Js    |      N      |    N    |     Y    |
> > >>     |   Jns   |      N      |    N    |     Y    |
> > >>     |  Jp/Jpe |      N      |    N    |     Y    |
> > >>     | Jnp/Jpo |      N      |    N    |     Y    |
> > >>     | Jl/Jnge |      Y      |    Y    |     Y    |
> > >>     | Jge/Jnl |      Y      |    Y    |     Y    |
> > >>     | Jle/Jng |      Y      |    Y    |     Y    |
> > >>     | Jg/Jnle |      Y      |    Y    |     Y    |
> >
> > Quoting a Haswell table (also in the code) for code to deal with
> > a Skylake erratum is, without any further comments, not very
> > helpful.
> >
> They also works for Skylake and Cascadelake, I'll add comments.
> > >+enum mf_jcc_kind
> > >+  {
> > >+    mf_jcc_jo = 0,  /* base opcode 0x70  */
> > >+    mf_jcc_jno,     /* base opcode 0x71  */
> > >+    mf_jcc_jc,      /* base opcode 0x72  */
> > >+    mf_jcc_jae,     /* base opcode 0x73  */
> > >+    mf_jcc_je,      /* base opcode 0x74  */
> > >+    mf_jcc_jne,     /* base opcode 0x75  */
> > >+    mf_jcc_jna,     /* base opcode 0x76  */
> > >+    mf_jcc_ja,      /* base opcode 0x77  */
> > >+    mf_jcc_js,      /* base opcode 0x78  */
> > >+    mf_jcc_jns,     /* base opcode 0x79  */
> > >+    mf_jcc_jp,      /* base opcode 0x7a  */
> > >+    mf_jcc_jnp,     /* base opcode 0x7b  */
> > >+    mf_jcc_jl,      /* base opcode 0x7c  */
> > >+    mf_jcc_jge,     /* base opcode 0x7d  */
> > >+    mf_jcc_jle,     /* base opcode 0x7e  */
> > >+    mf_jcc_jg       /* base opcode 0x7f  */
> > >+  };
> >
> > Did you consider making this an insn template attribute instead?
> >
> Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
> So maybe it's a bit redudant to add it into insn_template?
> > >+enum mf_cmp_kind
> > >+  {
> > >+    mf_cmp_test,  /* test */
> > >+    mf_cmp_cmp,  /* cmp */
> > >+    mf_cmp_and,  /* and */
> > >+    mf_cmp_alu,  /* add/sub */
> > >+    mf_cmp_incdec  /* inc/dec */
> > >+  };
> >
> > Why 5 enumerators when only three are needed (CMP going
> > together with ADD/SUB and AND going together with TEST)?
> >
> Yes, only three are needed here, adding 5 enumerators just in case
> there are other combinations use in the future.
> > >@@ -8573,6 +8636,7 @@ output_insn (void)
> > >   offsetT insn_start_off;
> > >   fragS *fragP = NULL;
> > >   enum align_branch_kind branch = align_branch_none;
> > >+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;
> >
> > Is this initializer needed? It looks pretty arbitrary to pick
> > JO here.
> >
> It's arbitrary, the initializer here is just to avoid uninitialized
> warning when building with option Wuninitialized.
> I'll add comments to explain the initializer.
> > >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > > /* Return the next jcc frag after BRANCH_PADDING.  */
> > >
> > > static fragS *
> > >-i386_next_jcc_frag (fragS *fragP)
> > >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
> >
> > Besides "cmp" in the name again not really fitting all variants
> > (afaict), it also looks suspicious with ...
> >
> > >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> > >              */
> > >           cmp_fragP = i386_next_non_empty_frag (next_fragP);
> > >           pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> > >-          branch_fragP = i386_next_jcc_frag (pad_fragP);
> > >+          branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
> >
> > ... next_fragP, not cmp_fragP being passed here.
> >
> Yes, use maybe_cmp_fragP may be better.
> > >--- a/gas/config/tc-i386.h
> > >+++ b/gas/config/tc-i386.h
> > >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> > >   unsigned char prefix_length;
> > >   unsigned char default_prefix;
> > >   unsigned char cmp_size;
> > >+  unsigned int mf_type : 4;
> >
> > Even if not going with a (2-bit) insn attribute, this doesn't
> > need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> > into the same group, and hence don't need separate tracking
> > (i.e. when recording you could ignore the low opcode bit).
> >
> Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> > Jan
>
>
>
> --
> BR,
> Hongtao

Update patch.

-- 
BR,
Hongtao

Attachment: 0001-According-to-intel-SDM-manual-not-all-compare-flag-m.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]