This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries

From: Hongtao Liu <crazylht at gmail dot com>
To: Jan Beulich <jbeulich at suse dot com>
Cc: "H. J. Lu" <hjl dot tools at gmail dot com>, binutils at sourceware dot org
Date: Mon, 2 Mar 2020 11:13:59 +0800
Subject: Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
References: <CAMZc-bz8rOG6NUdnbQsb7O3igOUr+6qwFVXijF5ARiqX2HVHdw@mail.gmail.com> <CAMZc-bwwczB0x4vLRmGMCLz5HvUsNvfKvB=+=qqtt5i4iX8eoA@mail.gmail.com> <c169e304-a63e-2593-484a-236cf6a1430b@suse.com>

On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.02.2020 09:57, Hongtao Liu wrote:
> > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> Hi:
> >>     This is subsequent patch for JCC Code Erratum.
> >>     Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> >>
> >>     According to intel SDM manual, not all compare flag-modifying
> >> instructions are marcro-fusible with subsequent jcc instruction. For
> >> those not, -mbranches-within-32B-boundaries need't align them as
> >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> >> restrictions which separate macro-fusible instruction from not
> >>
> >>     Restriction 1:
> >>     If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> >>
> >>        cmp m, imm
> >>        add m, imm
> >>        sub m, imm
> >>        test m, imm
> >>        and m, imm
> >>        inc m
> >>        dec m
> >>
> >>     it is unfusible with any jcc instructions.
> >>
> >>     Restriction 2:
> >>     /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> >>     ---------------------------------------------------------------------
> >>     |   JCC   | ADD/SUB/CMP | INC/DEC | TEST/AND |
> >>     | ------  | ----------- | ------- | -------- |
> >>     |   Jo    |      N      |    N    |     Y    |
> >>     |   Jno   |      N      |    N    |     Y    |
> >>     |  Jc/Jb  |      Y      |    N    |     Y    |
> >>     | Jae/Jnb |      Y      |    N    |     Y    |
> >>     |  Je/Jz  |      Y      |    Y    |     Y    |
> >>     | Jne/Jnz |      Y      |    Y    |     Y    |
> >>     | Jna/Jbe |      Y      |    N    |     Y    |
> >>     | Ja/Jnbe |      Y      |    N    |     Y    |
> >>     |   Js    |      N      |    N    |     Y    |
> >>     |   Jns   |      N      |    N    |     Y    |
> >>     |  Jp/Jpe |      N      |    N    |     Y    |
> >>     | Jnp/Jpo |      N      |    N    |     Y    |
> >>     | Jl/Jnge |      Y      |    Y    |     Y    |
> >>     | Jge/Jnl |      Y      |    Y    |     Y    |
> >>     | Jle/Jng |      Y      |    Y    |     Y    |
> >>     | Jg/Jnle |      Y      |    Y    |     Y    |
>
> Quoting a Haswell table (also in the code) for code to deal with
> a Skylake erratum is, without any further comments, not very
> helpful.
>
They also works for Skylake and Cascadelake, I'll add comments.
> >+enum mf_jcc_kind
> >+  {
> >+    mf_jcc_jo = 0,  /* base opcode 0x70  */
> >+    mf_jcc_jno,     /* base opcode 0x71  */
> >+    mf_jcc_jc,      /* base opcode 0x72  */
> >+    mf_jcc_jae,     /* base opcode 0x73  */
> >+    mf_jcc_je,      /* base opcode 0x74  */
> >+    mf_jcc_jne,     /* base opcode 0x75  */
> >+    mf_jcc_jna,     /* base opcode 0x76  */
> >+    mf_jcc_ja,      /* base opcode 0x77  */
> >+    mf_jcc_js,      /* base opcode 0x78  */
> >+    mf_jcc_jns,     /* base opcode 0x79  */
> >+    mf_jcc_jp,      /* base opcode 0x7a  */
> >+    mf_jcc_jnp,     /* base opcode 0x7b  */
> >+    mf_jcc_jl,      /* base opcode 0x7c  */
> >+    mf_jcc_jge,     /* base opcode 0x7d  */
> >+    mf_jcc_jle,     /* base opcode 0x7e  */
> >+    mf_jcc_jg       /* base opcode 0x7f  */
> >+  };
>
> Did you consider making this an insn template attribute instead?
>
Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
So maybe it's a bit redudant to add it into insn_template?
> >+enum mf_cmp_kind
> >+  {
> >+    mf_cmp_test,  /* test */
> >+    mf_cmp_cmp,  /* cmp */
> >+    mf_cmp_and,  /* and */
> >+    mf_cmp_alu,  /* add/sub */
> >+    mf_cmp_incdec  /* inc/dec */
> >+  };
>
> Why 5 enumerators when only three are needed (CMP going
> together with ADD/SUB and AND going together with TEST)?
>
Yes, only three are needed here, adding 5 enumerators just in case
there are other combinations use in the future.
> >@@ -8573,6 +8636,7 @@ output_insn (void)
> >   offsetT insn_start_off;
> >   fragS *fragP = NULL;
> >   enum align_branch_kind branch = align_branch_none;
> >+  enum mf_jcc_kind mf_jcc = mf_jcc_jo;
>
> Is this initializer needed? It looks pretty arbitrary to pick
> JO here.
>
It's arbitrary, the initializer here is just to avoid uninitialized
warning when building with option Wuninitialized.
I'll add comments to explain the initializer.
> >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > /* Return the next jcc frag after BRANCH_PADDING.  */
> >
> > static fragS *
> >-i386_next_jcc_frag (fragS *fragP)
> >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
>
> Besides "cmp" in the name again not really fitting all variants
> (afaict), it also looks suspicious with ...
>
> >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> >              */
> >           cmp_fragP = i386_next_non_empty_frag (next_fragP);
> >           pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> >-          branch_fragP = i386_next_jcc_frag (pad_fragP);
> >+          branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
>
> ... next_fragP, not cmp_fragP being passed here.
>
Yes, use maybe_cmp_fragP may be better.
> >--- a/gas/config/tc-i386.h
> >+++ b/gas/config/tc-i386.h
> >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> >   unsigned char prefix_length;
> >   unsigned char default_prefix;
> >   unsigned char cmp_size;
> >+  unsigned int mf_type : 4;
>
> Even if not going with a (2-bit) insn attribute, this doesn't
> need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> into the same group, and hence don't need separate tracking
> (i.e. when recording you could ignore the low opcode bit).
>
Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> Jan



-- 
BR,
Hongtao

Follow-Ups:
- Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
  - From: Hongtao Liu

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]