This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
- From: Hongtao Liu <crazylht at gmail dot com>
- To: Jan Beulich <jbeulich at suse dot com>
- Cc: "H. J. Lu" <hjl dot tools at gmail dot com>, binutils at sourceware dot org
- Date: Mon, 2 Mar 2020 11:13:59 +0800
- Subject: Re: [PATCH] Separate macro-fusible instructions from not for -mbranches-within-32B-boundaries
- References: <CAMZc-bz8rOG6NUdnbQsb7O3igOUr+6qwFVXijF5ARiqX2HVHdw@mail.gmail.com> <CAMZc-bwwczB0x4vLRmGMCLz5HvUsNvfKvB=+=qqtt5i4iX8eoA@mail.gmail.com> <c169e304-a63e-2593-484a-236cf6a1430b@suse.com>
On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.02.2020 09:57, Hongtao Liu wrote:
> > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> Hi:
> >> This is subsequent patch for JCC Code Erratum.
> >> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none
> >>
> >> According to intel SDM manual, not all compare flag-modifying
> >> instructions are marcro-fusible with subsequent jcc instruction. For
> >> those not, -mbranches-within-32B-boundaries need't align them as
> >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2
> >> restrictions which separate macro-fusible instruction from not
> >>
> >> Restriction 1:
> >> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format:
> >>
> >> cmp m, imm
> >> add m, imm
> >> sub m, imm
> >> test m, imm
> >> and m, imm
> >> inc m
> >> dec m
> >>
> >> it is unfusible with any jcc instructions.
> >>
> >> Restriction 2:
> >> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture
> >> ---------------------------------------------------------------------
> >> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND |
> >> | ------ | ----------- | ------- | -------- |
> >> | Jo | N | N | Y |
> >> | Jno | N | N | Y |
> >> | Jc/Jb | Y | N | Y |
> >> | Jae/Jnb | Y | N | Y |
> >> | Je/Jz | Y | Y | Y |
> >> | Jne/Jnz | Y | Y | Y |
> >> | Jna/Jbe | Y | N | Y |
> >> | Ja/Jnbe | Y | N | Y |
> >> | Js | N | N | Y |
> >> | Jns | N | N | Y |
> >> | Jp/Jpe | N | N | Y |
> >> | Jnp/Jpo | N | N | Y |
> >> | Jl/Jnge | Y | Y | Y |
> >> | Jge/Jnl | Y | Y | Y |
> >> | Jle/Jng | Y | Y | Y |
> >> | Jg/Jnle | Y | Y | Y |
>
> Quoting a Haswell table (also in the code) for code to deal with
> a Skylake erratum is, without any further comments, not very
> helpful.
>
They also works for Skylake and Cascadelake, I'll add comments.
> >+enum mf_jcc_kind
> >+ {
> >+ mf_jcc_jo = 0, /* base opcode 0x70 */
> >+ mf_jcc_jno, /* base opcode 0x71 */
> >+ mf_jcc_jc, /* base opcode 0x72 */
> >+ mf_jcc_jae, /* base opcode 0x73 */
> >+ mf_jcc_je, /* base opcode 0x74 */
> >+ mf_jcc_jne, /* base opcode 0x75 */
> >+ mf_jcc_jna, /* base opcode 0x76 */
> >+ mf_jcc_ja, /* base opcode 0x77 */
> >+ mf_jcc_js, /* base opcode 0x78 */
> >+ mf_jcc_jns, /* base opcode 0x79 */
> >+ mf_jcc_jp, /* base opcode 0x7a */
> >+ mf_jcc_jnp, /* base opcode 0x7b */
> >+ mf_jcc_jl, /* base opcode 0x7c */
> >+ mf_jcc_jge, /* base opcode 0x7d */
> >+ mf_jcc_jle, /* base opcode 0x7e */
> >+ mf_jcc_jg /* base opcode 0x7f */
> >+ };
>
> Did you consider making this an insn template attribute instead?
>
Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc.
So maybe it's a bit redudant to add it into insn_template?
> >+enum mf_cmp_kind
> >+ {
> >+ mf_cmp_test, /* test */
> >+ mf_cmp_cmp, /* cmp */
> >+ mf_cmp_and, /* and */
> >+ mf_cmp_alu, /* add/sub */
> >+ mf_cmp_incdec /* inc/dec */
> >+ };
>
> Why 5 enumerators when only three are needed (CMP going
> together with ADD/SUB and AND going together with TEST)?
>
Yes, only three are needed here, adding 5 enumerators just in case
there are other combinations use in the future.
> >@@ -8573,6 +8636,7 @@ output_insn (void)
> > offsetT insn_start_off;
> > fragS *fragP = NULL;
> > enum align_branch_kind branch = align_branch_none;
> >+ enum mf_jcc_kind mf_jcc = mf_jcc_jo;
>
> Is this initializer needed? It looks pretty arbitrary to pick
> JO here.
>
It's arbitrary, the initializer here is just to avoid uninitialized
warning when building with option Wuninitialized.
I'll add comments to explain the initializer.
> >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP)
> > /* Return the next jcc frag after BRANCH_PADDING. */
> >
> > static fragS *
> >-i386_next_jcc_frag (fragS *fragP)
> >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP)
>
> Besides "cmp" in the name again not really fitting all variants
> (afaict), it also looks suspicious with ...
>
> >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP)
> > */
> > cmp_fragP = i386_next_non_empty_frag (next_fragP);
> > pad_fragP = i386_next_non_empty_frag (cmp_fragP);
> >- branch_fragP = i386_next_jcc_frag (pad_fragP);
> >+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP);
>
> ... next_fragP, not cmp_fragP being passed here.
>
Yes, use maybe_cmp_fragP may be better.
> >--- a/gas/config/tc-i386.h
> >+++ b/gas/config/tc-i386.h
> >@@ -273,6 +273,7 @@ struct i386_tc_frag_data
> > unsigned char prefix_length;
> > unsigned char default_prefix;
> > unsigned char cmp_size;
> >+ unsigned int mf_type : 4;
>
> Even if not going with a (2-bit) insn attribute, this doesn't
> need to be wider than 3 bits, as J<cc> and JN<cc> always fall
> into the same group, and hence don't need separate tracking
> (i.e. when recording you could ignore the low opcode bit).
>
Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough.
> Jan
--
BR,
Hongtao