This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Mon, Mar 2, 2020 at 11:13 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Fri, Feb 28, 2020 at 7:35 PM Jan Beulich <jbeulich@suse.com> wrote: > > > > On 28.02.2020 09:57, Hongtao Liu wrote: > > > On Fri, Feb 28, 2020 at 4:55 PM Hongtao Liu <crazylht@gmail.com> wrote: > > >> > > >> Hi: > > >> This is subsequent patch for JCC Code Erratum. > > >> Refer to http://sourceware-org.1504.n7.nabble.com/PATCH-0-5-i386-Optimize-for-Jump-Conditional-Code-Erratum-tt598005.html#none > > >> > > >> According to intel SDM manual, not all compare flag-modifying > > >> instructions are marcro-fusible with subsequent jcc instruction. For > > >> those not, -mbranches-within-32B-boundaries need't align them as > > >> FUSED_JCC_PADDING, only jcc itself need to be aligned. Here are 2 > > >> restrictions which separate macro-fusible instruction from not > > >> > > >> Restriction 1: > > >> If TEST/AND/CMP/ADD/SUB/INC/DEC is one of the following format: > > >> > > >> cmp m, imm > > >> add m, imm > > >> sub m, imm > > >> test m, imm > > >> and m, imm > > >> inc m > > >> dec m > > >> > > >> it is unfusible with any jcc instructions. > > >> > > >> Restriction 2: > > >> /* Table 3-2. Macro-Fusible Instructions in Haswell Microarchitecture > > >> --------------------------------------------------------------------- > > >> | JCC | ADD/SUB/CMP | INC/DEC | TEST/AND | > > >> | ------ | ----------- | ------- | -------- | > > >> | Jo | N | N | Y | > > >> | Jno | N | N | Y | > > >> | Jc/Jb | Y | N | Y | > > >> | Jae/Jnb | Y | N | Y | > > >> | Je/Jz | Y | Y | Y | > > >> | Jne/Jnz | Y | Y | Y | > > >> | Jna/Jbe | Y | N | Y | > > >> | Ja/Jnbe | Y | N | Y | > > >> | Js | N | N | Y | > > >> | Jns | N | N | Y | > > >> | Jp/Jpe | N | N | Y | > > >> | Jnp/Jpo | N | N | Y | > > >> | Jl/Jnge | Y | Y | Y | > > >> | Jge/Jnl | Y | Y | Y | > > >> | Jle/Jng | Y | Y | Y | > > >> | Jg/Jnle | Y | Y | Y | > > > > Quoting a Haswell table (also in the code) for code to deal with > > a Skylake erratum is, without any further comments, not very > > helpful. > > > They also works for Skylake and Cascadelake, I'll add comments. > > >+enum mf_jcc_kind > > >+ { > > >+ mf_jcc_jo = 0, /* base opcode 0x70 */ > > >+ mf_jcc_jno, /* base opcode 0x71 */ > > >+ mf_jcc_jc, /* base opcode 0x72 */ > > >+ mf_jcc_jae, /* base opcode 0x73 */ > > >+ mf_jcc_je, /* base opcode 0x74 */ > > >+ mf_jcc_jne, /* base opcode 0x75 */ > > >+ mf_jcc_jna, /* base opcode 0x76 */ > > >+ mf_jcc_ja, /* base opcode 0x77 */ > > >+ mf_jcc_js, /* base opcode 0x78 */ > > >+ mf_jcc_jns, /* base opcode 0x79 */ > > >+ mf_jcc_jp, /* base opcode 0x7a */ > > >+ mf_jcc_jnp, /* base opcode 0x7b */ > > >+ mf_jcc_jl, /* base opcode 0x7c */ > > >+ mf_jcc_jge, /* base opcode 0x7d */ > > >+ mf_jcc_jle, /* base opcode 0x7e */ > > >+ mf_jcc_jg /* base opcode 0x7f */ > > >+ }; > > > > Did you consider making this an insn template attribute instead? > > > Well, it could be calculated by tm.baseopcode - 0x70 if insn is jcc. > So maybe it's a bit redudant to add it into insn_template? > > >+enum mf_cmp_kind > > >+ { > > >+ mf_cmp_test, /* test */ > > >+ mf_cmp_cmp, /* cmp */ > > >+ mf_cmp_and, /* and */ > > >+ mf_cmp_alu, /* add/sub */ > > >+ mf_cmp_incdec /* inc/dec */ > > >+ }; > > > > Why 5 enumerators when only three are needed (CMP going > > together with ADD/SUB and AND going together with TEST)? > > > Yes, only three are needed here, adding 5 enumerators just in case > there are other combinations use in the future. > > >@@ -8573,6 +8636,7 @@ output_insn (void) > > > offsetT insn_start_off; > > > fragS *fragP = NULL; > > > enum align_branch_kind branch = align_branch_none; > > >+ enum mf_jcc_kind mf_jcc = mf_jcc_jo; > > > > Is this initializer needed? It looks pretty arbitrary to pick > > JO here. > > > It's arbitrary, the initializer here is just to avoid uninitialized > warning when building with option Wuninitialized. > I'll add comments to explain the initializer. > > >@@ -10967,20 +11069,23 @@ i386_next_non_empty_frag (fragS *fragP) > > > /* Return the next jcc frag after BRANCH_PADDING. */ > > > > > > static fragS * > > >-i386_next_jcc_frag (fragS *fragP) > > >+i386_next_fusible_jcc_frag (fragS *cmp_fragP, fragS *pad_fragP) > > > > Besides "cmp" in the name again not really fitting all variants > > (afaict), it also looks suspicious with ... > > > > >@@ -11025,7 +11130,7 @@ i386_classify_machine_dependent_frag (fragS *fragP) > > > */ > > > cmp_fragP = i386_next_non_empty_frag (next_fragP); > > > pad_fragP = i386_next_non_empty_frag (cmp_fragP); > > >- branch_fragP = i386_next_jcc_frag (pad_fragP); > > >+ branch_fragP = i386_next_fusible_jcc_frag (next_fragP, pad_fragP); > > > > ... next_fragP, not cmp_fragP being passed here. > > > Yes, use maybe_cmp_fragP may be better. > > >--- a/gas/config/tc-i386.h > > >+++ b/gas/config/tc-i386.h > > >@@ -273,6 +273,7 @@ struct i386_tc_frag_data > > > unsigned char prefix_length; > > > unsigned char default_prefix; > > > unsigned char cmp_size; > > >+ unsigned int mf_type : 4; > > > > Even if not going with a (2-bit) insn attribute, this doesn't > > need to be wider than 3 bits, as J<cc> and JN<cc> always fall > > into the same group, and hence don't need separate tracking > > (i.e. when recording you could ignore the low opcode bit). > > > Yes, J<cc> and JN<cc> could share the same group. 3 bits is enough. > > Jan > > > > -- > BR, > Hongtao Update patch. -- BR, Hongtao
Attachment:
0001-According-to-intel-SDM-manual-not-all-compare-flag-m.patch
Description: Binary data
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |