[PATCH 1/2] i386: Generate lfence with load/indirect branch/ret [CVE-2020-0551]

Hongtao Liu crazylht@gmail.com
Mon Apr 20 07:20:52 GMT 2020


On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.04.2020 07:34, Hongtao Liu wrote:
> > I tried to re-arranged to use a common pattern (memory operand is
> > destination) and only exclude those which don't also read this
> > operand. But it turn out there still a lot of such instructions
> > include all mov instruction, store instruction for i387 and cet,
> > extract instructions, vgather instructions, vscatter instrcutions,
> > convert instrcutions and so on, so i didn't re-arrange them.
> > Other requests are done by the updated patch, also plus handling REP
> > CMP/SCAS specially since they would set EFLAGS which affects control
> > flow behavior.
> >
> >   1. No load for INVPCID, Implict load for POPS/POPF/POPA/XLATB
>
> Why INVPCID? Whether it accesses its memory operand depends on
> the value in the register operand. And what's POPS?
>

Changed for INVPCID. POPS means POP for segment registers; I'll change
it to avoid misunderstanding.

> >   2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to
> >   ret's.
> >   3. Ajust -mlfence-after-load=[yes/no] to
> >   -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none/all]
> >   equal original -mlfence-after-load=[no/yes],
>
> While there wasn't any official release with the prior option forms
> yet, I'm not sure it is a good idea to disallow the old forms
> altogether now; they may need deprecating but still permitting
> instead.
>

I'd prefer to change it before the next release.

> >   -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
> >   since they would affect control flow behavior.
> >   -mlfence-after-load=all will issue an warning when adding lfence
> >   after REP CMPS/SCAS.
>
> I also think the various independent behavioral changes here would
> better be split into separate patches (e.g. at least one patch per
> numbered item in your enumeration above).
>

REP CMPS/SCAS are special cases for -mlfence-after-load, so it is
probably better to keep them in the same thread.

> >   4. Adjust testcases and documents.
> >
> > gas/Changelog:
> >         * config/tc-i386.c (lfence_after_load_kine): New.
> >         (lfence_before_ret_shl): Change from lfence_before_ret_not.
> >         (load_insn_p): No load for INVPCID, implict load for
> >         POPS/POPA/POPF/XLATB.
> >         (insert_after_load): Insert lfence under
> >         -mlfence-after-load=[general|all],issue an warning when encounter
> >         REP CMPS/SCAS.
> >         (insert_before_before): Replace -mlfence-before-ret=not to
> >         -mlfence-before-ret=shl.
> >         (md_parse_option): Adjust -mlfence-after-load=[yes|no] to
> >         -mlfence-after-load=[none|general|all], Replace
> >         -mlfence-before-ret=not to -mlfence-before-ret=shl. Enable
> >         -mlfence-before-ret=shl when
> >         -mlfence-beofre-indirect-branch=all.
> >         (md_show_usage): Ditto.
> >         * doc/c-i386.texi: Ditto.
> >         * testsuite/gas/i386/i386.exp: Add new testcases.
> >         * gas/testsuite/gas/i386/lfence-load-b.d: New.
> >         * gas/testsuite/gas/i386/lfence-load-b.e: New.
> >         * gas/testsuite/gas/i386/lfence-load.d: Modified.
> >         * gas/testsuite/gas/i386/lfence-load.e: New.
> >         * gas/testsuite/gas/i386/lfence-load.s: Modified.
> >         * gas/testsuite/gas/i386/lfence-ret-a.d: Modified.
> >         * gas/testsuite/gas/i386/lfence-ret-b.d: Modified.
> >         * gas/testsuite/gas/i386/lfence-ret-c.d: New.
> >         * gas/testsuite/gas/i386/lfence-ret-d.d: New.
> >         * gas/testsuite/gas/i386/lfence-ret.s: Modified
> >         * gas/testsuite/gas/i386/x86-64-lfence-load-b.d: New.
> >         * gas/testsuite/gas/i386/x86-64-lfence-load.d: Modified.
> >         * gas/testsuite/gas/i386/x86-64-lfence-load.s: Modified.
> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
>
> There's a stray leading gas/ on the last so many lines above.
>

Changed.


> Also could you please send patches inline, unless they're too
> big to be permitted by list restrictions? Commenting on an
> attachment is quite a bit more cumbersome. Anyway, I'll try to.
>
> >-/* 1 if lfence should be inserted after every load.  */
> >-static int lfence_after_load = 0;
> >+/* Non-zero if lfence shoulde be inserted after load.  */
>
> Please try to avoid breaking correct spelling ("should"). I
> also think the comment should briefly explain the difference
> between lfence_load_general and lfence_load_all, even if
> this may seem redundant with the command line option doc.
>

Changed.

> >@@ -4350,21 +4357,28 @@ load_insn_p (void)
> >
> >   if (!any_vex_p)
> >     {
> >-      /* lea  */
> >-      if (i.tm.base_opcode == 0x8d)
> >+      /* Note: invlpg, invpcid, clflush, clflushopt, prefetchh, prefetchw
> >+       could be excluded by the later pattern.  */
> >+      /* lea, invpcid.  */
> >+      if (i.tm.base_opcode == 0x8d
> >+        || i.tm.base_opcode == 0xf3882)
>
> The first comment mentions INVPCID, but the second does, too,
> which is not logical.

Changed.

> Also what about CLDEMOTE or CLWB, just
> to name a few examples not listed? Instead of relying on
> later patterns, could you perhaps bail for all AnySize insns
> here?
>

Changed.

> >-      /* pop  */
> >-      if ((i.tm.base_opcode & ~7) == 0x58
> >-        || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
> >+      /* pop, popf, popa.   */
> >+      if (strcmp (i.tm.name, "pop") == 0
> >+        || i.tm.base_opcode == 0x9d
> >+        || i.tm.base_opcode == 0x61)
>
> Personally I'd recommend against string matching, and even
> more so against a mixture of it and opcode matching. But I'm
> not the maintainer of this code.
>
> >-      /* outs */
> >-      if (base_opcode == 0x6f)
> >+      /* NB: For AMD-specific insns with implicit memory operands,
> >+       they're intentionally not covered.
> >+       outs, xlatb.  */
> >+      if (base_opcode == 0x6f
> >+        || i.tm.base_opcode == 0xD7)
> >       return 1;
>
> I'd like to request consistency in choice of case in numeric
> (hex) constant. I'd also think the AMD part of the comment
> would better go after this if()+return.
>

Changed.

> While RET/LRET get handled specially anyway, what about e.g.
> IRET which also loads data from memory?
>

Added IRET.

> >@@ -4506,6 +4520,22 @@ insert_lfence_after (void)
> > {
> >   if (lfence_after_load && load_insn_p ())
> >     {
> >+      /* Insert lfence after rep cmps/scas only under
> >+       -mlfence-after-load=all.  */
> >+      if (((i.tm.base_opcode | 0x1) == 0xa7
> >+         || (i.tm.base_opcode | 0x1) == 0xaf)
> >+        && i.prefix[REP_PREFIX])
>
> I'm afraid I don't understand why the REP forms need treating
> differently from the non-REP ones of the same insns.
>

Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.
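
To make the distinction concrete, here is a minimal sketch of the check. It
assumes a simplified stand-in for the assembler state: `struct insn` and both
helper functions are hypothetical names for illustration, not gas code. CMPS
is 0xa6/0xa7 and SCAS is 0xae/0xaf, so OR-ing in the low bit folds the byte
and word/dword forms together.

```c
#include <stdbool.h>

/* Mirrors the three option values introduced by the patch.  */
enum lfence_after_load_kind
{
  lfence_load_none = 0,
  lfence_load_general,
  lfence_load_all
};

/* Hypothetical, simplified stand-in for the current-insn state.  */
struct insn
{
  unsigned int base_opcode;
  bool has_rep_prefix;
};

/* True only for REP-prefixed CMPS (0xa6/0xa7) or SCAS (0xae/0xaf).  */
static bool
is_rep_cmps_or_scas (const struct insn *in)
{
  return in->has_rep_prefix
	 && ((in->base_opcode | 0x1) == 0xa7
	     || (in->base_opcode | 0x1) == 0xaf);
}

/* Assumes the insn has already been classified as a load.  */
static bool
want_lfence_after_load (enum lfence_after_load_kind kind,
			const struct insn *in)
{
  if (kind == lfence_load_none)
    return false;
  if (kind == lfence_load_general && is_rep_cmps_or_scas (in))
    return false;
  return true;
}
```

With this, REP MOVS and non-REP CMPS/SCAS still get an lfence under
"general"; only the REP CMPS/SCAS forms are skipped.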

> >+      {
> >+        if (lfence_after_load == lfence_load_general)
> >+          {
> >+            as_warn (_("`%s` skips -mlfence-after-general=general"),
>
> Mis-spelled option name?
>

Sorry, changed.

> >@@ -4583,33 +4613,47 @@ insert_lfence_before (void)
> >                        last_insn.name, i.tm.name);
> >         return;
> >       }
> >-      if (lfence_before_ret == lfence_before_ret_or)
> >-      {
> >-        /* orl: 0x830c2400.  */
> >-        p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
> >-        if (flag_code == CODE_64BIT)
> >-          *p++ = 0x48;
> >-        *p++ = 0x83;
> >-        *p++ = 0xc;
> >-        *p++ = 0x24;
> >-        *p++ = 0x0;
> >-      }
> >-      else
> >+
> >+      char prefix = i.prefix[DATA_PREFIX] ? 0x66
> >+      : flag_code == CODE_64BIT ? 0x48 : 0x0;
>
> Is this correct when the RET _also_ has an explicitly specified
> REX.W prefix? Also indentation looks somewhat odd on the last
> line of this block.
>

I think yes.

> >+
> >+      if (lfence_before_ret == lfence_before_ret_not)
> >       {
> >-        p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
> >         /* notl: 0xf71424.  */
>
> Comments like this one are no longer precise: The l suffix is
> generally wrong for 64-bit code, and would also be wrong if
> there was an operand size override on the RET.
>

Yes, added comments about the prefix rewrite.
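
For reference, the byte sequences that end up in front of the RET can be
sketched as below. The opcode bytes and the prefix choice are taken from the
patch; the function name, its signature and the caller-supplied buffer are
assumptions made for this illustration only (the real code writes through
frag_more).

```c
#include <stdbool.h>
#include <stddef.h>

enum lfence_before_ret_kind
{
  lfence_before_ret_none = 0,
  lfence_before_ret_not,
  lfence_before_ret_or,
  lfence_before_ret_shl
};

/* Hypothetical helper: writes the "or/not/shl + lfence" bytes into OUT
   (at least 11 bytes) and returns the number of bytes written.  */
static size_t
emit_lfence_before_ret (enum lfence_before_ret_kind kind,
			bool data_prefix, bool code_64bit,
			unsigned char *out)
{
  /* 0x66 for an operand size override, else REX.W in 64-bit code.  */
  unsigned char prefix = data_prefix ? 0x66 : code_64bit ? 0x48 : 0x00;
  size_t n = 0;

  if (kind == lfence_before_ret_none)
    return 0;

  if (kind == lfence_before_ret_not)
    {
      /* not (%esp) twice: f7 14 24, each with the optional prefix.  */
      for (int k = 0; k < 2; k++)
	{
	  if (prefix)
	    out[n++] = prefix;
	  out[n++] = 0xf7;
	  out[n++] = 0x14;
	  out[n++] = 0x24;
	}
    }
  else
    {
      if (prefix)
	out[n++] = prefix;
      if (kind == lfence_before_ret_or)
	{
	  /* or $0, (%esp): 83 0c 24 00.  */
	  out[n++] = 0x83;
	  out[n++] = 0x0c;
	}
      else
	{
	  /* shl $0, (%esp): c1 24 24 00.  */
	  out[n++] = 0xc1;
	  out[n++] = 0x24;
	}
      out[n++] = 0x24;
      out[n++] = 0x00;
    }

  /* lfence: 0f ae e8.  */
  out[n++] = 0x0f;
  out[n++] = 0xae;
  out[n++] = 0xe8;
  return n;
}
```

So for 64-bit code with =shl the result is 48 c1 24 24 00 0f ae e8, and a
RET carrying a data size prefix gets 66 in place of 48.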

> >     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
> >       if (strcasecmp (arg, "all") == 0)
> >-      lfence_before_indirect_branch = lfence_branch_all;
> >+      {
> >+        lfence_before_indirect_branch = lfence_branch_all;
> >+        lfence_before_ret = lfence_before_ret_shl;
> >+      }
>
> I don't think this should override an earlier explicit
> -mlfence-before-ret= (i.e. in particular the order the two
> options would be specified in should imo not matter).
>

Changed.
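
A minimal sketch of the intended behavior, assuming a hypothetical
`struct lfence_opts` and two setter helpers that stand in for the
corresponding md_parse_option cases (none of these names exist in gas):

```c
#include <stdbool.h>

enum lfence_before_ret_kind
{
  lfence_before_ret_none = 0,
  lfence_before_ret_not,
  lfence_before_ret_or,
  lfence_before_ret_shl
};

/* Hypothetical container for the two option states involved.  */
struct lfence_opts
{
  bool branch_all;
  enum lfence_before_ret_kind before_ret;
};

/* -mlfence-before-indirect-branch=all: only default the ret handling
   to shl when nothing else has been selected so far.  */
static void
set_indirect_branch_all (struct lfence_opts *o)
{
  o->branch_all = true;
  if (o->before_ret == lfence_before_ret_none)
    o->before_ret = lfence_before_ret_shl;
}

/* -mlfence-before-ret=<kind>: an explicit choice always wins.  */
static void
set_before_ret (struct lfence_opts *o, enum lfence_before_ret_kind kind)
{
  o->before_ret = kind;
}
```

With this shape, =or followed by =all and =all followed by =or both end up
with "or". One caveat of the sketch: an explicit =none given before =all
cannot be told apart from the default, so it would still be turned into shl.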

> >@@ -13012,6 +13061,8 @@ md_parse_option (int c, const char *arg)
> >       lfence_before_ret = lfence_before_ret_or;
> >       else if (strcasecmp (arg, "not") == 0)
> >       lfence_before_ret = lfence_before_ret_not;
> >+      else if (strcasecmp (arg, "shl") == 0)
> >+      lfence_before_ret = lfence_before_ret_shl;
> >       else if (strcasecmp (arg, "none") == 0)
> >       lfence_before_ret = lfence_before_ret_none;
> >       else
>
> With the SHL variant being truly benign (except for the
> performance impact of course), would it make sense to also
> allow for a simple "=yes" form now?

Do you mean adding -mlfence-before-ret=yes, which would be equivalent to
-mlfence-before-ret=shl?
>
> Jan

Updated patch:

From 9038b3e2689019bb41351c1a6f426e3d0926c651 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. Implicit load for POP/POPF/POPA/XLATB; no load for AnySize insns.
  2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to
  ret's.
  3. Adjust -mlfence-after-load=[yes/no] to
  -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none/all]
  equal original -mlfence-after-load=[no/yes],
  -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
  since they would affect control flow behavior.
  -mlfence-after-load=all will issue a warning when adding lfence
  after REP CMPS/SCAS.
  4. Adjust testcases and documents.

gas/ChangeLog:
        * config/tc-i386.c (lfence_after_load_kind): New.
        (lfence_before_ret_shl): Change from lfence_before_ret_not.
        (load_insn_p): Implicit load for POP/POPA/POPF/XLATB; no load
        for AnySize insns.
        (insert_lfence_after): Insert lfence under
        -mlfence-after-load=[general|all], issue a warning when
        encountering REP CMPS/SCAS.
        (insert_lfence_before): Replace -mlfence-before-ret=not with
        -mlfence-before-ret=shl.
        (md_parse_option): Adjust -mlfence-after-load=[yes|no] to
        -mlfence-after-load=[none|general|all]. Replace
        -mlfence-before-ret=not with -mlfence-before-ret=shl. Enable
        -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * testsuite/gas/i386/lfence-load-b.d: New.
        * testsuite/gas/i386/lfence-load-b.e: New.
        * testsuite/gas/i386/lfence-load.d: Modified.
        * testsuite/gas/i386/lfence-load.e: New.
        * testsuite/gas/i386/lfence-load.s: Modified.
        * testsuite/gas/i386/lfence-ret-a.d: Modified.
        * testsuite/gas/i386/lfence-ret-b.d: Modified.
        * testsuite/gas/i386/lfence-ret-c.d: New.
        * testsuite/gas/i386/lfence-ret-d.d: New.
        * testsuite/gas/i386/lfence-ret.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
---
 gas/config/tc-i386.c                          | 138 +++++++++++++-----
 gas/doc/c-i386.texi                           |  30 ++--
 gas/testsuite/gas/i386/i386.exp               |   6 +
 gas/testsuite/gas/i386/lfence-load-b.d        | 137 +++++++++++++++++
 gas/testsuite/gas/i386/lfence-load-b.e        |   3 +
 gas/testsuite/gas/i386/lfence-load.d          |  30 +++-
 gas/testsuite/gas/i386/lfence-load.e          |   3 +
 gas/testsuite/gas/i386/lfence-load.s          |  20 +++
 gas/testsuite/gas/i386/lfence-ret-a.d         |   6 +
 gas/testsuite/gas/i386/lfence-ret-b.d         |   8 +
 gas/testsuite/gas/i386/lfence-ret-c.d         |  23 +++
 gas/testsuite/gas/i386/lfence-ret-d.d         |  24 +++
 gas/testsuite/gas/i386/lfence-ret.s           |   2 +
 gas/testsuite/gas/i386/x86-64-lfence-load-b.d | 137 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64-lfence-load.d   |  28 +++-
 gas/testsuite/gas/i386/x86-64-lfence-load.s   |  19 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d  |   6 +
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d  |   8 +
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d  |  23 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d  |  24 +++
 20 files changed, 619 insertions(+), 56 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.e
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..5243569362 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,8 +629,17 @@ static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;

-/* 1 if lfence should be inserted after every load.  */
-static int lfence_after_load = 0;
+/* Non-zero if lfence should be inserted after load.
+   lfence_load_all will generate lfence for all load instructions,
+   lfence_load_general will generate lfence for all
+   load instructions except REP CMPS/SCAS.  */
+static enum lfence_after_load_kind
+  {
+   lfence_load_none = 0,
+   lfence_load_general,
+   lfence_load_all
+  }
+lfence_after_load;

 /* Non-zero if lfence should be inserted before indirect branch.  */
 static enum lfence_before_indirect_branch_kind
@@ -647,7 +656,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;

@@ -4350,22 +4360,28 @@ load_insn_p (void)

   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
+         prefetcht1, prefetcht2, prefetchw, bndmk, bndcl, bndcu, bndcn,
+         bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
+      if (i.tm.opcode_modifier.anysize)
         return 0;

-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-          || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.   */
+      if (strcmp (i.tm.name, "pop") == 0
+          || i.tm.base_opcode == 0x9d
+          || i.tm.base_opcode == 0x61)
         return 1;

       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
         return 1;

-      /* outs */
-      if (base_opcode == 0x6f)
+      /* outs, xlatb.  */
+      if (base_opcode == 0x6f
+          || i.tm.base_opcode == 0xd7)
         return 1;
+      /* NB: For AMD-specific insns with implicit memory operands,
+         they're intentionally not covered.  */
     }

   /* No memory operand.  */
@@ -4506,6 +4522,22 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* Insert lfence after rep cmps/scas only under
+         -mlfence-after-load=all.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+           || (i.tm.base_opcode | 0x1) == 0xaf)
+          && i.prefix[REP_PREFIX])
+        {
+          if (lfence_after_load == lfence_load_general)
+            {
+              as_warn (_("`%s` skips -mlfence-after-load=general"),
+                       i.tm.name);
+              return;
+            }
+          else
+            as_warn (_("`%s` changes flags which would affect control flow behavior"),
+                     i.tm.name);
+        }
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4536,8 +4568,8 @@ insert_lfence_before (void)

       if (i.reg_operands == 1)
         {
-          /* Indirect branch via register.  Don't insert lfence with
-             -mlfence-after-load=yes.  */
+          /* Indirect branch via register. Insert lfence when
+             -mlfence-after-load=none.  */
           if (lfence_after_load
               || lfence_before_indirect_branch == lfence_branch_memory)
             return;
@@ -4568,12 +4600,13 @@ insert_lfence_before (void)
       return;
     }

-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret/lret/iret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
           || i.tm.base_opcode == 0xc3
           || i.tm.base_opcode == 0xca
-          || i.tm.base_opcode == 0xcb))
+          || i.tm.base_opcode == 0xcb
+          || i.tm.base_opcode == 0xcf))
     {
       if (last_insn.kind != last_insn_other
           && last_insn.seg == now_seg)
@@ -4583,33 +4616,50 @@ insert_lfence_before (void)
                          last_insn.name, i.tm.name);
           return;
         }
-      if (lfence_before_ret == lfence_before_ret_or)
-        {
-          /* orl: 0x830c2400.  */
-          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
-          *p++ = 0x83;
-          *p++ = 0xc;
-          *p++ = 0x24;
-          *p++ = 0x0;
-        }
-      else
+
+      char prefix = i.prefix[DATA_PREFIX]
+        ? 0x66 : flag_code == CODE_64BIT ? 0x48 : 0x0;
+
+      if (lfence_before_ret == lfence_before_ret_not)
         {
-          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          /* notl: 0xf71424, may add prefix
+             for operand size override or 64-bit code.  */
+          p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
         }
+      else
+        {
+          p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+          if (prefix)
+            *p++ = prefix;
+          if (lfence_before_ret == lfence_before_ret_or)
+            {
+              /* orl: 0x830c2400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0x83;
+              *p++ = 0x0c;
+            }
+          else
+            {
+              /* shl: 0xc1242400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0xc1;
+              *p++ = 0x24;
+            }
+
+          *p++ = 0x24;
+          *p++ = 0x0;
+        }
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12985,17 +13035,23 @@ md_parse_option (int c, const char *arg)
       break;

     case OPTION_MLFENCE_AFTER_LOAD:
-      if (strcasecmp (arg, "yes") == 0)
-        lfence_after_load = 1;
-      else if (strcasecmp (arg, "no") == 0)
-        lfence_after_load = 0;
+      if (strcasecmp (arg, "general") == 0)
+        lfence_after_load = lfence_load_general;
+      else if (strcasecmp (arg, "all") == 0)
+        lfence_after_load = lfence_load_all;
+      else if (strcasecmp (arg, "none") == 0)
+        lfence_after_load = lfence_load_none;
       else
         as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
       break;

     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
-        lfence_before_indirect_branch = lfence_branch_all;
+        {
+          lfence_before_indirect_branch = lfence_branch_all;
+          if (lfence_before_ret == lfence_before_ret_none)
+            lfence_before_ret = lfence_before_ret_shl;
+        }
       else if (strcasecmp (arg, "memory") == 0)
         lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13068,8 @@ md_parse_option (int c, const char *arg)
         lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
         lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0)
+        lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
         lfence_before_ret = lfence_before_ret_none;
       else
@@ -13376,13 +13434,13 @@ md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
-  -mlfence-after-load=[no|yes] (default: no)\n\
+  -mlfence-after-load=[none|general|all] (default: none)\n\
                           generate lfence after load\n"));
   fprintf (stream, _("\
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..b8192ff3ea 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -470,12 +470,15 @@ The default doesn't align branches.

 @cindex @samp{-mlfence-after-load=} option, i386
 @cindex @samp{-mlfence-after-load=} option, x86-64
-@item -mlfence-after-load=@var{no}
-@itemx -mlfence-after-load=@var{yes}
+@item -mlfence-after-load=@var{none}
+@itemx -mlfence-after-load=@var{general}
+@itemx -mlfence-after-load=@var{all}
 These options control whether the assembler should generate lfence
-after load instructions.  @option{-mlfence-after-load=@var{yes}} will
-generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
-lfence, which is the default.
+after load instructions.  @option{-mlfence-after-load=@var{all}} will
+generate lfence for all load instructions,
+@option{-mlfence-after-load=@var{general}} will generate lfence for all
+load instructions except rep cmps/scas, @option{-mlfence-after-load=@var{none}}
+will not generate lfence, which is the default.

 @cindex @samp{-mlfence-before-indirect-branch=} option, i386
 @cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
@@ -488,28 +491,31 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
+there's no explicit @option{-mlfence-before-ret=}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
 warning before indirect near branch via memory.
 @option{-mlfence-before-indirect-branch=@var{none}} will not generate
-lfence nor issue warning, which is the default.  Note that lfence won't
-be generated before indirect near branch via register with
-@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+lfence nor issue warning, which is the default.  Note that lfence will
+be generated before indirect near branch via register only with
+@option{-mlfence-after-load=@var{none}} since otherwise lfence will be generated
 after loading branch target register.

 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@itemx -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl}} will generate shl instruction
+with lfence. @option{-mlfence-before-ret=@var{not}} will generate not
+instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
+generate lfence, which is the default.

 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..a2bdb569b7 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -530,11 +530,14 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-8"
     run_dump_test "align-branch-9"
     run_dump_test "lfence-load"
+    run_dump_test "lfence-load-b"
     run_dump_test "lfence-indbr-a"
     run_dump_test "lfence-indbr-b"
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"

     # These tests require support for 8 and 16 bit relocs,
@@ -1117,11 +1120,14 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-8"
     run_dump_test "x86-64-align-branch-9"
     run_dump_test "x86-64-lfence-load"
+    run_dump_test "x86-64-lfence-load-b"
     run_dump_test "x86-64-lfence-indbr-a"
     run_dump_test "x86-64-lfence-indbr-b"
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-byte"

     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load-b.d b/gas/testsuite/gas/i386/lfence-load-b.d
new file mode 100644
index 0000000000..b4f7bc0f19
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdtl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%ebp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%ebp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%ebp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%ebp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%ebp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%ecx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%ecx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41                    inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdtl  \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%esi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popl   \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8d 04 40              lea    \(%eax,%eax,2\),%eax
+ +[a-f0-9]+: c9                    leave
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%esi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%esi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%esi\),%es:\(%edi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%edi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%edi\),%ds:\(%esi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%esi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 83 00 01              addl   \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f ba 20 01          btl    \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 03              xadd   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 c3              xadd   %eax,%ebx
+ +[a-f0-9]+: 87 03                xchg   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 93                    xchg   %eax,%ebx
+ +[a-f0-9]+: 39 45 40              cmp    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 3b 45 40              cmp    0x40\(%ebp\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 01 45 40              add    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 03 00                add    \(%eax\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-load-b.e b/gas/testsuite/gas/i386/lfence-load-b.e
new file mode 100644
index 0000000000..c394e02296
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` skips -mlfence-after-load=general
+.*:??: Warning: `cmps` skips -mlfence-after-load=general
\ No newline at end of file
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..273e302f38 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,31 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..1ee49da7fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` changes flags which would affect control flow behavior
+.*:??: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..4b4aa1610b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,26 @@ _start:
  lgdt (%ebp)
  vmptrld (%ebp)
  vmclear (%ebp)
+ invpcid (%ebp), %edx
+ invlpg (%ebp)
+ clflush (%ebp)
+ clflushopt (%ebp)
+ clwb (%ebp)
+ cldemote (%ebp)
+ bndmk (%ebp), %bnd1
+ bndcl (%ebp), %bnd1
+ bndcu (%ebp), %bnd1
+ bndcn (%ebp), %bnd1
+ bndstx %bnd1, (%ebp)
+ bndldx (%ebp), %bnd1
+ prefetcht0 (%ebp)
+ prefetcht1 (%ebp)
+ prefetcht2 (%ebp)
+ prefetchw (%ebp)
+ pop %ds
+ popf
+ popa
+ xlatb (%ebx)
  fsts (%ebp)
  flds (%ebp)
  fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..613d1d50a2 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    ret
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..e6dd4f4bf6 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..58f7e0a706
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..9078216e53
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..5de4f08447 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,6 @@
  .text
 _start:
+ retw
+ retw $20
  ret
  ret $30
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load-b.d b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
new file mode 100644
index 0000000000..b1fd3cad42
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: x86-64-lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: x86-64 lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdt   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%rbp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%rbp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%rbp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%rbp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%rbp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%rcx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%rcx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c7 09          cmpxchg16b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff c1                inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdt   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%rsi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popq   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 8d 04 40          lea    \(%rax,%rax,2\),%rax
+ +[a-f0-9]+: c9                    leaveq
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%rsi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%rsi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%rsi\),%es:\(%rdi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%rdi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%rdi\),%ds:\(%rsi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%rsi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 83 03 01          addl   \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 0f ba 23 01        btl    \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 03          xadd   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 c3          xadd   %rax,%rbx
+ +[a-f0-9]+: 48 87 03              xchg   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 93                xchg   %rax,%rbx
+ +[a-f0-9]+: 48 39 45 40          cmp    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 3b 45 40          cmp    0x40\(%rbp\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 01 45 40          add    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 03 00              add    \(%rax\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..f21aba85d5 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: x86-64 -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: x86-64 -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,29 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..2a3ac6b7d2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,25 @@ _start:
  lgdt (%rbp)
  vmptrld (%rbp)
  vmclear (%rbp)
+ invpcid (%rbp), %rdx
+ invlpg (%eax)
+ clflush (%rbp)
+ clflushopt (%rbp)
+ clwb (%rbp)
+ cldemote (%rbp)
+ bndmk (%rbp), %bnd1
+ bndcl (%rbp), %bnd1
+ bndcu (%rbp), %bnd1
+ bndcn (%rbp), %bnd1
+ bndstx %bnd1, (%rbp)
+ bndldx (%rbp), %bnd1
+ prefetcht0 (%rbp)
+ prefetcht1 (%rbp)
+ prefetcht2 (%rbp)
+ prefetchw (%rbp)
+ pop %fs
+ popf
+ xlatb (%rbx)
  fsts (%rbp)
  flds (%rbp)
  fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..43343a9a44 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    retq
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..6c34affdc0 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..435d342a28
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..6c39b5d747
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+#pass
-- 
2.18.1
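
For reference, a small sketch (not part of the patch, and not how gas
implements it) of the byte sequences that the lfence-ret-*.d expectations
above encode for the or/not/shl inserted before ret. The function and table
names are made up for illustration; the bytes themselves are copied from the
expected disassembly, with the operand-size prefix following the ret's size.

```python
# Illustrative only: reproduces the bytes expected by the lfence-ret-*.d
# tests above. All names here are hypothetical, not taken from gas.

LFENCE = bytes([0x0F, 0xAE, 0xE8])

# (encoding without size prefix, repeat count); operand is (%esp)/(%rsp).
_SEQUENCES = {
    "or": (bytes([0x83, 0x0C, 0x24, 0x00]), 1),   # or  $0x0,(%sp)
    "not": (bytes([0xF7, 0x14, 0x24]), 2),         # not (%sp), emitted twice
    "shl": (bytes([0xC1, 0x24, 0x24, 0x00]), 1),  # shl $0x0,(%sp)
}

# Prefix that selects the operand size, as seen in the w/l/q test lines.
_PREFIX = {"w": b"\x66", "l": b"", "q": b"\x48"}


def bytes_before_ret(kind, size):
    """Return the bytes expected ahead of a ret of the given operand size."""
    insn, repeat = _SEQUENCES[kind]
    return (_PREFIX[size] + insn) * repeat + LFENCE


print(bytes_before_ret("shl", "q").hex())  # prints 48c12424000faee8
```

The "w" rows correspond to retw, the "l" rows to ret in 32-bit code and the
"q" rows to retq in 64-bit code.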


--
BR,
Hongtao

