lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon,  7 Mar 2022 18:45:57 +0700
From:   Ammar Faizi <ammarfaizi2@...weeb.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Ammar Faizi <ammarfaizi2@...weeb.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Juergen Gross <jgross@...e.com>,
        Kees Cook <keescook@...omium.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tony Luck <tony.luck@...el.com>,
        Youquan Song <youquan.song@...el.com>,
        linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org,
        gwml@...r.gnuweeb.org, x86@...nel.org
Subject: [PATCH v1 1/2] x86/include/asm: Avoid using INC and DEC instructions on hot paths

In order to take maximum advantage of out-of-order execution,
avoid using INC and DEC instructions when appropriate. INC/DEC
only writes to part of the flags register, which can cause a
partial flag register stall.

Agner Fog's optimization manual says [1]:
"""
  The INC and DEC instructions are inefficient on some CPUs because they
  write to only part of the flags register (excluding the carry flag).
  Use ADD or SUB instead to avoid false dependences or inefficient
  splitting of the flags register, especially if they are followed by
  an instruction that reads the flags.
"""

Intel's optimization manual 3.5.1.1 says [2]:
"""
  The INC and DEC instructions modify only a subset of the bits in the
  flag register. This creates a dependence on all previous writes of
  the flag register. This is especially problematic when these
  instructions are on the critical path because they are used to change
  an address for a load on which many other instructions depend.

  Assembly/Compiler Coding Rule 33. (M impact, H generality) INC and DEC
  instructions should be replaced with ADD or SUB instructions, because
  ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
  creating false dependencies on earlier instructions that set the flags.
"""

[1]: https://www.agner.org/optimize/optimizing_assembly.pdf
[2]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Signed-off-by: Ammar Faizi <ammarfaizi2@...weeb.org>
---
 arch/x86/include/asm/xor_32.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xor_32.h b/arch/x86/include/asm/xor_32.h
index 67ceb790e639..7aa438f3df20 100644
--- a/arch/x86/include/asm/xor_32.h
+++ b/arch/x86/include/asm/xor_32.h
@@ -53,7 +53,7 @@ xor_pII_mmx_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
 
 	"       addl $128, %1         ;\n"
 	"       addl $128, %2         ;\n"
-	"       decl %0               ;\n"
+	"       subl $1, %0           ;\n"
 	"       jnz 1b                ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2)
@@ -102,7 +102,7 @@ xor_pII_mmx_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 	"       addl $128, %1         ;\n"
 	"       addl $128, %2         ;\n"
 	"       addl $128, %3         ;\n"
-	"       decl %0               ;\n"
+	"       subl $1, %0           ;\n"
 	"       jnz 1b                ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2), "+r" (p3)
@@ -156,7 +156,7 @@ xor_pII_mmx_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 	"       addl $128, %2         ;\n"
 	"       addl $128, %3         ;\n"
 	"       addl $128, %4         ;\n"
-	"       decl %0               ;\n"
+	"       subl $1, %0           ;\n"
 	"       jnz 1b                ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2), "+r" (p3), "+r" (p4)
@@ -224,7 +224,7 @@ xor_pII_mmx_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 	"       addl $128, %3         ;\n"
 	"       addl $128, %4         ;\n"
 	"       addl $128, %5         ;\n"
-	"       decl %0               ;\n"
+	"       subl $1, %0           ;\n"
 	"       jnz 1b                ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2), "+r" (p3)
@@ -284,7 +284,7 @@ xor_p5_mmx_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
 
 	"       addl $64, %1         ;\n"
 	"       addl $64, %2         ;\n"
-	"       decl %0              ;\n"
+	"       subl $1, %0          ;\n"
 	"       jnz 1b               ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2)
@@ -341,7 +341,7 @@ xor_p5_mmx_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 	"       addl $64, %1         ;\n"
 	"       addl $64, %2         ;\n"
 	"       addl $64, %3         ;\n"
-	"       decl %0              ;\n"
+	"       subl $1, %0          ;\n"
 	"       jnz 1b               ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2), "+r" (p3)
@@ -407,7 +407,7 @@ xor_p5_mmx_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 	"       addl $64, %2         ;\n"
 	"       addl $64, %3         ;\n"
 	"       addl $64, %4         ;\n"
-	"       decl %0              ;\n"
+	"       subl $1, %0          ;\n"
 	"       jnz 1b               ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2), "+r" (p3), "+r" (p4)
@@ -490,7 +490,7 @@ xor_p5_mmx_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 	"       addl $64, %3         ;\n"
 	"       addl $64, %4         ;\n"
 	"       addl $64, %5         ;\n"
-	"       decl %0              ;\n"
+	"       subl $1, %0          ;\n"
 	"       jnz 1b               ;\n"
 	: "+r" (lines),
 	  "+r" (p1), "+r" (p2), "+r" (p3)
-- 
2.32.0

Powered by blists - more mailing lists