lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210903121950.2284-1-thunder.leizhen@huawei.com>
Date:   Fri, 3 Sep 2021 20:19:50 +0800
From:   Zhen Lei <thunder.leizhen@...wei.com>
To:     Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>
CC:     Zhen Lei <thunder.leizhen@...wei.com>,
        Mark Rutland <mark.rutland@....com>
Subject: [PATCH] arm64: entry: Improve the performance of system calls

Commit 582f95835a8f ("arm64: entry: convert el0_sync to C") converted lots
of functions from assembly to C, this greatly improves readability. But
el0_svc()/el0_svc_compat() is in response to system call requests from
user mode and may be in the hot path.

Although the SVC is in the first case of the switch statement in C, the
compiler optimizes the switch statement as a whole, and does not give SVC
a small boost.

Use "likely()" to help SVC directly invoke its handler after a simple
judgment to avoid entering the switch table lookup process.

After:
0000000000000ff0 <el0t_64_sync_handler>:
     ff0:       d503245f        bti     c
     ff4:       d503233f        paciasp
     ff8:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
     ffc:       910003fd        mov     x29, sp
    1000:       d5385201        mrs     x1, esr_el1
    1004:       531a7c22        lsr     w2, w1, #26
    1008:       f100545f        cmp     x2, #0x15
    100c:       540000a1        b.ne    1020 <el0t_64_sync_handler+0x30>
    1010:       97fffe14        bl      860 <el0_svc>
    1014:       a8c17bfd        ldp     x29, x30, [sp], #16
    1018:       d50323bf        autiasp
    101c:       d65f03c0        ret
    1020:       f100705f        cmp     x2, #0x1c

Execute "./lat_syscall null" on my board (BogoMIPS : 200.00), it can save
about 10ns.

Before:
Simple syscall: 0.2365 microseconds
Simple syscall: 0.2354 microseconds
Simple syscall: 0.2339 microseconds

After:
Simple syscall: 0.2255 microseconds
Simple syscall: 0.2254 microseconds
Simple syscall: 0.2256 microseconds

Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
---
 arch/arm64/kernel/entry-common.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe77b..062eb5a895ec6f3 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -607,11 +607,14 @@ static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr)
 asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 {
 	unsigned long esr = read_sysreg(esr_el1);
+	unsigned long ec = ESR_ELx_EC(esr);
 
-	switch (ESR_ELx_EC(esr)) {
-	case ESR_ELx_EC_SVC64:
+	if (likely(ec == ESR_ELx_EC_SVC64)) {
 		el0_svc(regs);
-		break;
+		return;
+	}
+
+	switch (ec) {
 	case ESR_ELx_EC_DABT_LOW:
 		el0_da(regs, esr);
 		break;
@@ -730,11 +733,14 @@ static void noinstr el0_svc_compat(struct pt_regs *regs)
 asmlinkage void noinstr el0t_32_sync_handler(struct pt_regs *regs)
 {
 	unsigned long esr = read_sysreg(esr_el1);
+	unsigned long ec = ESR_ELx_EC(esr);
 
-	switch (ESR_ELx_EC(esr)) {
-	case ESR_ELx_EC_SVC32:
+	if (likely(ec == ESR_ELx_EC_SVC32)) {
 		el0_svc_compat(regs);
-		break;
+		return;
+	}
+
+	switch (ec) {
 	case ESR_ELx_EC_DABT_LOW:
 		el0_da(regs, esr);
 		break;
-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ