Open Source and information security mailing list archives
 
Message-ID: <34f216da-da8f-44cc-a9fc-47c8634e84c6@linux.ibm.com>
Date: Tue, 7 May 2024 16:46:36 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Ankur Arora <ankur.a.arora@...cle.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Nicholas Piggin <npiggin@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>, peterz@...radead.org,
        paulmck@...nel.org, akpm@...ux-foundation.org, luto@...nel.org,
        bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
        mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, jpoimboe@...nel.org,
        mark.rutland@....com, jgross@...e.com, andrew.cooper3@...rix.com,
        bristot@...nel.org, mathieu.desnoyers@...icios.com,
        geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
        anton.ivanov@...bridgegreys.com, mattst88@...il.com,
        krypton@...ich-teichert.org, rostedt@...dmis.org,
        David.Laight@...lab.com, richard@....at, mjguzik@...il.com,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling



On 4/27/24 12:30 AM, Ankur Arora wrote:
> 

Hi Ankur,

Sorry for the delay; I was on leave last week.
My replies may be slow for a while, as I am recovering from a fever.

> 
> Great. I'm guessing these tests are when running in voluntary preemption
> mode (under PREEMPT_AUTO).
> 

It was run under preempt=none. 

> If you haven't, could you also try full preemption? There you should see
> identical results unless something is horribly wrong.

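I tried preempt=full with the patch you provided below and ran hackbench for much
longer, with 100000 loops. I don't see any regression on the larger system, and some
cases show a slight improvement. I also don't see any major regression with 10k loops,
which I had tried earlier as well. In the tables below, lower time is better, and the
number in parentheses is the percentage improvement of the second column over the first.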

==========================================================
1L (100000) loops.
==========================================================
Process 10 groups          :       9.85,       9.87(-0.20)
Process 20 groups          :      17.69,      17.32(2.09)
Process 30 groups          :      25.89,      25.96(-0.27)
Process 40 groups          :      34.70,      34.61(0.26)
Process 50 groups          :      44.02,      43.79(0.52)
Process 60 groups          :      52.72,      52.10(1.18)
Thread  10 groups          :      10.50,      10.52(-0.19)
Thread  20 groups          :      18.79,      18.60(1.01)
Process(Pipe) 10 groups    :      10.39,      10.37(0.19)
Process(Pipe) 20 groups    :      18.45,      18.54(-0.49)
Process(Pipe) 30 groups    :      25.63,      25.92(-1.13)
Process(Pipe) 40 groups    :      33.79,      33.49(0.89)
Process(Pipe) 50 groups    :      43.15,      41.83(3.06)
Process(Pipe) 60 groups    :      51.94,      50.32(3.12)
Thread(Pipe)  10 groups    :      10.73,      10.85(-1.12)
Thread(Pipe)  20 groups    :      19.24,      19.35(-0.57)

==========================================================
10k loops.
==========================================================
Process 10 groups          :       1.10,       1.10(0.00)
Process 20 groups          :       1.89,       1.88(0.53)
Process 30 groups          :       2.82,       2.80(0.71)
Process 40 groups          :       3.76,       3.76(0.00)
Process 50 groups          :       4.66,       4.79(-2.79)
Process 60 groups          :       5.74,       5.92(-3.14)
Thread  10 groups          :       1.22,       1.20(1.64)
Thread  20 groups          :       2.05,       2.05(0.00)
Process(Pipe) 10 groups    :       1.13,       1.13(0.00)
Process(Pipe) 20 groups    :       1.98,       1.93(2.53)
Process(Pipe) 30 groups    :       2.91,       2.75(5.50)
Process(Pipe) 40 groups    :       3.85,       3.65(5.19)
Process(Pipe) 50 groups    :       4.91,       4.91(0.00)
Process(Pipe) 60 groups    :       5.56,       5.90(-6.12)
Thread(Pipe)  10 groups    :       1.23,       1.23(0.00)
Thread(Pipe)  20 groups    :       1.99,       1.99(0.00)
==========================================================

Other than hackbench, I also see slight improvements in unixbench and stress-ng --cpu workloads.

> 
>> However, I still see 20-50%
>> regression on the larger system(320 CPUS). I will continue to debug why.
> 
> Could you try this patch? This is needed because PREEMPT_AUTO turns on
> CONFIG_PREEMPTION, but not CONFIG_PREEMPT:
> 


> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index eca293794a1e..599410050f6b 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -396,7 +396,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
>                 /* Returning to a kernel context with local irqs enabled. */
>                 WARN_ON_ONCE(!(regs->msr & MSR_EE));
>  again:
> -               if (IS_ENABLED(CONFIG_PREEMPT)) {
> +               if (IS_ENABLED(CONFIG_PREEMPTION)) {
>                         /* Return to preemptible kernel context */
>                         if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
>                                 if (preempt_count() == 0)
> 
> 
> --
> ankur

This patch can be considered the PowerPC enablement patch for PREEMPT_AUTO.
Michael, Nick, do you see any concerns?

Ankur, could you please add this patch if there are no objections?

---
From 878a5a7c990e3459758a5d19d7697b07d8d27d0f Mon Sep 17 00:00:00 2001
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
Date: Tue, 7 May 2024 04:42:04 -0500
Subject: [PATCH] powerpc: add support for preempt_auto

Add PowerPC arch support for PREEMPT_AUTO by defining the LAZY bits.

Since PowerPC doesn't use the generic entry/exit functions, add a
_TIF_NEED_RESCHED_LAZY check to the interrupt exit-to-user routine, and
allow preemption on interrupt exit to kernel under PREEMPT_AUTO.

Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
---
 arch/powerpc/Kconfig                   |  1 +
 arch/powerpc/include/asm/thread_info.h | 11 ++++++++++-
 arch/powerpc/kernel/interrupt.c        |  6 ++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..11e7008f5dd3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -268,6 +268,7 @@ config PPC
 	select HAVE_PERF_EVENTS_NMI		if PPC64
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_PREEMPT_AUTO
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 15c5691dd218..227b9273e2e9 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -118,10 +118,19 @@ void arch_setup_new_exec(void);
 #define TIF_POLLING_NRFLAG	19	/* true if poll_idle() is polling TIF_NEED_RESCHED */
 #define TIF_32BIT		20	/* 32 bit binary */
 
+#ifdef CONFIG_PREEMPT_AUTO
+#define TIF_NEED_RESCHED_LAZY	21	/* Lazy rescheduling */
+#endif
+
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
+
+#ifdef CONFIG_PREEMPT_AUTO
+#define _TIF_NEED_RESCHED_LAZY	(1 << TIF_NEED_RESCHED_LAZY)
+#endif
+
 #define _TIF_NOTIFY_SIGNAL	(1<<TIF_NOTIFY_SIGNAL)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_32BIT		(1<<TIF_32BIT)
@@ -144,7 +153,7 @@ void arch_setup_new_exec(void);
 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 				 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
 				 _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
-				 _TIF_NOTIFY_SIGNAL)
+				 _TIF_NOTIFY_SIGNAL | _TIF_NEED_RESCHED_LAZY)
 #define _TIF_PERSYSCALL_MASK	(_TIF_RESTOREALL|_TIF_NOERROR)
 
 /* Bits in local_flags */
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index eca293794a1e..0c0b7010995a 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -185,7 +185,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
 	ti_flags = read_thread_flags();
 	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
 		local_irq_enable();
-		if (ti_flags & _TIF_NEED_RESCHED) {
+		if (ti_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
 			schedule();
 		} else {
 			/*
@@ -396,7 +396,9 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
 		/* Returning to a kernel context with local irqs enabled. */
 		WARN_ON_ONCE(!(regs->msr & MSR_EE));
 again:
-		if (IS_ENABLED(CONFIG_PREEMPT)) {
+
+		if ((IS_ENABLED(CONFIG_PREEMPT_AUTO) && IS_ENABLED(CONFIG_PREEMPTION)) ||
+		    (!IS_ENABLED(CONFIG_PREEMPT_AUTO) && IS_ENABLED(CONFIG_PREEMPT))) {
 			/* Return to preemptible kernel context */
 			if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
 				if (preempt_count() == 0)
-- 
2.39.3
