Date: Tue, 25 Jun 2024 00:07:23 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
        paulmck@...nel.org, rostedt@...dmis.org, mark.rutland@....com,
        juri.lelli@...hat.com, joel@...lfernandes.org, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        LKML <linux-kernel@...r.kernel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Nicholas Piggin <npiggin@...il.com>
Subject: Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling



On 6/19/24 8:10 AM, Ankur Arora wrote:


>>>
>>> SOFTIRQ per second:
>>> ===================
>>> 6.10:
>>> ===================
>>> HI    TIMER    NET_TX  NET_RX  BLOCK  IRQ_POLL  TASKLET  SCHED     HRTIMER  RCU
>>> 0.00  3966.47  0.00    18.25   0.59   0.00      0.34     12811.00  0.00     9693.95
>>>
>>> Preempt_auto:
>>> ===================
>>> HI    TIMER    NET_TX  NET_RX  BLOCK  IRQ_POLL  TASKLET  SCHED     HRTIMER  RCU
>>> 0.00  4871.67  0.00    18.94   0.40   0.00      0.25     13518.66  0.00     15732.77
>>>
>>> Note: the RCU softirq rate increases significantly. Not sure what triggers it; still trying to figure out why.
>>> It may be IRQs raising more softirqs, or softirqs causing more IPIs.
>>
>> Did an experiment keeping the number of CPUs constant while changing the number of sockets they span.
>> When all CPUs belong to the same socket, there is no regression w.r.t. PREEMPT_AUTO. The regression starts when the CPUs
>> span across sockets.
> 
> Ah. That's really interesting. So, up to 160 CPUs was okay?

No. In both cases the CPUs are limited to 96. In one case they are all within a single NUMA node, and in the other they are spread across two NUMA nodes.
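
(Aside, in case it helps anyone reproduce the setup: one way to keep the CPU count fixed while changing how many sockets it spans is to toggle CPUs through sysfs. The sketch below is only an illustration of that idea, not necessarily how the runs above were configured; an LPAR or numactl setup would work just as well.)

/*
 * Rough sketch: bring the listed CPUs online (1) or offline (0) by writing
 * to /sys/devices/system/cpu/cpuN/online, so that e.g. 96 CPUs can be kept
 * online within one NUMA node or spread across two.
 *
 * Usage: ./cpu_toggle <0|1> <cpu> [<cpu> ...]   (needs root)
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[128];
	int i;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <0|1> <cpu> [<cpu> ...]\n", argv[0]);
		return 1;
	}

	for (i = 2; i < argc; i++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/online", atoi(argv[i]));
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			continue;
		}
		/* "0" takes the CPU offline, "1" brings it back */
		fprintf(f, "%s\n", argv[1]);
		fclose(f);
	}
	return 0;
}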

> 
>> Since preempt_auto enables the preempt count by default, I think that may be causing the regression. I see powerpc uses the generic
>> implementation, which may not scale well.
> 
> Yeah, this would explain why I don't see similar behaviour on a 384-CPU
> x86 box.
> 
> Also, IIRC the powerpc numbers on preempt=full were significantly worse
> than preempt=none. That test might also be worth doing once you have the
> percpu based method working.
> 
>> Will try to shift to a percpu-based method and see. Will get back if I can get that done successfully.
> 
> Sounds good to me.
> 

Gave it a try. Made the preempt count per-CPU by adding it as a paca field. Unfortunately it didn't
improve the performance; it is more or less the same as preempt_auto.
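
For context: the generic implementation referred to above (include/asm-generic/preempt.h) reaches the count through the task's thread_info, while the paca variant in the patch below is a single load relative to the paca that ppc64 keeps in r13. Roughly, paraphrased rather than copied, and with made-up names for the comparison:

#include <linux/thread_info.h>
#include <asm/paca.h>

/* paraphrase of the generic helper: goes through current's thread_info */
static __always_inline int generic_preempt_count(void)
{
	return READ_ONCE(current_thread_info()->preempt_count);
}

/* the per-CPU variant tried here: reads straight out of the paca */
static __always_inline int paca_preempt_count(void)
{
	return READ_ONCE(local_paca->preempt_count);
}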

The issue remains elusive. The likely crux is that IPIs and softirqs somehow increase
with preempt_auto. Doing some more data collection with perf/ftrace; will share that soon.
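
For anyone wanting to watch the same counters, the softirq-per-second numbers earlier in the thread can be approximated by diffing /proc/softirqs over an interval. A rough userspace sketch of that idea (not necessarily the exact tooling used for the table above):

/* Sample /proc/softirqs twice and print a system-wide rate per second
 * for each softirq type. Sketch only; minimal error handling. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_TYPES 16

static int read_softirqs(char names[MAX_TYPES][16],
			 unsigned long long sums[MAX_TYPES])
{
	static char line[1 << 16];	/* wide enough for many-CPU systems */
	FILE *f = fopen("/proc/softirqs", "r");
	int n = 0;

	if (!f)
		return -1;

	/* first line is the "CPU0 CPU1 ..." header */
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}

	while (n < MAX_TYPES && fgets(line, sizeof(line), f)) {
		char *tok = strtok(line, " :\n");
		unsigned long long sum = 0;

		if (!tok)
			break;
		snprintf(names[n], 16, "%s", tok);

		/* remaining tokens are the per-CPU counts; add them up */
		while ((tok = strtok(NULL, " \n")))
			sum += strtoull(tok, NULL, 10);
		sums[n++] = sum;
	}
	fclose(f);
	return n;
}

int main(int argc, char **argv)
{
	char names[MAX_TYPES][16];
	unsigned long long before[MAX_TYPES], after[MAX_TYPES];
	int secs = argc > 1 ? atoi(argv[1]) : 10;
	int i, n;

	n = read_softirqs(names, before);
	if (n <= 0)
		return 1;
	sleep(secs);
	if (read_softirqs(names, after) != n)
		return 1;

	for (i = 0; i < n; i++)
		printf("%-10s %12.2f/s\n", names[i],
		       (double)(after[i] - before[i]) / secs);
	return 0;
}

The IPI side can be watched the same way by diffing the interrupt counts in /proc/interrupts.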

This is the patch I used to make the preempt count per-CPU for powerpc; it boots and runs the workload.
I implemented a simpler variant instead of folding need_resched into the preempt count: as a hack it just
calls tif_need_resched() directly, which didn't affect the throughput, so I kept it simple. Below is the patch
for reference. It didn't help fix the regression, unless I implemented it wrongly.


diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 1d58da946739..374642288061 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -268,6 +268,7 @@ struct paca_struct {
 	u16 slb_save_cache_ptr;
 #endif
 #endif /* CONFIG_PPC_BOOK3S_64 */
+	int preempt_count;
 #ifdef CONFIG_STACKPROTECTOR
 	unsigned long canary;
 #endif
diff --git a/arch/powerpc/include/asm/preempt.h b/arch/powerpc/include/asm/preempt.h
new file mode 100644
index 000000000000..406dad1a0cf6
--- /dev/null
+++ b/arch/powerpc/include/asm/preempt.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <linux/thread_info.h>
+
+#ifdef CONFIG_PPC64
+#include <asm/paca.h>
+#endif
+#include <asm/percpu.h>
+#include <asm/smp.h>
+
+#define PREEMPT_ENABLED (0)
+
+/*
+ * The preempt count lives in the paca rather than in thread_info. There is no
+ * folded PREEMPT_NEED_RESCHED bit here; need_resched is checked via the TIF flag.
+ */
+static __always_inline int preempt_count(void)
+{
+	return READ_ONCE(local_paca->preempt_count);
+}
+
+static __always_inline void preempt_count_set(int pc)
+{
+	WRITE_ONCE(local_paca->preempt_count, pc);
+}
+
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define init_task_preempt_count(p) do { } while (0)
+
+#define init_idle_preempt_count(p, cpu) do { } while (0)
+
+static __always_inline void set_preempt_need_resched(void)
+{
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return false;
+}
+
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+	preempt_count_set(preempt_count() + val);
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+	preempt_count_set(preempt_count() - val);
+}
+
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+	/*
+	 * Load-store architectures cannot do per-cpu atomic operations, so a
+	 * folded PREEMPT_NEED_RESCHED bit could get lost; check the TIF flag
+	 * directly instead.
+	 */
+	preempt_count_set(preempt_count() - 1);
+	if (preempt_count() == 0 && tif_need_resched())
+		return true;
+	else
+		return false;
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(int preempt_offset)
+{
+	return unlikely(preempt_count() == preempt_offset && tif_need_resched());
+}
+
+//EXPORT_SYMBOL(per_cpu_preempt_count);
+
+#ifdef CONFIG_PREEMPTION
+extern asmlinkage void preempt_schedule(void);
+extern asmlinkage void preempt_schedule_notrace(void);
+
+#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
+
+void dynamic_preempt_schedule(void);
+void dynamic_preempt_schedule_notrace(void);
+#define __preempt_schedule()		dynamic_preempt_schedule()
+#define __preempt_schedule_notrace()	dynamic_preempt_schedule_notrace()
+
+#else /* !CONFIG_PREEMPT_DYNAMIC || !CONFIG_HAVE_PREEMPT_DYNAMIC_KEY*/
+
+#define __preempt_schedule() preempt_schedule()
+#define __preempt_schedule_notrace() preempt_schedule_notrace()
+
+#endif /* CONFIG_PREEMPT_DYNAMIC && CONFIG_HAVE_PREEMPT_DYNAMIC_KEY*/
+#endif /* CONFIG_PREEMPTION */
+
+#endif /* __ASM_PREEMPT_H */
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 0d170e2be2b6..bf2199384751 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -52,8 +52,8 @@
  * low level task data.
  */
 struct thread_info {
-	int		preempt_count;		/* 0 => preemptable,
-						   <0 => BUG */
+	//int		preempt_count;		// 0 => preemptable,
+						//   <0 => BUG
 #ifdef CONFIG_SMP
 	unsigned int	cpu;
 #endif
@@ -77,7 +77,6 @@ struct thread_info {
  */
 #define INIT_THREAD_INFO(tsk)			\
 {						\
-	.preempt_count = INIT_PREEMPT_COUNT,	\
 	.flags =	0,			\
 }
 
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 7502066c3c53..f90245b8359f 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -204,6 +204,7 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 #ifdef CONFIG_PPC_64S_HASH_MMU
 	new_paca->slb_shadow_ptr = NULL;
 #endif
+	new_paca->preempt_count = PREEMPT_DISABLED;
 
 #ifdef CONFIG_PPC_BOOK3E_64
 	/* For now -- if we have threads this will be adjusted later */
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 85050be08a23..2adab682aab9 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -33,6 +33,8 @@
 #include <asm/ultravisor.h>
 #include <asm/crashdump-ppc64.h>
 
+#include <linux/percpu-defs.h>
+
 int machine_kexec_prepare(struct kimage *image)
 {
 	int i;
@@ -324,7 +326,7 @@ void default_machine_kexec(struct kimage *image)
 	 * XXX: the task struct will likely be invalid once we do the copy!
 	 */
 	current_thread_info()->flags = 0;
-	current_thread_info()->preempt_count = HARDIRQ_OFFSET;
+	local_paca->preempt_count = HARDIRQ_OFFSET;
 
 	/* We need a static PACA, too; copy this CPU's PACA over and switch to
 	 * it. Also poison per_cpu_offset and NULL lppaca to catch anyone using
