Message-ID: <20241002112105.LCvHpHN1@linutronix.de>
Date: Wed, 2 Oct 2024 13:21:05 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev
Cc: Ben Segall <bsegall@...gle.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
	Mel Gorman <mgorman@...e.de>, Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Valentin Schneider <vschneid@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: [RFC] Repeated rto_push_irq_work_func() invocation.

I have this in my RT queue:

--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2193,8 +2193,11 @@ static int rto_next_cpu(struct root_doma
 
 		rd->rto_cpu = cpu;
 
-		if (cpu < nr_cpu_ids)
+		if (cpu < nr_cpu_ids) {
+			if (!has_pushable_tasks(cpu_rq(cpu)))
+				continue;
 			return cpu;
+		}
 
 		rd->rto_cpu = -1;
 
This avoided a large number of IPIs queueing and invoking rto_push_work
while an RT task was scheduled. The situation improved with commit
   612f769edd06a ("sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask")

Now, looking at this again, I still see invocations which are skipped due
to this patch, on an idle CPU more often than on a busy CPU. Given that
the task is removed from the list and the mask is cleared almost
immediately, this looks like a small window which is probably negligible.

One thing I am not sure what to do about (taken from a busy trace):
|      ksoftirqd/5-63      [005] dN.31  4446.750055: sched_waking: comm=rcu_preempt pid=17 prio=98 target_cpu=005
|      ksoftirqd/5-63      [005] dN.41  4446.750058: enqueue_pushable_task: Add rcu_preempt-17
|      ksoftirqd/5-63      [005] dN.41  4446.750059: enqueue_pushable_task: Set 5

Since the enqueued task is not yet running on the CPU, it gets added to
the pushable list (the task_current() check could be removed since a task
being enqueued can never be the one on the CPU, right?). Given the
priorities, the new task will preempt the current task.

|      ksoftirqd/5-63      [005] dN.41  4446.750060: sched_wakeup: comm=rcu_preempt pid=17 prio=98 target_cpu=005
|      ksoftirqd/5-63      [005] dN.31  4446.750062: sched_stat_runtime: comm=ksoftirqd/5 pid=63 runtime=14625 [ns]
|       cyclictest-5192    [003] d..2.  4446.750062: sched_stat_runtime: comm=cyclictest pid=5192 runtime=13066 [ns]
|       cyclictest-5192    [003] d..2.  4446.750064: dequeue_pushable_task: Del cyclictest-5192
|       cyclictest-5192    [003] d..3.  4446.750065: rto_next_cpu.constprop.0: Look count 1
|       cyclictest-5192    [003] d..3.  4446.750066: rto_next_cpu.constprop.0: Leave CPU 5

This is then observed by other CPUs in the system, so rto_next_cpu()
returns CPU 5, resulting in rto_push_work being queued on CPU 5.

|      ksoftirqd/5-63      [005] dNh1.  4446.750069: push_rt_task: Start
|      ksoftirqd/5-63      [005] dNh1.  4446.750070: push_rt_task: Push rcu_preempt-17 98
|      ksoftirqd/5-63      [005] dNh1.  4446.750071: push_rt_task: resched

push_rt_task() didn't do anything because NEED_RESCHED is already set,

|      ksoftirqd/5-63      [005] dNh1.  4446.750071: rto_next_cpu.constprop.0: Look count 1
|      ksoftirqd/5-63      [005] dNh1.  4446.750072: rto_next_cpu.constprop.0: Leave CPU 5

but scheduled rto_push_work again.

|      ksoftirqd/5-63      [005] dNh2.  4446.750074: push_rt_task: Start
|      ksoftirqd/5-63      [005] dNh2.  4446.750074: push_rt_task: Push rcu_preempt-17 98
|      ksoftirqd/5-63      [005] dNh2.  4446.750075: push_rt_task: resched

came to the same conclusion.

|      ksoftirqd/5-63      [005] dNh2.  4446.750075: rto_next_cpu.constprop.0: Look count 1
|      ksoftirqd/5-63      [005] dNh2.  4446.750076: rto_next_cpu.constprop.0: Leave no CPU count: 1

It left with no CPU because the walk wrapped around. Nothing was scheduled.

|      ksoftirqd/5-63      [005] dNh3.  4446.750077: sched_waking: comm=irq_work/5 pid=60 prio=98 target_cpu=005
|       cyclictest-5216    [027] d..2.  4446.750077: dequeue_pushable_task: Del cyclictest-5216
|       cyclictest-5216    [027] d..3.  4446.750079: rto_next_cpu.constprop.0: Look count 1
|      ksoftirqd/5-63      [005] dNh4.  4446.750079: sched_wakeup: comm=irq_work/5 pid=60 prio=98 target_cpu=005
|       cyclictest-5216    [027] d..3.  4446.750080: rto_next_cpu.constprop.0: Leave CPU 5

CPU 5 is making progress in terms of scheduling, but CPU 27 noticed the
still-set mask and queued another rto_push_work.

|      ksoftirqd/5-63      [005] dN.2.  4446.750084: dequeue_pushable_task: Del rcu_preempt-17
|      ksoftirqd/5-63      [005] dN.2.  4446.750085: dequeue_pushable_task: Clear 5
|      ksoftirqd/5-63      [005] d..2.  4446.750086: sched_switch: prev_comm=ksoftirqd/5 prev_pid=63 prev_prio=120 prev_state=R+ ==> next_comm=rcu_preempt next_pid=17 next_prio=98
|      rcu_preempt-17      [005] d.h21  4446.750089: rto_next_cpu.constprop.0: Look count 0
|      rcu_preempt-17      [005] d.h21  4446.750089: rto_next_cpu.constprop.0: Leave no CPU count: 0

This rto_next_cpu() was triggered earlier by CPU27.

At this point I'm not sure if there is something that could be done
about it or if it is a special case.

Would it make sense to avoid scheduling rto_push_work if rq->curr has
NEED_RESCHED set, and let the scheduler do the push_rt_task() instead?
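Something like the following on top of the patch above, as a completely untested sketch of the idea: extend the skip condition so a CPU that is about to reschedule anyway is not sent another IPI. Reading rq->curr's flag from here without the rq lock is racy, so the window between NEED_RESCHED being cleared and the pushable task being picked up would need a closer look (the scheduler path would have to guarantee the push still happens):

--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ static int rto_next_cpu(struct root_domain *rd)
 
 		rd->rto_cpu = cpu;
 
 		if (cpu < nr_cpu_ids) {
-			if (!has_pushable_tasks(cpu_rq(cpu)))
+			/*
+			 * Skip CPUs that are about to reschedule anyway;
+			 * the upcoming schedule() will handle the pushable
+			 * tasks without an extra IPI.
+			 */
+			if (!has_pushable_tasks(cpu_rq(cpu)) ||
+			    test_tsk_need_resched(cpu_rq(cpu)->curr))
 				continue;
 			return cpu;
 		}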

Sebastian
