Message-Id: <20210615121551.31138-1-laoar.shao@gmail.com>
Date:   Tue, 15 Jun 2021 20:15:51 +0800
From:   Yafang Shao <laoar.shao@...il.com>
To:     vincent.guittot@...aro.org, mingo@...hat.com, peterz@...radead.org,
        juri.lelli@...hat.com, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com
Cc:     linux-kernel@...r.kernel.org, Yafang Shao <laoar.shao@...il.com>,
        Valentin Schneider <valentin.schneider@....com>
Subject: [PATCH] sched, fair: try to prevent migration thread from preempting non-cfs task

We observed that our latency-sensitive RT tasks are randomly preempted by the
kthread migration/n, whose job is to migrate the running task away from CPUn
to the newly idle CPU that woke migration/n up. For example,

    sensing_node-2511    [007] d...   945.351566: sched_switch: prev_comm=sensing_node prev_pid=2511 prev_prio=98 prev_state=S ==> next_comm=cat next_pid=2686 next_prio=120
             cat-2686    [007] d...   945.351569: sched_switch: prev_comm=cat prev_pid=2686 prev_prio=120 prev_state=R+ ==> next_comm=sensing_node next_pid=2512 next_prio=98
    sensing_node-2516    [004] dn..   945.351571: sched_wakeup: comm=migration/7 pid=47 prio=0 target_cpu=007
    sensing_node-2512    [007] d...   945.351572: sched_switch: prev_comm=sensing_node prev_pid=2512 prev_prio=98 prev_state=R ==> next_comm=migration/7 next_pid=47 next_prio=0
    sensing_node-2516    [004] d...   945.351572: sched_switch: prev_comm=sensing_node prev_pid=2516 prev_prio=98 prev_state=S ==> next_comm=sensing_node next_pid=2502 next_prio=98
     migration/7-47      [007] d...   945.351580: sched_switch: prev_comm=migration/7 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=sensing_node next_pid=2512 next_prio=98
    sensing_node-2502    [004] d...   945.351605: sched_switch: prev_comm=sensing_node prev_pid=2502 prev_prio=98 prev_state=S ==> next_comm=cat next_pid=2686 next_prio=120

When CPU4 wakes migration/7, the CFS task 'cat' is running on CPU7, but
'cat' is then preempted by the RT task 'sensing_node', and migration/7
in turn preempts that RT task. The race window is between:
    if (need_active_balance(&env)) {
and
        raw_spin_rq_lock_irqsave(busiest, flags);

To narrow this race, do a last-minute check of the busiest CPU's current
task before waking up the migration thread.
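
For reference, here is a rough sketch of the tail of load_balance() in
kernel/sched/fair.c (paraphrased and abbreviated, not the exact upstream
code) showing where the window opens and where the proposed check sits:

    /* Paraphrased/abbreviated flow; only illustrative. */
    if (need_active_balance(&env)) {
            unsigned long flags;

            /*
             * An RT task may preempt the cfs task on the busiest CPU
             * between the decision above and taking the lock below.
             */
            raw_spin_rq_lock_irqsave(busiest, flags);

            /* proposed last-minute check, done under the busiest rq lock */
            if (busiest->curr->sched_class != &fair_sched_class) {
                    raw_spin_rq_unlock_irqrestore(busiest, flags);
                    goto out;
            }

            /* ... otherwise set rq->active_balance and wake migration/N ... */
    }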

Signed-off-by: Yafang Shao <laoar.shao@...il.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Valentin Schneider <valentin.schneider@....com>

---

- Prev version
  https://lore.kernel.org/lkml/CAKfTPtBd349eyDhA5ThCAHFd83cGMQKb_LDxD4QvyP-cJOBjqA@mail.gmail.com/

- Similar discussion
  https://lore.kernel.org/lkml/CAKfTPtBygNcVewbb0GQOP5xxO96am3YeTZNP5dK9BxKHJJAL-g@mail.gmail.com/
---
 kernel/sched/fair.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3248e24a90b0..597c7a940746 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9797,6 +9797,20 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 			/* Record that we found at least one task that could run on this_cpu */
 			env.flags &= ~LBF_ALL_PINNED;
 
+			/*
+			 * There is a race between load balance deciding to wake
+			 * the migration thread to pull the running cfs task and
+			 * an RT task preempting that cfs task first; the
+			 * migration thread would then preempt the RT task.
+			 * Do a last-minute check here, under the busiest rq
+			 * lock, to avoid preempting a latency-sensitive task.
+			 */
+			if (busiest->curr->sched_class != &fair_sched_class) {
+				raw_spin_rq_unlock_irqrestore(busiest,
+							      flags);
+				goto out;
+			}
+
 			/*
 			 * ->active_balance synchronizes accesses to
 			 * ->active_balance_work.  Once set, it's cleared
-- 
2.17.1
