linux-kernel - RE: [PATCH] cpufreq: Avoid leaving stale IRQ work items during CPU offline

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 12 Dec 2019 00:56:54 +0000
From:   Anson Huang <anson.huang@....com>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>
CC:     LKML <linux-kernel@...r.kernel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Peng Fan <peng.fan@....com>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Subject: RE: [PATCH] cpufreq: Avoid leaving stale IRQ work items during CPU
 offline


> Subject: [PATCH] cpufreq: Avoid leaving stale IRQ work items during CPU
> offline
> 
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> The scheduler code calling cpufreq_update_util() may run during CPU offline
> on the target CPU after the IRQ work lists have been flushed for it, so the
> target CPU should be prevented from running code that may queue up an
> IRQ work item on it at that point.
> 
> Unfortunately, that may not be the case if dvfs_possible_from_any_cpu is set
> for at least one cpufreq policy in the system, because that allows the CPU
> going offline to run the utilization update callback of the cpufreq governor on
> behalf of another (online) CPU in some cases.
> 
> If that happens, the cpufreq governor callback may queue up an IRQ work on
> the CPU running it, which is going offline, and the IRQ work will not be
> flushed after that point.  Moreover, that IRQ work cannot be flushed until
> the "offlining" CPU goes back online, so if any other CPU calls irq_work_sync()
> to wait for the completion of that IRQ work, it will have to wait until the
> "offlining" CPU is back online and that may not happen forever.  In particular,
> a system-wide deadlock may occur during CPU online as a result of that.
> 
> The failing scenario is as follows.  CPU0 is the boot CPU, so it creates a
> cpufreq policy and becomes the "leader" of it (policy->cpu).  It cannot go
> offline, because it is the boot CPU.
> Next, other CPUs join the cpufreq policy as they go online and they leave it
> when they go offline.  The last CPU to go offline, say CPU3, may queue up an
> IRQ work while running the governor callback on behalf of CPU0 after leaving
> the cpufreq policy because of the dvfs_possible_from_any_cpu effect
> described above.  Then, CPU0 is the only online CPU in the system and the
> stale IRQ work is still queued on CPU3.  When, say, CPU1 goes back online, it
> will run
> irq_work_sync() to wait for that IRQ work to complete and so it will wait for
> CPU3 to go back online (which may never happen even in principle), but
> (worse yet) CPU0 is waiting for CPU1 at that point too and a system-wide
> deadlock occurs.
> 
> To address this problem notice that CPUs which cannot run cpufreq
> utilization update code for themselves (for example, because they have left
> the cpufreq policies that they belonged to), should also be prevented from
> running that code on behalf of the other CPUs that belong to a cpufreq policy
> with dvfs_possible_from_any_cpu set and so in that case the
> cpufreq_update_util_data pointer of the CPU running the code must not be
> NULL as well as for the CPU which is the target of the cpufreq utilization
> update in progress.
> 
> Accordingly, change cpufreq_this_cpu_can_update() into a regular function
> in kernel/sched/cpufreq.c (instead of a static inline in a header file) and
> make it check the cpufreq_update_util_data pointer of the local CPU if
> dvfs_possible_from_any_cpu is set for the target cpufreq policy.
> 
> Also update the schedutil governor to do the
> cpufreq_this_cpu_can_update() check in the non-fast-switch case too to
> avoid the stale IRQ work issues.
> 
> Fixes: 99d14d0e16fa ("cpufreq: Process remote callbacks from any CPU if the
> platform permits")
> Link:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.k
> ernel.org%2Flinux-pm%2F20191121093557.bycvdo4xyinbc5cb%40vireshk-
> i7%2F&amp;data=02%7C01%7Canson.huang%40nxp.com%7C969872a0d701
> 4a14b07b08d77e24e5b7%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0
> %7C637116569251193552&amp;sdata=hpMVZLv2%2Bx%2F4s1Vd239uwVJXi
> aWTcOMgkVILvj5nih4%3D&amp;reserved=0
> Reported-by: Anson Huang <anson.huang@....com>
> Cc: 4.14+ <stable@...r.kernel.org> # 4.14+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

Passed over 1 day CPU hotplug stress test on single/dual clusters platforms.
 
Tested-by: Anson Huang <anson.huang@....com>