Message-ID: <4f35b6c1-c05f-4475-a4e6-3760eefbe6b0@I-love.SAKURA.ne.jp>
Date: Sat, 28 Sep 2024 17:11:09 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Hillf Danton <hdanton@...a.com>, Michal Koutny <mkoutny@...e.com>
Cc: Chen Ridong <chenridong@...wei.com>, tj@...nel.org,
cgroups@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and
cpu_hotplug_lock
On 2024/09/11 20:15, Hillf Danton wrote:
> On Mon, 9 Sep 2024 16:19:38 +0200 Michal Koutny <mkoutny@...e.com> wrote:
>> On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@...wei.com> wrote:
>>> The reason for this issue is that cgroup_mutex and cpu_hotplug_lock are
>>> acquired in different tasks, which may lead to deadlock.
>>> It can lead to a deadlock through the following steps:
>>> 1. A large number of cpusets are deleted asynchronously, which puts a
>>> large number of cgroup_bpf_release works into system_wq. The max_active
>>> of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are
>>> cgroup_bpf_release works, and many cgroup_bpf_release works will be put
>>> into the inactive queue. As illustrated in the diagram, there are 256
>>> works (in the active queue) + n works (in the inactive queue).
> Given that no workqueue work executes without being dequeued, queued works
> that acquire cgroup_mutex, no matter whether there are more than 2048 of
> them, cannot prevent the work queued by thread-T from being executed. So
> thread-T can make safe forward progress, leaving no chance for the ABBA
> deadlock you spotted, which lockdep fails to detect.
I made a simple test which queues many work items into system_wq and
measures the time needed to flush the last work item.
The time needed increased in proportion to the number of queued work items.
Although nobody calls flush_workqueue() on system_wq, several users
call flush_work() on work items queued in system_wq. Therefore, I think
that queuing thousands of work items into system_wq should be avoided,
regardless of whether there is a possibility of deadlock.
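
As an aside, the usual way to avoid such coupling would be a dedicated
workqueue, so that a burst of release works cannot delay flush_work() on
unrelated system_wq items. Below is a minimal sketch, not taken from
Chen Ridong's patch; the workqueue and function names are mine.
----------------------------------------
#include <linux/module.h>
#include <linux/workqueue.h>

/* "release_wq" is a hypothetical name, for illustration only. */
static struct workqueue_struct *release_wq;

static int __init release_wq_init(void)
{
        /*
         * A separate workqueue: works queued here compete among themselves
         * for max_active slots, not with work items sitting in system_wq.
         */
        release_wq = alloc_workqueue("release_wq", WQ_UNBOUND, 0);
        return release_wq ? 0 : -ENOMEM;
}

static void __exit release_wq_exit(void)
{
        destroy_workqueue(release_wq);
}

module_init(release_wq_init);
module_exit(release_wq_exit);
MODULE_LICENSE("GPL");
----------------------------------------
Callers would then use queue_work(release_wq, &work) instead of
schedule_work(&work). Anyway, here is the test module I used: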
----------------------------------------
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/sched.h>        /* schedule_timeout_uninterruptible() */
#include <linux/jiffies.h>      /* jiffies, HZ */

/* Each work item just sleeps for one second. */
static void worker_func(struct work_struct *work)
{
        schedule_timeout_uninterruptible(HZ);
}

#define MAX_WORKS 8192
static struct work_struct works[MAX_WORKS];

static int __init test_init(void)
{
        int i;
        unsigned long start, end;

        /* Queue MAX_WORKS work items into system_wq. */
        for (i = 0; i < MAX_WORKS; i++) {
                INIT_WORK(&works[i], worker_func);
                schedule_work(&works[i]);
        }
        /* Measure how long flushing the last queued work item takes. */
        start = jiffies;
        flush_work(&works[MAX_WORKS - 1]);
        end = jiffies;
        printk("%u: Took %lu jiffies. (HZ=%u)\n", MAX_WORKS, end - start, HZ);
        /* Wait for all work items before this module goes away. */
        for (i = 0; i < MAX_WORKS; i++)
                flush_work(&works[i]);
        /* Fail the load on purpose so that no rmmod is needed. */
        return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");
----------------------------------------
12 CPUs:
256: Took 1025 jiffies. (HZ=1000)
512: Took 2091 jiffies. (HZ=1000)
1024: Took 4105 jiffies. (HZ=1000)
2048: Took 8321 jiffies. (HZ=1000)
4096: Took 16382 jiffies. (HZ=1000)
8192: Took 32770 jiffies. (HZ=1000)

1 CPU:
256: Took 1133 jiffies. (HZ=1000)
512: Took 2047 jiffies. (HZ=1000)
1024: Took 4117 jiffies. (HZ=1000)
2048: Took 8210 jiffies. (HZ=1000)
4096: Took 16424 jiffies. (HZ=1000)
8192: Took 32774 jiffies. (HZ=1000)
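
These numbers are consistent with simple arithmetic: system_wq allows
WQ_DFL_ACTIVE (256) works to be active at a time in a pool, each work
sleeps for HZ jiffies, and since all items were queued from one CPU they
presumably landed in the same per-CPU pool. The last of 8192 items thus
waits for roughly 8192 / 256 = 32 batches of one second each, i.e. about
32 * HZ = 32000 jiffies, close to the measured 32770. The number of CPUs
does not matter because the works sleep instead of computing.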