Message-ID: <4f35b6c1-c05f-4475-a4e6-3760eefbe6b0@I-love.SAKURA.ne.jp>
Date: Sat, 28 Sep 2024 17:11:09 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Hillf Danton <hdanton@...a.com>, Michal Koutny <mkoutny@...e.com>
Cc: Chen Ridong <chenridong@...wei.com>, tj@...nel.org,
cgroups@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and
cpu_hotplug_lock
On 2024/09/11 20:15, Hillf Danton wrote:
> On Mon, 9 Sep 2024 16:19:38 +0200 Michal Koutny <mkoutny@...e.com> wrote:
>> On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@...wei.com> wrote:
>>> The reason for this issue is that cgroup_mutex and cpu_hotplug_lock are
>>> acquired in different tasks, which may lead to deadlock.
>>> It can lead to a deadlock through the following steps:
>>> 1. A large number of cpusets are deleted asynchronously, which puts a
>>> large number of cgroup_bpf_release works into system_wq. The max_active
>>> of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are
>>> cgroup_bpf_release works, and many cgroup_bpf_release works will be put
>>> into the inactive queue. As illustrated in the diagram, there are 256
>>> works (in the active queue) + n works (in the inactive queue).
> Given that no workqueue work executes without being dequeued, queued works
> that acquire cgroup_mutex, no matter whether there are more than 2048 of
> them, cannot prevent the work queued by thread-T from being executed. So
> thread-T can make safe forward progress, leaving no chance for the ABBA
> deadlock you spotted, which lockdep fails to detect.
I made a simple test which queues many work items into system_wq and
measures the time needed to flush the last work item.
The time needed increased in proportion to the number of queued work items.
Although nobody calls flush_workqueue() on system_wq, several users
call flush_work() on work items queued in system_wq. Therefore, I think
that queuing thousands of work items into system_wq should be avoided,
regardless of whether there is a possibility of deadlock.
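
As an aside, the usual way to avoid such coupling would be a dedicated
workqueue, so that a burst of release works cannot delay flush_work() on
unrelated system_wq items. Below is a minimal sketch, not taken from
Chen Ridong's patch; the workqueue and function names are mine.
----------------------------------------
#include <linux/module.h>
#include <linux/workqueue.h>

/* "release_wq" is a hypothetical name, for illustration only. */
static struct workqueue_struct *release_wq;

static int __init release_wq_init(void)
{
        /*
         * A separate workqueue: works queued here compete among themselves
         * for max_active slots, not with work items sitting in system_wq.
         */
        release_wq = alloc_workqueue("release_wq", WQ_UNBOUND, 0);
        return release_wq ? 0 : -ENOMEM;
}

static void __exit release_wq_exit(void)
{
        destroy_workqueue(release_wq);
}

module_init(release_wq_init);
module_exit(release_wq_exit);
MODULE_LICENSE("GPL");
----------------------------------------
Callers would then use queue_work(release_wq, &work) instead of
schedule_work(&work). Anyway, here is the test module I used: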
----------------------------------------
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/sched.h>        /* schedule_timeout_uninterruptible() */
#include <linux/jiffies.h>      /* jiffies, HZ */

/* Each work item just sleeps for one second. */
static void worker_func(struct work_struct *work)
{
        schedule_timeout_uninterruptible(HZ);
}

#define MAX_WORKS 8192
static struct work_struct works[MAX_WORKS];

static int __init test_init(void)
{
        int i;
        unsigned long start, end;

        /* Queue MAX_WORKS work items into system_wq. */
        for (i = 0; i < MAX_WORKS; i++) {
                INIT_WORK(&works[i], worker_func);
                schedule_work(&works[i]);
        }
        /* Measure how long flushing the last queued work item takes. */
        start = jiffies;
        flush_work(&works[MAX_WORKS - 1]);
        end = jiffies;
        printk("%u: Took %lu jiffies. (HZ=%u)\n", MAX_WORKS, end - start, HZ);
        /* Wait for all work items before this module goes away. */
        for (i = 0; i < MAX_WORKS; i++)
                flush_work(&works[i]);
        /* Fail the load on purpose so that no rmmod is needed. */
        return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");
----------------------------------------
12 CPUs:
256: Took 1025 jiffies. (HZ=1000)
512: Took 2091 jiffies. (HZ=1000)
1024: Took 4105 jiffies. (HZ=1000)
2048: Took 8321 jiffies. (HZ=1000)
4096: Took 16382 jiffies. (HZ=1000)
8192: Took 32770 jiffies. (HZ=1000)

1 CPU:
256: Took 1133 jiffies. (HZ=1000)
512: Took 2047 jiffies. (HZ=1000)
1024: Took 4117 jiffies. (HZ=1000)
2048: Took 8210 jiffies. (HZ=1000)
4096: Took 16424 jiffies. (HZ=1000)
8192: Took 32774 jiffies. (HZ=1000)
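
These numbers are consistent with simple arithmetic: system_wq allows
WQ_DFL_ACTIVE (256) works to be active at a time in a pool, each work
sleeps for HZ jiffies, and since all items were queued from one CPU they
presumably landed in the same per-CPU pool. The last of 8192 items thus
waits for roughly 8192 / 256 = 32 batches of one second each, i.e. about
32 * HZ = 32000 jiffies, close to the measured 32770. The number of CPUs
does not matter because the works sleep instead of computing.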