linux-kernel - Re: [syzbot] possible deadlock in worker

Message-ID: <2959649d-cfbc-bdf2-02ac-053b8e7af030@I-love.SAKURA.ne.jp>
Date:   Mon, 14 Feb 2022 10:08:00 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Bart Van Assche <bvanassche@....org>,
        syzbot <syzbot+831661966588c802aae9@...kaller.appspotmail.com>,
        jgg@...pe.ca, linux-kernel@...r.kernel.org,
        linux-rdma@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        Tejun Heo <tj@...nel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [syzbot] possible deadlock in worker_thread

On 2022/02/14 8:06, Bart Van Assche wrote:
> On 2/12/22 09:14, Tetsuo Handa wrote:
>> How can reviewing all flush_workqueue(system_long_wq) calls help?
> 
> It is allowed to queue blocking actions on system_long_wq.

Correct.

> flush_workqueue(system_long_wq) can make a lower layer (e.g. ib_srp)
> wait on a blocking action from a higher layer (e.g. the loop driver)
> and thereby cause a deadlock.

Correct.

> Hence my proposal to review all flush_workqueue(system_long_wq) calls.

Maybe I'm misunderstanding what "review" means here.

My proposal is to "rewrite" any module that needs to call flush_workqueue()
on a system-wide workqueue, or that calls flush_work()/flush_*_work() on
work that depends on a system-wide workqueue.

For example, "rewrite" the ib_srp module so that it no longer calls
flush_workqueue(system_long_wq) and uses its own workqueue instead:

+	srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", 0, 0);

-	queue_work(system_long_wq, &target->tl_err_work);
+	queue_work(srp_tl_err_wq, &target->tl_err_work);

-	flush_workqueue(system_long_wq);
+	flush_workqueue(srp_tl_err_wq);

+	destroy_workqueue(srp_tl_err_wq);

Then we can make e.g. flush_workqueue() trigger WARN_ON() when it is called
on a system-wide workqueue.
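
As a rough illustration of that last point, the check could be modeled in
userspace C as below. This is only a sketch under stated assumptions:
struct wq_model, warn_count and flush_workqueue_model() are hypothetical
stand-ins and not kernel APIs. A real check would live inside the workqueue
code and use WARN_ON() rather than a counter and fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for struct workqueue_struct (hypothetical type). */
struct wq_model {
	const char *name;
	bool system_wide;	/* true for shared queues such as system_long_wq */
};

/* Counts how many times the warning fired; stands in for WARN_ON(). */
static int warn_count;

/* Model of flush_workqueue() with the proposed check added. */
static void flush_workqueue_model(const struct wq_model *wq)
{
	assert(wq != NULL);

	if (wq->system_wide) {
		/* Flushing a shared queue may wait on unrelated work items. */
		warn_count++;
		fprintf(stderr, "WARNING: flushing system-wide workqueue %s\n",
			wq->name);
	}

	/* A real flush would wait here for all queued work to finish. */
}
```

With this model, flushing a module-private queue such as srp_tl_err_wq stays
silent, while flushing a queue marked system_wide bumps warn_count once per
call.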