linux-kernel - Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wjma_sSghVTgDCQxHHd=e2Lqi45PLh78oJ4WeBj8erV9Q@mail.gmail.com>
Date: Sun, 28 Apr 2024 17:50:30 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+83e7f982ca045ab4405c@...kaller.appspotmail.com>, 
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, andrii@...nel.org, bpf@...r.kernel.org, 
	linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task

On Sun, 28 Apr 2024 at 16:23, Hillf Danton <hdanton@...a.com> wrote:
>
> So is game like copying from/putting to user with runqueue locked
> at the first place.

No, that should be perfectly fine. In fact, it's even normal. It would
happen any time you have any kind of tracing thing, where looking up
the user mode frame involves doing user accesses with page faults
disabled.

The runqueue lock is irrelevant. As mentioned, it's only a symptom of
something else going wrong.

Now, judging by the syz reproducer, the trigger for this all is almost
certainly that

   bpf$BPF_RAW_TRACEPOINT_OPEN(0x11,
&(0x7f00000000c0)={&(0x7f0000000080)='sched_switch\x00', r0}, 0x10)

and that probably causes the instability. But the immediate problem is
not the user space access, it's that something goes horribly wrong
*around* it.

> Plus as per another syzbot report [1], bpf could make trouble with
> workqueue pool locked.

That seems to be entirely different. There's no unexplained page fault
in that case, that seems to be purely a "take lock in the wrong order"

                Linus