Message-ID: <CAHk-=wi8U1PjW_L6Ng9_A80L_1keyEOKud3PVh-8bwPL9W0CNg@mail.gmail.com>
Date: Sun, 28 Apr 2024 13:22:26 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+83e7f982ca045ab4405c@...kaller.appspotmail.com>,
andrii@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
On Sun, 28 Apr 2024 at 13:01, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> The *problem* here is that the page fault doesn't actually happen on a
> user access, it happens on the *ret* instruction in
> rep_movs_alternative itself (which doesn't have an exception fixup,
> obviously, because no exception is supposed to happen there!):
Actually, there's another page fault deeper in that call chain:
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:__put_user_handle_exception+0x0/0x10 arch/x86/lib/putuser.S:125
Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 01 cb 48 89 01 31
c9 0f 01 ca c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 01
ca b9 f2 ff ff ff c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90
RSP: 0000:ffffc90004137d98 EFLAGS: 00050202
RAX: 00000000662d5943 RBX: 0000000000000000 RCX: 0000000000000019
RDX: 0000000000000000 RSI: ffffffff8bcaca20 RDI: ffffffff8c1eaba0
RBP: ffffc90004137e50 R08: ffffffff8fa7cd6f R09: 1ffffffff1f4f9ad
R10: dffffc0000000000 R11: fffffbfff1f4f9ae R12: ffffc90004137de0
R13: dffffc0000000000 R14: 1ffff92000826fb8 R15: 0000000000000019
__do_sys_gettimeofday kernel/time/time.c:147 [inline]
__se_sys_gettimeofday+0xd9/0x240 kernel/time/time.c:140
which is also nonsensical, since that "<0f> 01 ca" code is just the
"CLAC" instruction (which is the first instruction of
__put_user_handle_exception, the exception fixup for the
__put_user() functions).
So that seems to be the *first* problem spot, actually. It too is
incomprehensible to me. I must be missing something. A "clac"
instruction cannot take a page fault (except for the instruction fetch
itself, of course).
So if the page fault on the 'RET' instruction was odd, the page fault
on the CLAC is *really* odd.
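
The byte-level reading of the "Code:" line can be checked mechanically.
What follows is a minimal user-space sketch in C, not kernel code:
parse_code_line(), is_clac() and decode_mov_ecx_imm32() are hypothetical
helper names invented for this illustration. It locates the byte that the
oops marks with <..>, tests for the CLAC encoding (0f 01 ca), and decodes
a "mov $imm32, %ecx" (opcode b9 followed by a little-endian immediate).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of an oops "Code:" line (hypothetical type for this sketch). */
struct oops_code {
	uint8_t bytes[128];
	size_t len;        /* number of bytes parsed */
	size_t fault_idx;  /* index of the byte printed as <xx> */
};

/*
 * Parse whitespace-separated hex bytes. The one byte wrapped in <> is the
 * byte at the faulting RIP. Returns 0 on success, -1 on malformed input
 * or if no byte is marked.
 */
static int parse_code_line(const char *s, struct oops_code *out)
{
	assert(s != NULL && out != NULL);
	out->len = 0;
	out->fault_idx = (size_t)-1;

	for (;;) {
		int marked = 0;
		char *end;
		unsigned long v;

		while (*s == ' ' || *s == '\n' || *s == '\t')
			s++;
		if (*s == '\0')
			break;
		if (*s == '<') {
			marked = 1;
			s++;
		}
		v = strtoul(s, &end, 16);
		if (end == s || v > 0xff || out->len >= sizeof(out->bytes))
			return -1;
		if (marked) {
			if (*end != '>')
				return -1;
			end++;
			out->fault_idx = out->len;
		}
		out->bytes[out->len++] = (uint8_t)v;
		s = end;
	}
	return out->fault_idx == (size_t)-1 ? -1 : 0;
}

/* CLAC is encoded as the three bytes 0f 01 ca. */
static int is_clac(const uint8_t *p, size_t n)
{
	return n >= 3 && p[0] == 0x0f && p[1] == 0x01 && p[2] == 0xca;
}

/* Decode "mov $imm32, %ecx": opcode b9, then a little-endian imm32. */
static int decode_mov_ecx_imm32(const uint8_t *p, size_t n, int32_t *imm)
{
	uint32_t u;

	if (n < 5 || p[0] != 0xb9)
		return -1;
	u = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
	    ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
	memcpy(imm, &u, sizeof(u)); /* reinterpret the bits as signed */
	return 0;
}
```

Fed the "Code:" line quoted above, this sketch should place the marked
byte at offset 42, recognize CLAC there, and decode the following
"b9 f2 ff ff ff" as loading -14 (-EFAULT on Linux) into %ecx before the
"c3" return. That matches the reading that the faulting RIP is the first
instruction of the fixup, but it does nothing to explain why a fault was
reported there.
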
That original page fault looks like it's just from one of the
put_user() calls in gettimeofday():
        if (put_user(ts.tv_sec, &tv->tv_sec) ||
            put_user(ts.tv_nsec / 1000, &tv->tv_usec))
and yes, they can fault, but I'm not seeing how that then points to
the CLAC in the exception handler.
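
To illustrate what the fixup path is expected to do from the caller's
side, here is a small user-space model in C. It is only a sketch under
stated assumptions: model_put_user() and model_store_time() are
hypothetical stand-ins, a NULL pointer plays the role of a faulting user
address, and nothing here touches real page tables or exception tables.
The point is just the contract: a faulting store yields -EFAULT instead
of crashing, and the "||" stops at the first failure.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's timespec64 in this model. */
struct model_timespec {
	long tv_sec;
	long tv_nsec;
};

/*
 * Model of put_user(): store val through uptr and return 0, or return
 * -EFAULT if the "user" pointer would fault (modeled here as NULL).
 */
static int model_put_user(long val, long *uptr)
{
	if (uptr == NULL)
		return -EFAULT; /* what the exception fixup hands back */
	*uptr = val;
	return 0;
}

/*
 * Model of the two stores in gettimeofday(): seconds first, then
 * nanoseconds converted to microseconds. Short-circuits on first fault.
 */
static int model_store_time(const struct model_timespec *ts,
			    long *sec_ptr, long *usec_ptr)
{
	if (model_put_user(ts->tv_sec, sec_ptr) ||
	    model_put_user(ts->tv_nsec / 1000, usec_ptr))
		return -EFAULT;
	return 0;
}
```

In this model a fault on either store simply turns into an -EFAULT return
value, which is the behavior one would expect from the real fixup. The
report above is strange precisely because the fault is attributed to the
fixup code itself rather than to the user store.
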
Linus