linux-kernel - Re: WARNING in task_participate_group

Message-ID: <20151102153300.GA21006@redhat.com>
Date:	Mon, 2 Nov 2015 16:33:00 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Dmitry Vyukov <dvyukov@...gle.com>
Cc:	Roland McGrath <roland@...k.frob.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Amanieu d'Antras" <amanieu@...il.com>, pmoore@...hat.com,
	Ingo Molnar <mingo@...nel.org>, vdavydov@...allels.com,
	qiaowei.ren@...el.com, dave@...olabs.net,
	Palmer Dabbelt <palmer@...belt.com>,
	LKML <linux-kernel@...r.kernel.org>,
	syzkaller <syzkaller@...glegroups.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>
Subject: Re: WARNING in task_participate_group_stop

On 11/02, Dmitry Vyukov wrote:
>
> On Mon, Nov 2, 2015 at 4:13 PM, Oleg Nesterov <oleg@...hat.com> wrote:
> > Hi Dmitry,
> >
> > On 11/02, Dmitry Vyukov wrote:
> >>
> >> WARNING: CPU: 1 PID: 1 at kernel/signal.c:334 task_participate_group_stop+0x157/0x1d0()
> >> Modules linked in:
> >> CPU: 1 PID: 1 Comm: init Not tainted 4.3.0 #48
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >>  ffffffff82e40280 ffff88003eb0fae0 ffffffff819efe55 0000000000000000
> >>  ffff88003eb0fb20 ffffffff810ec871 ffffffff8110f4d7 ffff88003eb00000
> >>  ffff88003eb20000 0000000000000000 ffff88003eb0fbf8 ffff88003eb20000
> >> Call Trace:
> >>  [<ffffffff810eca35>] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
> >>  [<ffffffff8110f4d7>] task_participate_group_stop+0x157/0x1d0 kernel/signal.c:334
> >>  [<ffffffff81113587>] do_signal_stop+0x1e7/0x6e0 kernel/signal.c:2060
> >>  [<ffffffff81116ab7>] get_signal+0x387/0x11b0 kernel/signal.c:2316
> >>  [<ffffffff8100cf0d>] do_signal+0x8d/0x19e0 arch/x86/kernel/signal.c:707
> >>  [<ffffffff81005d8d>] prepare_exit_to_usermode+0x11d/0x170 arch/x86/entry/common.c:251
> >>  [<ffffffff81005e83>] syscall_return_slowpath+0xa3/0x2b0 arch/x86/entry/common.c:317
> >>  [<ffffffff82d4f6a7>] int_ret_from_sys_call+0x25/0x8f arch/x86/entry/entry_64.S:281
> >> ---[ end trace f6697fd630b7c361 ]---
> >>
> >>
> >> The reproducer is (needs to be run as root):
> >>
> >> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> >> #include <sys/ptrace.h>
> >> #include <unistd.h>
> >>
> >> int main()
> >> {
> >>     int pid = 1;
> >>     ptrace(PTRACE_ATTACH, pid, 0, 0);
> >>     ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
> >>     sleep(1);
> >>     return 0;
> >> }
> >
> > Thanks.
> >
> > Can't reproduce, but at first glance the problem looks clear...
>
> Humm... did you run as root?

Yes.

> It reproduces all the time on my 4.3 kernel VM. Also firmly killed my
> desktop running 3.13.

Yes, it kills init and crashes the kernel. But I do not see the warning.


> >> Yes, it is weird and it kills init right afterwards.
> >
> > Could you confirm that this WARN_ON() happens _after_ the reproducer exits?
> >
> >> But I wasn't able
> >> to figure out what's the root cause (why task does not have
> >> JOBCTL_STOP_PENDING) and maybe the same WARNING can be triggered
> >> without root and/or with other than init process. So still posting it
> >> here.
> >
> > Yes I think you are right. SIGSTOP can race with SIGKILL which (unlike SIGCONT)
> > doesn't clear JOBCTL_STOP_DEQUEUED/PENDING/etc.
> >
> > This is mostly fine, the task won't block in TASK_STOPPED if SIGKILL is pending,
> > but still is not right and leads to the warning above: JOBCTL_STOP_PENDING was not
> > set because do_signal_stop()->task_set_jobctl_pending() checks fatal_signal_pending().

On second thought, in this particular case (your test-case) SIGSTOP and SIGKILL
do not race, although (so far) I think this doesn't matter. JOBCTL_STOP_PENDING
comes from __ptrace_unlink(), at a point when the tracee already has a pending
SIGKILL due to PTRACE_O_EXITKILL.

Now. If the tracee (init) wakes up and dequeues SIGKILL before __ptrace_unlink()
adds JOBCTL_STOP_PENDING, it won't see JOBCTL_STOP_PENDING and probably this is
what happens on my testing machine.

Perhaps __ptrace_unlink() should be more careful too...

> > Probably the patch below should fix the problem, but I'd like to think more before
> > I send the fix.
>
>
> Will test it.

Great, thanks.

Oleg.
