Message-ID: <20190107175644.GB7636@redhat.com>
Date:   Mon, 7 Jan 2019 18:56:45 +0100
From:   Oleg Nesterov <oleg@...hat.com>
To:     Qian Cai <cai@....pw>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        linux kernel <linux-kernel@...r.kernel.org>,
        gkohli@...eaurora.org
Subject: Re: kernel BUG at kernel/sched/core.c:3490!

On 01/07, Qian Cai wrote:
>
>
> On 1/7/19 8:52 AM, Peter Zijlstra wrote:
> > On Tue, Jan 01, 2019 at 12:44:35AM -0500, Qian Cai wrote:
> >> Running some mmap() workloads to put the system into a low-memory situation
> >> with swapping and OOM, and then it triggers this BUG():
> >>
> >> void __noreturn do_task_dead(void)
> >> {
> >>         /* Causes final put_task_struct in finish_task_switch(): */
> >>         set_special_state(TASK_DEAD);
> >>
> >>         /* Tell freezer to ignore us: */
> >>         current->flags |= PF_NOFREEZE;
> >>
> >>         __schedule(false);
> >>         BUG();
> >>
> >>         /* Avoid "noreturn function does return" - but don't continue if BUG() is a NOP: */
> >>         for (;;)
> >>                 cpu_relax();
> >> }
> >
> > This would mean that we somehow lose the TASK_DEAD state before hitting
> > schedule(), but that is something that should be avoided by
> > set_special_state(), which is supposed to serialize against concurrent
> > wake-ups.

Or maybe pick_next_task() somehow returns the deactivated TASK_DEAD task?

> > How readily does this reproduce?
>
> Running LTP oom01 [1] has triggered it at least once in every five attempts
> so far on v4.20+. Have not tried much on v5.0-rc1 yet.

Can you add

	pr_crit("XXX: %ld %d\n", current->state, current->on_rq);

before that BUG() and reproduce?

Oleg.
