Message-ID: <20191028214902.GN4643@worktop.programming.kicks-ass.net>
Date:   Mon, 28 Oct 2019 22:49:02 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Quentin Perret <qperret@...gle.com>
Cc:     linux-kernel@...r.kernel.org, aaron.lwe@...il.com,
        valentin.schneider@....com, mingo@...nel.org, pauld@...hat.com,
        jdesfossez@...italocean.com, naravamudan@...italocean.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        juri.lelli@...hat.com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, kernel-team@...roid.com, john.stultz@...aro.org
Subject: Re: NULL pointer dereference in pick_next_task_fair

On Mon, Oct 28, 2019 at 05:46:03PM +0000, Quentin Perret wrote:

> The issue is very transient and relatively hard to reproduce.
> 
> After digging a bit, the offending commit seems to be:
> 
>     67692435c411 ("sched: Rework pick_next_task() slow-path")
> 
> By 'offending' I mean that reverting it makes the issue go away. The
> issue comes from the fact that pick_next_entity() returns a NULL se in
> the 'simple' path of pick_next_task_fair(), which causes obvious
> problems in the subsequent call to set_next_entity().
> 
> I'll dig more, but if anybody understands the issue in the meantime feel
> free to send me a patch to try out :)

The only way for pick_next_entity() to return NULL is if the tree is
empty and !cfs_rq->curr. But in that case, cfs_rq->nr_running _should_
be 0 and/or its related se should not be enqueued in the parent cfs_rq.

Now for the root cfs_rq we check nr_running for this and jump to the
idle path; however, if this occurs in the middle of the hierarchy, we're
up a creek without a paddle. This is something that really should not
happen (because an empty cfs_rq should not be enqueued).

Also, if we take the simple path, as you say, then we'll have done a
put_prev_task(), regardless of how we got there, so we know cfs_rq->curr
must be NULL. Which, with the above, means the tree really is empty.

And as stated above, when the tree is empty and !cfs_rq->curr, the
cfs_rq's se should not be enqueued in the parent cfs_rq so we should not
be getting here.
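
To make the argument concrete, here is a toy model of the descent. It is
only a sketch, not the kernel's code: struct toy_se, struct toy_cfs_rq,
toy_pick_next_entity() and toy_pick_next_task() are made-up names, the
rbtree is reduced to a single 'leftmost' pointer, and the real
pick_next_entity() does more than this. It shows that only the root is
guarded by an nr_running check, so an empty cfs_rq whose se is still
enqueued further down yields a NULL se.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for sched_entity / cfs_rq. All names are hypothetical. */
struct toy_cfs_rq;

struct toy_se {
	struct toy_cfs_rq *my_q;	/* group entity: its own cfs_rq; task: NULL */
};

struct toy_cfs_rq {
	struct toy_se *leftmost;	/* first entity in the "tree", NULL if empty */
	struct toy_se *curr;		/* running entity, NULL after put_prev_task() */
	unsigned int nr_running;
};

/* Can only return NULL when the tree is empty and there is no curr. */
static struct toy_se *toy_pick_next_entity(struct toy_cfs_rq *cfs_rq)
{
	return cfs_rq->leftmost ? cfs_rq->leftmost : cfs_rq->curr;
}

/*
 * Walk down from the root. Returns the picked task entity, or NULL.
 * *broken is set to 1 when an empty cfs_rq is found below the root,
 * which is the spot where the real code would dereference a NULL se.
 */
static struct toy_se *toy_pick_next_task(struct toy_cfs_rq *root, int *broken)
{
	struct toy_cfs_rq *cfs_rq = root;
	struct toy_se *se;

	assert(root && broken);
	*broken = 0;

	if (!root->nr_running)
		return NULL;		/* idle path: only the root is checked */

	do {
		se = toy_pick_next_entity(cfs_rq);
		if (!se) {
			*broken = 1;	/* empty cfs_rq in the middle of the hierarchy */
			return NULL;
		}
		cfs_rq = se->my_q;	/* descend into the group, if any */
	} while (cfs_rq);

	return se;
}
```

In this model the only way to hit the 'broken' case is a group se that
is still enqueued in its parent while its own cfs_rq has an empty tree
and no curr, which is exactly the state that should never exist.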

Clearly something is buggered with the cgroup state. What is your cgroup
setup? Are you using cpu-bandwidth?
