linux-kernel - Re: NULL pointer dereference in pick_next_task


Message-ID: <20191106150450.fa5ppdejiggsb46a@e107158-lin.cambridge.arm.com>
Date:   Wed, 6 Nov 2019 15:04:50 +0000
From:   Qais Yousef <qais.yousef@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Quentin Perret <qperret@...gle.com>, linux-kernel@...r.kernel.org,
        aaron.lwe@...il.com, valentin.schneider@....com, mingo@...nel.org,
        pauld@...hat.com, jdesfossez@...italocean.com,
        naravamudan@...italocean.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        kernel-team@...roid.com, john.stultz@...aro.org
Subject: Re: NULL pointer dereference in pick_next_task_fair

On 11/06/19 14:08, Peter Zijlstra wrote:
> On Wed, Nov 06, 2019 at 01:05:25PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 28, 2019 at 05:46:03PM +0000, Quentin Perret wrote:
> > > 
> > > After digging a bit, the offending commit seems to be:
> > > 
> > >     67692435c411 ("sched: Rework pick_next_task() slow-path")
> > > 
> > > By 'offending' I mean that reverting it makes the issue go away. The
> > > issue comes from the fact that pick_next_entity() returns a NULL se in
> > > the 'simple' path of pick_next_task_fair(), which causes obvious
> > > problems in the subsequent call to set_next_entity().
> > > 
> > > I'll dig more, but if anybody understands the issue in the meantime feel
> > > free to send me a patch to try out :)
> > 
> > So for all those who didn't follow along on IRC, the below seems to cure
> > things.
> > 
> > The only thing I'm now considering is if we shouldn't be setting
> > ->on_cpu=2 _before_ calling put_prev_task(). I'll go audit the RT/DL
> > cases.
> 
> So I think it all works, but that's more by accident than anything else.
> I'll move the ->on_cpu=2 assignment earlier. That clearly avoids calling
> put_prev_task() while we're in put_prev_task().

Did you mean avoids calling *set_next_task()* while we're in put_prev_task()?

So what you're saying is that put_prev_task_{rt,dl}() could drop the rq_lock()
too and the race could happen while we're inside these functions, correct? Or
is it a different reason?

By the way, do all reads/writes to ->on_cpu happen while a lock is held? I.e.,
do we not need any SMP read/write barriers here?
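
To make the distinction behind that question concrete, here is a minimal
userspace sketch, not kernel code. It assumes C11 atomics and a pthread mutex
as stand-ins; struct toy_task, toy_rq_lock and all toy_* helpers are
hypothetical names invented for illustration. It only contrasts an access
ordered by a lock with a lockless access that needs release/acquire ordering,
and does not claim which of the two the scheduler actually uses for ->on_cpu.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a task; NOT the kernel's struct task_struct. */
struct toy_task {
	atomic_int on_cpu;	/* 0 = not running, 1 = running */
	int state;		/* plain data published before on_cpu is cleared */
};

/* Hypothetical stand-in for the runqueue lock. */
static pthread_mutex_t toy_rq_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Case 1: every reader and writer takes the same lock. The lock's own
 * acquire/release semantics order the accesses, so a relaxed store is
 * enough and no extra barrier is needed.
 */
static void toy_set_on_cpu_locked(struct toy_task *t, int val)
{
	assert(val == 0 || val == 1);
	pthread_mutex_lock(&toy_rq_lock);
	atomic_store_explicit(&t->on_cpu, val, memory_order_relaxed);
	pthread_mutex_unlock(&toy_rq_lock);
}

/*
 * Case 2: the writer does not hold the lock. The release store makes the
 * earlier plain write to ->state visible to any reader that observes
 * on_cpu == 0 through an acquire load.
 */
static void toy_finish_task(struct toy_task *t, int final_state)
{
	t->state = final_state;
	atomic_store_explicit(&t->on_cpu, 0, memory_order_release);
}

/*
 * Case 2, reader side: spin without the lock until on_cpu drops to 0.
 * The acquire load pairs with the release store in toy_finish_task().
 */
static int toy_wait_and_read_state(struct toy_task *t)
{
	while (atomic_load_explicit(&t->on_cpu, memory_order_acquire))
		;	/* busy-wait; acceptable only in a toy example */
	return t->state;
}
```

If every access looked like case 1, no explicit barriers would be needed; as
soon as one side looks like case 2, the other side needs the matching ordering.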

Cheers

--
Qais Yousef