Message-ID: <20191106172631.euq7ggvfao2kvyld@e107158-lin.cambridge.arm.com>
Date:   Wed, 6 Nov 2019 17:26:32 +0000
From:   Qais Yousef <qais.yousef@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Quentin Perret <qperret@...gle.com>, linux-kernel@...r.kernel.org,
        aaron.lwe@...il.com, valentin.schneider@....com, mingo@...nel.org,
        pauld@...hat.com, jdesfossez@...italocean.com,
        naravamudan@...italocean.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        kernel-team@...roid.com, john.stultz@...aro.org
Subject: Re: NULL pointer dereference in pick_next_task_fair

On 11/06/19 17:57, Peter Zijlstra wrote:
> On Wed, Nov 06, 2019 at 03:04:50PM +0000, Qais Yousef wrote:
> > On 11/06/19 14:08, Peter Zijlstra wrote:
> > > On Wed, Nov 06, 2019 at 01:05:25PM +0100, Peter Zijlstra wrote:
> 
> > > > The only thing I'm now considering is if we shouldn't be setting
> > > > ->on_cpu=2 _before_ calling put_prev_task(). I'll go audit the RT/DL
> > > > cases.
> > > 
> > > So I think it all works, but that's more by accident than anything else.
> > > I'll move the ->on_cpu=2 assignment earlier. That clearly avoids calling
> > > put_prev_task() while we're in put_prev_task().
> > 
> > Did you mean avoids calling *set_next_task()* while we're in put_prev_task()?
> 
> Either, really. The change pattern does put_prev_task() first, and then
> restores state by calling set_next_task(). And it can do that while
> we're in put_prev_task(), unless we're setting ->on_cpu=2.

*head starts spinning*

I can't see how we can end up with two put_prev_task() calls in a row. Let me
stare at the code some more.

> 
> > So what you're saying is that put_prev_task_{rt,dl}() could drop the rq_lock()
> > too and the race could happen while we're inside these functions, correct? Or
> > is it a different reason?
> 
> Indeed, except it looks like that actually works (mostly by accident).

+1

I think I've got it now: it's double_lock_balance() that can drop the lock.
It even has a comment above it!
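
For readers following along, here is a minimal userspace sketch of why a
"take a second lock" helper can end up dropping the first one. It is loosely
modeled on the address-ordered variant of double_lock_balance() and is not the
kernel code: struct toy_rq and toy_double_lock() are hypothetical names, and
pthread mutexes stand in for the raw spinlocks.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue; not the kernel's struct rq. */
struct toy_rq {
    pthread_mutex_t lock;
};

/*
 * Called with this_rq->lock held; returns with both locks held.
 *
 * Locks are ordered by address to avoid an ABBA deadlock. If the trylock
 * on 'busiest' fails and 'busiest' sorts lower, this_rq->lock has to be
 * dropped so that both locks can be retaken in order.
 *
 * Returns true if this_rq->lock was dropped along the way, which means
 * anything protected by it may have changed in the meantime.
 */
static bool toy_double_lock(struct toy_rq *this_rq, struct toy_rq *busiest)
{
    /* Fast path: nobody holds the other lock, nothing is dropped. */
    if (pthread_mutex_trylock(&busiest->lock) == 0)
        return false;

    if ((uintptr_t)busiest < (uintptr_t)this_rq) {
        /* Wrong order: release ours, then take both in address order. */
        pthread_mutex_unlock(&this_rq->lock);
        pthread_mutex_lock(&busiest->lock);
        pthread_mutex_lock(&this_rq->lock);
        return true;
    }

    /* Right order already: just wait for the other lock. */
    pthread_mutex_lock(&busiest->lock);
    return false;
}
```

The window between the unlock and the relock in the middle branch is where
another CPU could get in and observe or change the task's state.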

> 
> > By the way, do all reads/writes to ->on_cpu happen while a lock is held? I.e.,
> > we don't need to use any SMP read/write barriers?
> 
> Yes, ->on_cpu is fully serialized by rq->lock. We use
> smp_store_release() in finish_task() due to ttwu spin-waiting on it
> (which reminds me, riel was seeing lots of that).

Thanks. I had to ask as it was hard to walk all the paths.
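
The release/spin-wait pairing described in the quoted reply can be sketched in
userspace with C11 atomics. This is only an illustration under assumptions:
struct toy_task, toy_finish_task() and toy_wait_task_off_cpu() are
hypothetical names, memory_order_release stands in for smp_store_release(),
and the acquire load on the waiting side is my assumption about how the
spin-wait pairs with it.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a task; not the kernel's struct task_struct. */
struct toy_task {
    int saved_state;    /* plain data written before on_cpu is cleared */
    atomic_int on_cpu;  /* non-zero while the task is still on a CPU */
};

/*
 * Descheduling side: finish all writes to the task, then clear on_cpu
 * with release semantics so those writes are visible to whoever
 * observes on_cpu == 0.
 */
static void toy_finish_task(struct toy_task *prev, int state)
{
    prev->saved_state = state;
    atomic_store_explicit(&prev->on_cpu, 0, memory_order_release);
}

/*
 * Waking side: spin without holding any lock until on_cpu drops to 0.
 * The acquire load pairs with the release store above, so saved_state
 * is safe to read once the loop exits.
 */
static int toy_wait_task_off_cpu(struct toy_task *p)
{
    while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
        ; /* busy-wait; real code would relax the CPU here */
    return p->saved_state;
}
```

The point of the sketch is that the waiter spins without the lock, so this one
store needs explicit ordering even though every other access is covered by
rq->lock.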

Sometimes I get tempted to sprinkle comments or lockdep_assert() but then
I think that can easily get ugly and out of hand. I guess one just has to know
the code.

Cheers

--
Qais Yousef
