linux-kernel - Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190816151846.54692edc@oasis.local.home>
Date:   Fri, 16 Aug 2019 15:18:46 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Valentin Schneider <valentin.schneider@....com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        "Joel Fernandes, Google" <joel@...lfernandes.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        paulmck <paulmck@...ux.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Will Deacon <will.deacon@....com>,
        David Howells <dhowells@...hat.com>,
        Alan Stern <stern@...land.harvard.edu>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

On Fri, 16 Aug 2019 13:41:59 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:

> ----- On Aug 16, 2019, at 1:04 PM, rostedt rostedt@...dmis.org wrote:
> 
> > On Fri, 16 Aug 2019 17:48:59 +0100
> > Valentin Schneider <valentin.schneider@....com> wrote:
> >   
> >> On 16/08/2019 17:25, Steven Rostedt wrote:  
> >> >> Also, write and read to/from those variables should be done with
> >> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
> >> >> probes without holding the sched_register_mutex.
> >> >>    
> >> > 
> >> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> >> > It's done while holding the mutex. It's not that critical of a path,
> >> > and makes the code look ugly.
> >> >     
> >> 
> >> I seem to recall something like locking primitives don't protect you from
> >> store tearing / invented stores, so if you can have concurrent readers
> >> using READ_ONCE(), there should be a WRITE_ONCE() on the writer side, even
> >> if it's done in a critical section.  
> > 
> > But for this, it really doesn't matter. The READ_ONCE() is for going
> > from 0->1 or 1->0 any other change is the same as 1.  
> 
> Let's consider this "invented store" scenario (even though as I noted in my
> earlier email, I suspect this particular instance of the code does not appear
> to fit the requirements to generate this in the wild with current compilers):
> 
> intial state:
> 
> sched_tgid_ref = 10;
> 
> Thread A                                Thread B
> 
> registration                            tracepoint probe
> sched_tgid_ref++
>   - compiler decides to invent a
>     store: sched_tgid_ref = 0

This looks to me that this would cause more issues in other parts of
the code if a compiler ever decided to do this.

But I still don't care for this case. It's a cache, nothing more. No
guarantee that anything will be recorded.


>                                         READ_ONCE(sched_tgid_ref) ->
> observes 0, but should really see either 10 or 11.
>   - compiler stores 11 into
>     sched_tgid_ref
> 
> This kind of scenario could cause spurious missing data in the middle
> of a trace caused by another user trying to increment the reference
> count, which is really unexpected.
> 
> A similar scenario can happen for "store tearing" if the compiler
> decides to break the store into many stores close to the 16-bit
> overflow value when updating a 32-bit reference count. Spurious 1 ->
> 0 -> 1 transitions could be observed by readers.
> 
> > When we enable trace events, we start recording the tasks comms such
> > that we can possibly map them to the pids. When we disable trace
> > events, we stop recording the comms so that we don't overwrite the
> > cache when not needed. Note, if more than the max cache of tasks are
> > recorded during a session, we are likely to miss comms anyway.
> > 
> > Thinking about this more, the READ_ONCE() and WRTIE_ONCE() are not
> > even needed, because this is just a best effort anyway.  
> 
> If you choose not to use READ_ONCE(), then the "load tearing" issue
> can cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
> overflow as described above. The "Invented load" also becomes an
> issue, because the compiler could use the loaded value for a branch,
> and re-load that value between two branches which are expected to use
> the same value, effectively generating a corrupted state.
> 
> I think we need a statement about whether READ_ONCE/WRITE_ONCE should
> be used in this kind of situation, or if we are fine dealing with the
> awkward compiler side-effects when they will occur.
> 

Let me put it this way. My biggest issue with this, is that the
READ/WRITE_ONCE() has *nothing* to do with the bug you are trying to
fix.

That bug is that we did the decision of starting the probes outside the
mutex. That is fixed my moving the decision into the mutex. The
READ/WRITE_ONCE() is just added noise.

-- Steve