linux-kernel - Re: [PATCH v12 3/3] tracing: Centralize preemptirq tracepoints and unify their usage

Date:   Wed, 8 Aug 2018 06:00:41 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Joel Fernandes <joelaf@...gle.com>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Cc: Android Kernel" <kernel-team@...roid.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Byungchul Park <byungchul.park@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tom Zanussi <tom.zanussi@...ux.intel.com>
Subject: Re: [PATCH v12 3/3] tracing: Centralize preemptirq tracepoints and
 unify their usage

On Tue, Aug 07, 2018 at 08:53:54PM -0700, Joel Fernandes wrote:
> On Tue, Aug 7, 2018 at 8:44 PM, Joel Fernandes <joelaf@...gle.com> wrote:
> > Hi Steve,
> >
> > On Tue, Aug 7, 2018 at 7:28 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
> [...]
> >>> @@ -171,8 +174,7 @@ extern void syscall_unregfunc(void);
> >>>                       } while ((++it_func_ptr)->func);                \
> >>>               }                                                       \
> >>>                                                                       \
> >>> -             if (rcuidle)                                            \
> >>> -                     srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> >>> +             srcu_read_unlock_notrace(ss, idx);                      \
> >>
> >> Hmm, why do we have the two different srcu handles?
> >
> > Because if the memory operations happening on the normal SRCU handle
> > (during srcu_read_lock) are interrupted by an NMI, then the other handle
> > (devoted to NMI) could be used instead and not bother the interrupted
> > handle. Does that make sense?
> >
> > When I talked to Paul a few months ago about SRCU from NMI context, he
> > mentioned that the per-cpu memory operations during srcu_read_lock can
> > be interrupted by an NMI, which is why we added that warning.
> 
> So I looked more closely, __srcu_read_lock on 2 different handles may
> still be doing a this_cpu_inc on the same location..
> (sp->sda->srcu_lock_count). :-(
> 
> Paul any ideas on how to solve this?

You lost me on this one.  When you said "2 different handles", I assumed
that you meant two different values of "sp", which would have two
different addresses for &sp->sda->srcu_lock_count.  What am I missing?
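
To make the "two different handles" point concrete, here is a minimal
userspace sketch, not kernel code.  The struct model_srcu type and the
model_*() helpers are hypothetical simplifications: the kernel's per-CPU
sp->sda data is flattened into plain arrays, and no memory barriers are
modeled.  It only shows that two separate handles carry separate lock
counters, so incrementing one leaves the other untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct srcu_struct. */
struct model_srcu {
	unsigned long lock_count[2];   /* models sp->sda->srcu_lock_count[] */
	unsigned long unlock_count[2]; /* models sp->sda->srcu_unlock_count[] */
	unsigned int idx;              /* models sp->srcu_idx */
};

/* Reader entry: bump the lock counter that belongs to this handle only. */
static int model_read_lock(struct model_srcu *sp)
{
	int idx;

	assert(sp != NULL);
	idx = sp->idx & 0x1;
	sp->lock_count[idx]++;
	return idx;
}

/* Returns 1 when the two handles do not share a lock counter. */
static int model_counters_distinct(struct model_srcu *a, struct model_srcu *b)
{
	return &a->lock_count[0] != &b->lock_count[0];
}
```

With one handle for normal context and one for NMI context, a call to
model_read_lock() on the first changes only the first handle's counter.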

> It does start to seem like a show stopper :-(

I suppose that an srcu_read_lock_nmi() and srcu_read_unlock_nmi() could
be added, which would do atomic ops on sp->sda->srcu_lock_count.  Not sure
whether this would be fast enough to be useful, but easy to provide:

int __srcu_read_lock_nmi(struct srcu_struct *sp)  /* UNTESTED. */
{
	int idx;

	idx = READ_ONCE(sp->srcu_idx) & 0x1;
	atomic_inc(&sp->sda->srcu_lock_count[idx]);
	smp_mb__after_atomic(); /* B */  /* Avoid leaking critical section. */
	return idx;
}

void __srcu_read_unlock_nmi(struct srcu_struct *sp, int idx)
{
	smp_mb__before_atomic(); /* C */  /* Avoid leaking critical section. */
	atomic_inc(&sp->sda->srcu_unlock_count[idx]);
}

With appropriate adjustments so that Tiny RCU also works.
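
The counting scheme in the sketch above can be tried out in userspace
with C11 atomics.  This is only an approximation under stated
assumptions: struct model_srcu_atomic and every model_*() name are
hypothetical, the per-CPU data is flattened into a single set of
counters, and a sequentially consistent atomic_fetch_add() stands in for
atomic_inc() paired with smp_mb__after_atomic() or
smp_mb__before_atomic().  It does not model grace periods or the index
flip.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct srcu_struct. */
struct model_srcu_atomic {
	atomic_ulong lock_count[2];   /* models srcu_lock_count[] */
	atomic_ulong unlock_count[2]; /* models srcu_unlock_count[] */
	atomic_uint idx;              /* models sp->srcu_idx */
};

static void model_srcu_atomic_init(struct model_srcu_atomic *sp)
{
	for (int i = 0; i < 2; i++) {
		atomic_init(&sp->lock_count[i], 0);
		atomic_init(&sp->unlock_count[i], 0);
	}
	atomic_init(&sp->idx, 0);
}

/* Reader entry: pick the current index, then atomically count the reader. */
static int model_read_lock_nmi(struct model_srcu_atomic *sp)
{
	int idx = atomic_load_explicit(&sp->idx, memory_order_relaxed) & 0x1;

	/* seq_cst read-modify-write approximates atomic_inc() plus barrier B. */
	atomic_fetch_add(&sp->lock_count[idx], 1);
	return idx;
}

/* Reader exit: atomically count the unlock on the same index. */
static void model_read_unlock_nmi(struct model_srcu_atomic *sp, int idx)
{
	assert(idx == 0 || idx == 1);
	/* seq_cst read-modify-write approximates barrier C plus atomic_inc(). */
	atomic_fetch_add(&sp->unlock_count[idx], 1);
}

/* Readers still inside a critical section for the given index. */
static unsigned long model_readers_active(struct model_srcu_atomic *sp, int idx)
{
	return atomic_load(&sp->lock_count[idx]) -
	       atomic_load(&sp->unlock_count[idx]);
}
```

A lock followed by the matching unlock brings model_readers_active()
for that index from 1 back to 0.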

Note that you have to use _nmi() everywhere, not just in NMI handlers.
In fact, the NMI handlers are the one place you -don't- need to use
_nmi(), strangely enough.
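
One way to read that rule is as a lost-update problem.  The sketch below
is a deterministic userspace model, not kernel code, and all names in it
are hypothetical.  It assumes an architecture where the plain per-CPU
increment is a separate load, add, and store, and it injects the "NMI"
by hand as an ordinary function call.  If the NMI lands between the load
and the store, the interrupted code writes back a stale value and the
NMI's increment disappears.  An indivisible increment leaves no such
window, and the handler itself runs to completion relative to the code
it interrupted, so its own increment is not at risk.

```c
#include <assert.h>

/* Hypothetical model of one CPU's lock counter. */
static unsigned long model_counter;

/* Stand-in for the NMI handler's increment; it runs to completion. */
static void model_nmi_increment(void)
{
	model_counter++;
}

/*
 * Non-atomic increment split into load/add/store, with the NMI
 * arriving between the load and the store.
 */
static unsigned long model_plain_inc_interrupted(void)
{
	unsigned long tmp;

	model_counter = 0;
	tmp = model_counter;      /* load */
	model_nmi_increment();    /* NMI lands here */
	model_counter = tmp + 1;  /* store overwrites the NMI's update */
	return model_counter;
}

/*
 * Indivisible increment: the NMI can only land before or after it,
 * so both increments survive.
 */
static unsigned long model_atomic_inc_interrupted(void)
{
	model_counter = 0;
	model_counter++;          /* models a single indivisible atomic_inc() */
	model_nmi_increment();    /* NMI lands after the increment */
	return model_counter;
}
```

Two increments happen in both cases, but the interrupted plain variant
ends with the counter at 1, while the indivisible variant ends at 2.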

Might be worth a try -- smp_mb__{before,after}_atomic() is a no-op on
some architectures, for example.

							Thanx, Paul