linux-kernel - Re: [tip:perf/core] perf: Fix race in perf_event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iKb7LJ2HgQEJeXZd05vc3QXqgAp9bVb0j3r2N2QzJAZcA@mail.gmail.com>
Date:	Thu, 7 Jan 2016 11:36:21 -0500
From:	Eric Dumazet <edumazet@...gle.com>
To:	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	syzkaller <syzkaller@...glegroups.com>,
	LKML <linux-kernel@...r.kernel.org>, acme@...hat.com,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Alexander Potapenko <glider@...gle.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"H. Peter Anvin" <hpa@...or.com>, jolsa@...hat.com,
	Ingo Molnar <mingo@...nel.org>, vincent.weaver@...ne.edu,
	Thomas Gleixner <tglx@...utronix.de>, acme@...nel.org,
	Stephane Eranian <eranian@...gle.com>,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perf/core] perf: Fix race in perf_event_exec()

On Thu, Jan 7, 2016 at 11:26 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
>> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
>> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
>> > <tipbot@...or.com> wrote:
>> >
>> > >
>> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
>> > > pointer around. Therefore you have to either disable preemption when looking
>> > > at current, or hold ctx->lock.
>> > >
>> >
>> >
>> > >
>> > >  void perf_event_exec(void)
>> > >  {
>> > > -       struct perf_event_context *ctx;
>> > >         int ctxn;
>> > >
>> > >         rcu_read_lock();
>> >
>> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
>> > uses  local_irq_save( ?
>>
>> Strictly speaking we should not rely on the fact that RCU grace periods
>> do not progress with IRQs disabled, so yes.
>
> What Peter said!
>
> The current implementation would let you get away with IRQs disabled
> (aside from lockdep splats), but it is easy to imagine implementations
> that do not, hence the restriction.
>
> Is the extra rcu_read_lock() and rcu_read_unlock()  causing a performance
> problem?  If so, one option would be use of synchronize_sched() and
> call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
> critical sections.  Of course, this might well put you in a position
> where you would like to use IRQ disabling in some cases, BH disabling
> (or softirq context) in others, and rcu_read_lock() in yet other cases.
>

I guess I could not see why the rcu_read_lock() was needed around this loop.

rcu_read_lock();
for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
       perf_event_enable_on_exec(ctxn);
rcu_read_unlock();

But apparently it is needed in perf_event_enable_on_exec() and we avoid calling
rcu_read_lock() and rcu_read_unlock()  (perf_nr_task_contexts - 1 ) times

> If so, the good news is that RCU actually now supports this efficiently.
> There is the shiny new synchronize_rcu_mult() macro that can be used
> as follows to wait for all three types of grace periods:
>
>         synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
>
> This would wait concurrently for all three grace periods to elapse.
>

Interesting ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/