linux-kernel - Re: [PATCH 1/3] srcu: use cpu_online() instead custom check

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20170929010957.GV3521@linux.vnet.ibm.com>
Date:   Thu, 28 Sep 2017 18:09:57 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] srcu: use cpu_online() instead custom check

On Thu, Sep 28, 2017 at 06:02:08PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-09-22 11:43:14 [-0700], Paul E. McKenney wrote:
> > On Fri, Sep 22, 2017 at 05:28:04PM +0200, Sebastian Andrzej Siewior wrote:
> > > The current check via srcu_online is slightly racy because after looking
> > > at srcu_online there could be an interrupt that interrupted us long
> > > enough until the CPU we checked against went offline.
> > 
> > But in that case, wouldn't the interrupt block the synchronize_sched()
> > later in the offline sequence?
> 
> What I meant is:
> 
>   CPU0						CPU1
>   preempt_disable();
>   if (READ_ONCE(per_cpu(srcu_online, 1)))
>   *interrupt*
> 						WRITE_ONCE(per_cpu(srcu_online, cpu), false);
> 						and CPU is offnline
>   				
> 	ret = queue_delayed_work_on(1, wq, dwork, delay);
> 
> is this possible or are there a safety belt for this?

I don't see anything that would prevent this.  It is unlikely, but not
so unlikely that it should not be fixed.

> > More to the point, are you actually seeing this failure, or is this
> > a theoretical bug?
> 
> I need to get rid of the preempt_disable() section in which
> queue_delayed_work*() is invoked for RT.

OK, but please see below...

> > > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > > change their state) and then check based on cpu_online() if we queue it
> > > on a specific CPU or not. queue_work_on() itself can handle if something
> > > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > > CPU won't fire until the CPU is back online.
> > > 
> > > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > > SRCU won't enqueue a work item before SRCU is up and ready.
> > 
> > Another alternative would be to disable preemption across the check and
> > the call to queue_delayed_work_on().
> 
> you need to ensure the *other* CPU won't in the middle of checking its
> status. preempt_disable() won't do this on the other CPU.

Agreed.

> > Yet another alternative would be to have an SRCU-specific per-CPU lock
> > that is acquired across the setting and clearing of srcu_online,
> > and also across the check and the call to queue_delayed_work_on().
> > This last would be more consistent with a desire to remove the
> > synchronize_sched() from the offline sequence.
> > 
> > Or am I missing something here?
> The perCPU lock should work. And cpus_read_lock() is basically that
> except that srcu_online_cpu() is not holding it but the CPU-HP code.
> 
> So you want keep things as-is or do you prefer a per-CPU rwsem instead?

The per-CPU rwsem seems like a reasonable approach.  Except for the
call_srcu() path, given that call_srcu()'s caller might have preemption
(or even interrupts) disabled.

Thoughts?

							Thanx, Paul