linux-kernel - Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170510160423.GJ3452@pathway.suse.cz>
Date:   Wed, 10 May 2017 18:04:23 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Jessica Yu <jeyu@...hat.com>, Jiri Kosina <jikos@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>, live-patching@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is
 broken in RCU code

On Tue 2017-05-09 11:18:35, Josh Poimboeuf wrote:
> On Mon, May 08, 2017 at 03:36:00PM -0700, Paul E. McKenney wrote:
> > On Mon, May 08, 2017 at 05:16:09PM -0500, Josh Poimboeuf wrote:
> > > On Mon, May 08, 2017 at 02:07:54PM -0700, Paul E. McKenney wrote:
> > But do we really need this, given the in_nmi() check that Steven
> > pointed out?
> 
> The in_nmi() check doesn't work for non-NMI exceptions.  An exception
> can come from anywhere, which is presumably why ist_enter() calls
> rcu_nmi_enter(), even though it might not have been in NMI context.  The
> exception could, for example, happen while you're twiddling important
> bits in rcu_irq_enter().  Or it could happen early in do_nmi(), before
> it had a chance to set NMI_MASK or call rcu_nmi_enter().  In either
> case, in_nmi() would be false, yet calling rcu_irq_enter() would be bad.
> 
> I think I have convinced myself that, as long as the user doesn't patch
> ist_enter() or rcu_dynticks_eqs_enter(), it'll be fine.  So the
> following should be sufficient:
> 
> 	if (in_nmi())
> 		rcu_nmi_enter(); /* in case we're called before nmi_enter() */

This does not work as expected. in_nmi() is implemented as

	(preempt_count() & NMI_MASK)

These bits are set in nmi_enter(), see

	preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);

Note that nmi_enter() calls rcu_nmi_enter() right after
setting the preempt_count bit.

It means that if in_nmi() returns true, we should already
on the safe side regarding using rcu_read_lock()/unlock().


The patch was designed to use basically the same solution
as is used in the stack tracer. It is using
rcu_read_lock()/unlock() as we do.

The stack tracer is different in the following ways:

    + It takes a spin lock. This is why it has to give
      up in NMI completely.

    + It disables interrupts. I guess that it is because
      of the spin lock as well. Otherwise, it would not
      be safe in IRQ context.

    + It checks whether local_irq_save() has a chance to
      work and gives up if it does not.


On the other hand, the live patch handler:

    + does not need any lock => could be used in NMI

    + does not need to disable interrupts because
      it does not use any lock

    + checks if local_irq_save() actually succeeded.
      It seems more reliable to me.


I am not sure if we all understand the problem. IMHO, the point
is that RCU must be aware when we call rcu_read_lock()/unlock().

My understanding is that rcu_irq_enter() tries to make RCU watching
when it was not. Then rcu_is_watching() reports if we are on
the safe side.

But it is possible that I miss something. One question is if
rcu_irq_enter()/exit() calls can be nested.

I still need to think about it.

Best Regards,
Petr