Date:   Fri, 10 Jul 2020 09:21:29 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Li, Aubrey" <aubrey.li@...ux.intel.com>
Cc:     Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
        Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>, vineethrp@...il.com,
        Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Tim Chen <tim.c.chen@...el.com>,
        "Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [RFC PATCH 14/16] irq: Add support for core-wide protection of
 IRQ and softirq

On Fri, Jul 10, 2020 at 08:19:24PM +0800, Li, Aubrey wrote:
> Hi Joel/Vineeth,
> 
> On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
> > From: "Joel Fernandes (Google)" <joel@...lfernandes.org>
> > 
> > With the current core scheduling patchset, non-threaded IRQ and softirq
> > victims can leak data from their hyperthread to a sibling hyperthread
> > running an attacker.
> > 
> > For MDS, it is possible for the IRQ and softirq handlers to leak data to
> > either host or guest attackers. For L1TF, it is possible to leak to
> > guest attackers. No buffer-flushing mitigation can avoid this, since the
> > attacker and the victim execute concurrently on 2 or more HTs.
> > 
> > The solution in this patch is to monitor the outer-most core-wide
> > irq_enter() and irq_exit() executed by any sibling. In between these
> > two, we mark the core to be in a special core-wide IRQ state.
> > 
> > On IRQ entry, if we detect that a sibling is running untrusted code, we
> > send it a reschedule IPI so that the sibling transitions through its own
> > irq_exit() and does any waiting there, until the IRQ being protected
> > finishes.
> > 
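To make the mechanism concrete, a rough sketch of what the entry side could
look like is below; the core_irq_state counter and sibling_runs_untrusted()
helper are placeholders for illustration, not the actual patch code:

        #include <linux/atomic.h>
        #include <linux/cpumask.h>
        #include <linux/smp.h>
        #include <linux/topology.h>

        /* One instance shared by all SMT siblings of a core (sketch only). */
        struct core_irq_state {
                atomic_t        nest;
        };

        static void sched_core_irq_enter_sketch(struct core_irq_state *cs)
        {
                int cpu, self = smp_processor_id();

                /* Only the outer-most core-wide entry marks the state. */
                if (atomic_inc_return(&cs->nest) != 1)
                        return;

                for_each_cpu(cpu, cpu_smt_mask(self)) {
                        if (cpu == self)
                                continue;
                        /* Placeholder check; the resched IPI forces the
                         * sibling to pass through its irq_exit(). */
                        if (sibling_runs_untrusted(cpu))
                                smp_send_reschedule(cpu);
                }
        }
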
> > We also monitor the per-CPU outer-most irq_exit(). If, during the per-CPU
> > outer-most irq_exit(), the core is still in the special core-wide IRQ
> > state, we busy-wait until the core exits this state. This combination of
> > per-CPU and core-wide IRQ states handles any combination of irq_enter()s
> > and irq_exit()s happening on the siblings of the core in any order.
> > 
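Using the same illustrative counter as above, the exit side would be roughly
(sketch only, not the actual patch):

        static void sched_core_irq_exit_sketch(struct core_irq_state *cs)
        {
                /* Drop our own core-wide reference first... */
                atomic_dec(&cs->nest);

                /*
                 * ...then, on the per-CPU outer-most irq_exit(), spin until
                 * no sibling is still inside the protected core-wide
                 * IRQ section.
                 */
                while (atomic_read(&cs->nest))
                        cpu_relax();
        }
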
> > Lastly, we also check in the schedule loop whether we are about to
> > schedule an untrusted process while the core is in such a state. This is
> > possible if a trusted thread enters the scheduler by yielding the CPU:
> > that path never passes through irq_exit(), so we have to do the waiting
> > explicitly in the scheduler.
> > 
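The scheduler-side check could be sketched along these lines;
task_is_trusted() is again just a stand-in for whatever the patch uses to
classify tasks:

        static void sched_core_wait_before_pick_sketch(struct core_irq_state *cs,
                                                       struct task_struct *next)
        {
                /* Trusted tasks may run concurrently with the protected IRQ. */
                if (task_is_trusted(next))
                        return;

                /* Untrusted tasks wait here, since this path skips irq_exit(). */
                while (atomic_read(&cs->nest))
                        cpu_relax();
        }
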
> > Every attempt is made to avoid unnecessary busy-waiting, and in testing
> > on real-world ChromeOS use cases, it has not shown a performance drop.
> > In ChromeOS, with this and the rest of the core scheduling patchset, we
> > see around a 300% improvement in key-press latencies into Google Docs
> > when camera streaming is running simultaneously (90th percentile latency
> > drops from ~150ms to ~50ms).
> > 
> > This feature is controlled by the build-time config option
> > CONFIG_SCHED_CORE_IRQ_PAUSE and is enabled by default. There is also a
> > kernel boot parameter 'sched_core_irq_pause' to enable/disable the
> > feature at boot time; it defaults to enabled.
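
One possible way such a boot switch could be wired up (sketch only; the real
patch may parse the parameter differently):

        #include <linux/init.h>
        #include <linux/kstrtox.h>

        static bool sched_core_irq_pause __read_mostly = true; /* default: on */

        static int __init set_sched_core_irq_pause(char *str)
        {
                /* Accepts 0/1, y/n, on/off; non-zero return flags a bad value. */
                return kstrtobool(str, &sched_core_irq_pause);
        }
        early_param("sched_core_irq_pause", set_sched_core_irq_pause);
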
> 
> We saw a lot of soft lockups on the screen when we tested v6.
> 
> [  186.527883] watchdog: BUG: soft lockup - CPU#86 stuck for 22s! [uperf:5551]
> [  186.535884] watchdog: BUG: soft lockup - CPU#87 stuck for 22s! [uperf:5444]
> [  186.555883] watchdog: BUG: soft lockup - CPU#89 stuck for 22s! [uperf:5547]
> [  187.547884] rcu: INFO: rcu_sched self-detected stall on CPU
> [  187.553760] rcu: 	40-....: (14997 ticks this GP) idle=49a/1/0x4000000000000002 softirq=1711/1711 fqs=7279 
> [  187.564685] NMI watchdog: Watchdog detected hard LOCKUP on cpu 14
> [  187.564723] NMI watchdog: Watchdog detected hard LOCKUP on cpu 38
> 
> The problem goes away when we revert this patch. We are running multiple
> uperf threads (as many as the number of CPUs) in a cgroup with coresched
> enabled. This is 100% reproducible on our side.

Interesting. I am guessing you are not doing any CPU hotplug, since the
hotplug fixes were removed from v6 precisely to expose such issues.

The last known lockups with this patch were fixed. I'd appreciate it if you
could dig in more and provide logs/traces. The last one I remember was:

HT1                                  HT2
                                     irq_enter()
                                         - sets the core-wide flag
<softirq running>
  acquires a lock.
  <gets irq>
  irq_enter() - do nothing.
  irq_exit() - busy wait on flag.
                                     irq_exit()
                                       <softirq running>
                                       tries to acquire the same lock
                                       and deadlocks.

The fix was to call sched_core_irq_enter() when you enter a softirq from
paths other than irq_exit().
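
In other words, a softirq entry point that does not pass through irq_exit()
(e.g. ksoftirqd or local_bh_enable()) also has to bracket the handlers with
the core-wide enter/exit, along these lines (schematic only; the matching
sched_core_irq_exit() name is assumed here, not taken from the patch):

        static void run_softirq_protected_sketch(void)
        {
                sched_core_irq_enter();  /* mark the core before handlers run */
                __do_softirq();
                sched_core_irq_exit();   /* let busy-waiting siblings proceed */
        }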

Other than this one, we have not seen lockups in heavy testing over the last
2 months since we redesigned this patch to enter the 'private state' on the
outer-most core-wide sched_core_irq_enter().

thanks,

 - Joel
