linux-kernel - Re: [RFC PATCH v4 00/19] Core scheduling v4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEXW_YS927Tc3sPHeK9wW7nL=qvqFDoCX23xxtbMZmVhtbO1xw@mail.gmail.com>
Date:   Tue, 17 Mar 2020 20:52:37 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Dario Faggioli <dfaggioli@...e.com>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Aaron Lu <aaron.lwe@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Luck, Tony" <tony.luck@...el.com>
Subject: Re: [RFC PATCH v4 00/19] Core scheduling v4

On Tue, Mar 17, 2020 at 3:07 PM Tim Chen <tim.c.chen@...ux.intel.com> wrote:
>
> Joel,
>
> >
> > Looks quite interesting. We are trying apply this work to ChromeOS. What we
> > want to do is selectively marking tasks, instead of grouping sets of trusted
> > tasks. I have a patch that adds a prctl which a task can call, and it works
> > well (task calls prctl and gets a cookie which gives it a dedicated core).
> >
> > However, I have the following questions, in particular there are 4 scenarios
> > where I feel the current patches do not resolve MDS/L1TF, would you guys
> > please share your thoughts?
> >
> > 1. HT1 is running either hostile guest or host code.
> >    HT2 is running an interrupt handler (victim).
> >
> >    In this case I see there is a possible MDS issue between HT1 and HT2.
>
> Core scheduling mitigates the userspace to userspace attacks via MDS between the HT.
> It does not prevent the userspace to kernel space attack.  That will
> have to be mitigated via other means, e.g. redirecting interrupts to a core
> that don't run potentially unsafe code.

We have only 2 cores (4 HT) on many devices. It is not an option to
dedicate a core to only running trusted code, that would kill
performance. Another option is to designate a single HT of a
particular core to run both untrusted code and an interrupt handler --
but as Thomas pointed out, this does not work for per-CPU interrupts
or managed interrupts, and the softirqs that they trigger. But if we
just consider interrupts for which we can control the affinities (and
assuming that most interrupts can be controlled like that), then maybe
it will work? In the ChromeOS model, each untrusted task is in its own
domain (cookie). So untrusted tasks cannot benefit from parallelism
(in our case) anyway -- so it seems reasonable to run an affinable
interrupt and an untrusted task, on a particular designated core.

(Just thinking out loud...).  Another option could be a patch that
Vineeth shared with me (that Peter experimentally wrote) where he
sends IPI from an interrupt handler to a sibling running untrusted
guest code which would result in it getting paused. I am hoping
something like this could work on the host side as well (not just for
guests). We could also set per-core state from the interrupted HT,
possibly IPI'ing the untrusted sibling if we have to. If sibling runs
untrusted code *after* the other's siblings interrupt already started,
then the schedule() loop on the untrusted sibling would spin knowing
the other sibling has an interrupt in progress. The softirq is a real
problem though. Perhaps it can also set similar per-core state.
Thoughts?

> > 2. HT1 is executing hostile host code, and gets interrupted by a victim
> >    interrupt. HT2 is idle.
> >    In this case, I see there is a possible MDS issue between interrupt and
> >    the host code on the same HT1.
>
> The cpu buffers are cleared before return to the hostile host code.  So
> MDS shouldn't be an issue if interrupt handler and hostile code
> runs on the same HT thread.

Got it, agreed this is not an issue.

> > 3. HT1 is executing hostile guest code, HT2 is executing a victim interrupt
> >    handler on the host.
> >
> >    In this case, I see there is a possible L1TF issue between HT1 and HT2.
> >    This issue does not happen if HT1 is running host code, since the host
> >    kernel takes care of inverting PTE bits.
>
> The interrupt handler will be run with PTE inverted.  So I don't think
> there's a leak via L1TF in this scenario.

As Thomas and you later pointed out, this is still an issue and will
require a similar solution as described above.

thanks,

- Joel