linux-kernel - Re: [RFC PATCH] arch/x86: Optionally flush L1D on context switch

Message-ID: <7f8c5bc895c9f40c443bcb48e4e0e8cb2b4366fe.camel@amazon.com>
Date:   Mon, 23 Mar 2020 00:37:38 +0000
From:   "Singh, Balbir" <sblbir@...zon.com>
To:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "keescook@...omium.org" <keescook@...omium.org>,
        "Herrenschmidt, Benjamin" <benh@...zon.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH] arch/x86: Optionally flush L1D on context switch

Hi, Thomas,

On Sat, 2020-03-21 at 11:05 +0100, Thomas Gleixner wrote:
> 
> 
> Balbir,
> 
> "Singh, Balbir" <sblbir@...zon.com> writes:
> > On Fri, 2020-03-20 at 12:49 +0100, Thomas Gleixner wrote:
> > > I forgot the gory details by now, but having two entry points or a
> > > conditional and share the rest (page allocation etc.) is definitely
> > > better than two slightly different implementation which basically do the
> > > same thing.
> > 
> > OK, I can try and dedup them to the extent possible, but please do
> > remember that
> > 
> > 1. KVM is usually loaded as a module
> > 2. KVM is optional
> > 
> > We can share code, by putting the common bits in the core kernel.
> 
> Obviously so.
> 
> > > > 1. SWAPGS fixes/work arounds (unless I misunderstood your suggestion)
> > > 
> > > How so? SWAPGS mitigation does not flush L1D. It merely serializes
> > > SWAPGS.
> > 
> > Sorry, my bad, I was thinking MDS_CLEAR (via verw), which does flush out
> > things, which I suspect should be sufficient from a return to user/signal
> > handling, etc perspective.
> 
> MDS is affecting store buffers, fill buffers and load ports. Different
> story.
> 

Yes, what gets me is that the MDS deep dive (
https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-microarchitectural-data-sampling
) says, "The VERW instruction and L1D_FLUSH command will overwrite the
store buffer value for the current logical processor on processors affected by
MSBDS". To my mind, that makes VERW equivalent to L1D_FLUSH, hence my
assumption. It could be that L1D_FLUSH is a superset, but that is not clear,
and I can't find any other documentation on the MSRs and microcode.
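
To make the question concrete, here is a minimal sketch of the two options as
sets of structures they clear. The MSR number and command bit are the
architectural values (IA32_FLUSH_CMD, bit 0). Everything else is an
assumption for illustration: the BUF_* names are made up, and the claim that
L1D_FLUSH clears everything VERW clears plus the L1D is exactly the open
question above, not something the quoted document confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural values: the flush command MSR and its L1D flush bit. */
#define MSR_IA32_FLUSH_CMD 0x0000010bU
#define L1D_FLUSH          (1ULL << 0)

/* Structures discussed in this thread. Names are illustrative only. */
enum {
	BUF_STORE = 1u << 0,	/* store buffers (MSBDS) */
	BUF_FILL  = 1u << 1,	/* fill buffers */
	BUF_LOAD  = 1u << 2,	/* load ports */
	BUF_L1D   = 1u << 3,	/* L1 data cache */
};

/* ASSUMPTION: what VERW overwrites when the MD_CLEAR microcode is present. */
static unsigned int verw_clears(void)
{
	return BUF_STORE | BUF_FILL | BUF_LOAD;
}

/* ASSUMPTION (the open question): L1D_FLUSH = VERW's set plus the L1D. */
static unsigned int l1d_flush_clears(void)
{
	return verw_clears() | BUF_L1D;
}

/* True if mask a covers every structure in mask b. */
static bool is_superset(unsigned int a, unsigned int b)
{
	return (a & b) == b;
}
```

Under these assumptions is_superset(l1d_flush_clears(), verw_clears()) holds
and the reverse does not, which is the difference that matters here: VERW
alone would leave the L1D contents in place.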

> > Right now, reading through
> > https://software.intel.com/security-software-guidance/insights/deep-dive-snoop-assisted-l1-data-sampling
> > it does seem like we need this during a context switch, specifically
> > since a dirty cache line can cause snooped reads for the attacker to
> > leak data. Am I missing anything?
> 
> Yes. The way this goes is:
> 
> CPU0                   CPU1
> 
> victim1
>  store secrit
>                         victim2
> attacker                  read secrit
> 
> Now if L1D is flushed on CPU0 before attacker reaches user space,
> i.e. reaches the attack code, then there is nothing to see. From the
> link:
> 
>   Similar to the L1TF VMM mitigations, snoop-assisted L1D sampling can be
>   mitigated by flushing the L1D cache between when secrets are accessed
>   and when possibly malicious software runs on the same core.
> 
> So the important point is to flush _before_ the attack code runs which
> involves going back to user space or guest mode.
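
The ordering rule quoted above (flush between the secret store and the attack
code running on the same core) can be modelled as a check over an event
timeline. This is a toy sketch, not kernel code: the event names, struct
event and MAX_CPUS are hypothetical, each CPU is treated as a single thread
(no HT), and "dirty" simply means a secret was stored and no flush has
happened since.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8	/* arbitrary bound for the toy model */

enum ev_type {
	EV_STORE_SECRET,	/* victim stores a secret */
	EV_L1D_FLUSH,		/* L1D is flushed on this CPU */
	EV_ATTACKER_USER,	/* attacker reaches user space / guest mode */
};

struct event {
	int cpu;
	enum ev_type type;
};

/*
 * Walk the timeline in order. Return true if attack code runs on a CPU
 * whose L1D may still hold a secret, i.e. no flush happened in between.
 */
static bool attacker_sees_dirty_l1d(const struct event *ev, size_t n)
{
	bool dirty[MAX_CPUS] = { false };

	for (size_t i = 0; i < n; i++) {
		int cpu = ev[i].cpu;

		if (cpu < 0 || cpu >= MAX_CPUS)
			continue;	/* ignore events outside the model */

		switch (ev[i].type) {
		case EV_STORE_SECRET:
			dirty[cpu] = true;
			break;
		case EV_L1D_FLUSH:
			dirty[cpu] = false;
			break;
		case EV_ATTACKER_USER:
			if (dirty[cpu])
				return true;
			break;
		}
	}
	return false;
}
```

In this model it does not matter where between the store and the attacker's
return to user space the flush happens, only that it happens before the
attack code runs.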

I think there is a more generic case with HT, which you've highlighted below.

> 
> > > Even this is uninteresting:
> > > 
> > >     victim in -> attacker in (stays in kernel, e.g. waits for data) ->
> > >     attacker out -> victim in
> > > 
> > 
> > Not from what I understand from the link above, the attack is a function
> > of what can be snooped by another core/thread and that is a function of
> > what modified secrets are in the cache line/store buffer.
> 
> Forget HT. That's not fixable by any flushing simply because there is no
> scheduling involved.
> 
> CPU0  HT0          CPU0 HT1             CPU1
> 
> victim1            attacker
>  store secrit
>                                         victim2
>                                           read secrit
> 
> > On return to user, we already use VERW (verw), but just return to user
> > protection is not sufficient IMHO. Based on the link above, we need to
> > clear the L1D cache before it can be snooped.
> 
> Again. Flush is required between store and attacker running attack
> code. The attacker _cannot_ run attack code while it is in the kernel so
> flushing L1D on context switch is just voodoo.
> 
> If you want to cure the HT case with core scheduling then the scenario
> looks like this:
> 
> CPU0  HT0          CPU0 HT1             CPU1
> 
> victim1            IDLE
>  store secrit
> -> IDLE
>                    attacker in          victim2
>                                           read secrit
> 
> And yes, there the context switch flush on HT0 prevents it. So this can
> be part of a core scheduling based mitigation or handled via a per core
> flush request.
> 
> But HT is attackable in so many ways ...

I think the reason you prefer flushing on exit to user over flushing in
switch_mm (switching task groups/threads) is that it has lower overhead. The
reasons I prefer switch_mm are:

1. The overhead does not apply to all tasks; the L1D flush is opt-in
2. It's more generic and does not make specific assumptions
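
A minimal sketch of the opt-in idea in point 1, assuming a per-task flag and
a flush decision taken when the address space changes. struct task, mm_id and
l1d_flush_optin are hypothetical names for illustration, and flushing when an
opted-in task is switched out is one possible policy, not necessarily what
the patch does.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a task for this sketch. */
struct task {
	int mm_id;		/* identifies the address space */
	bool l1d_flush_optin;	/* task asked for L1D flush protection */
};

/*
 * Decide whether to flush L1D when switching from prev to next.
 * ASSUMPTION: flush only when the outgoing task opted in and the
 * address space actually changes; threads sharing an mm are skipped.
 */
static bool flush_on_switch_mm(const struct task *prev,
			       const struct task *next)
{
	return prev->l1d_flush_optin && prev->mm_id != next->mm_id;
}
```

With this policy, tasks that never opt in pay nothing, and switches between
threads of the same process do not flush either.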


> 
> Thanks,
> 
>         tglx


Thanks for the review,
Balbir Singh.