linux-kernel - Re: Linux guest kernel threat model for Confidential Computing

Message-ID: <Y9LDBKXMwoGrACLZ@work-vm>
Date:   Thu, 26 Jan 2023 18:14:28 +0000
From:   "Dr. David Alan Gilbert" <dgilbert@...hat.com>
To:     Leon Romanovsky <leon@...nel.org>
Cc:     "Reshetova, Elena" <elena.reshetova@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Shishkin, Alexander" <alexander.shishkin@...el.com>,
        "Shutemov, Kirill" <kirill.shutemov@...el.com>,
        "Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@...el.com>,
        "Kleen, Andi" <andi.kleen@...el.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        "Wunner, Lukas" <lukas.wunner@...el.com>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        "Poimboe, Josh" <jpoimboe@...hat.com>,
        "aarcange@...hat.com" <aarcange@...hat.com>,
        Cfir Cohen <cfir@...gle.com>, Marc Orr <marcorr@...gle.com>,
        "jbachmann@...gle.com" <jbachmann@...gle.com>,
        "pgonda@...gle.com" <pgonda@...gle.com>,
        "keescook@...omium.org" <keescook@...omium.org>,
        James Morris <jmorris@...ei.org>,
        Michael Kelley <mikelley@...rosoft.com>,
        "Lange, Jon" <jlange@...rosoft.com>,
        "linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>
Subject: Re: Linux guest kernel threat model for Confidential Computing

* Leon Romanovsky (leon@...nel.org) wrote:
> On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
> > 
> > > * Reshetova, Elena (elena.reshetova@...el.com) wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > > > > > > away even without any special HW (readme has description of what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > > these tools, you HAVE TO document that.  Otherwise we think you all are
> > > > > > > crazy and will get your patches rejected.  You all know this, why ignore
> > > > > > > it?
> > > > > >
> > > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > > > We will fix this in the future submissions, but some bugs we have are found
> > > > > > > by plain code audit, so 'human' is the tool.
> > > > >
> > > > > My problem with that statement is that by applying different threat
> > > > > model you "invent" bugs which didn't exist in a first place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled correct
> > > > > behaviour as "bug".
> > > > >
> > > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@...ux.intel.com/
> > > >
> > > > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > > > > some other cases we already have fixes or investigating) it represents a correct
> > > > > behavior even if you expect that all your pci HW devices are trusted? What about
> > > > > an error in two consequent pci reads? What about just some failure that results
> > > > > in erroneous input?
> > > 
> > > I'm not sure you'll get general agreement on those answers for all
> > > devices and situations; I think for most devices for non-CoCo
> > > situations, then people are generally OK with a misbehaving PCI device
> > > causing a kernel crash, since most people are running without IOMMU
> > > anyway, a misbehaving device can cause otherwise undetectable chaos.
> > 
> > Ok, if this is a consensus within the kernel community, then we can consider
> > the fixes strictly from the CoCo threat model point of view. 
> > 
> > > 
> > > I'd say:
> > >   a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> > >   guarantee forward progress or stop the hypervisor doing something
> > >   truly stupid.
> > 
> > Yes, denial of service is out of scope but I would not pile all crashes as
> > 'safe' automatically. Depending on the crash, it can be used as a
> > primitive to launch further attacks: privilege escalation, information
> > disclosure and corruption. It is especially true for memory corruption
> > issues. 
> > 
> > >   b) For CoCo, information disclosure, or corruption IS a problem
> > 
> > Agreed, but the path to this can incorporate a number of attack 
> > primitives, as well as bug chaining. So, if the bug is detected, and
> > fix is easy, instead of thinking about possible implications and its 
> > potential usage in exploit writing, safer to fix it.
> > 
> > > 
> > >   c) For non-CoCo some people might care about robustness of the kernel
> > >   against a failing PCI device, but generally I think they worry about
> > >   a fairly clean failure, even in the unexpected-hot unplug case.
> > 
> > Ok.
> 
> With my other hat as a representative of hardware vendor (at least for
> NIC part), who cares about quality of our devices, we don't want to hide
> ANY crash related to our devices, especially if it is related to misbehaving
> PCI HW logic. Any uncontrolled "robustness" hides real issues and makes
> QA/customer support much harder.

Yeh if you're adding new code to be more careful, you want the code to
fail/log the problem, not hide it.
(Although heck, I suspect there are a million apparently working PCI
cards out there that break some spec somewhere).
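
To make the "fail/log, don't hide" point concrete, here is a minimal
userspace sketch. Everything in it is hypothetical: struct fake_desc and
copy_from_device() are made-up names for illustration, not anything from
the patches under discussion, and in-kernel code would report through
something like dev_err() rather than fprintf(). The careful path rejects
an out-of-range length reported by the device and says so, instead of
quietly clamping it and carrying on.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor as read back from a device; not a real kernel structure. */
struct fake_desc {
    uint32_t len;   /* length the device claims to have written */
};

/*
 * Copy a device-reported payload into dst.
 *
 * Assumption: src points to at least dst_size readable bytes.
 *
 * If the device reports a length larger than the destination buffer,
 * log the offending value and return -EIO. The length is deliberately
 * NOT clamped to dst_size: clamping would keep things "working" while
 * hiding a misbehaving device from whoever has to debug it.
 */
static int copy_from_device(const struct fake_desc *d, const uint8_t *src,
                            uint8_t *dst, size_t dst_size)
{
    if (d->len > dst_size) {
        fprintf(stderr, "device reported len %u > buffer %zu, failing\n",
                (unsigned int)d->len, dst_size);
        return -EIO;
    }

    memcpy(dst, src, d->len);
    return (int)d->len;
}
```

A caller would treat the negative return as a hard error for that
request, so the bad device behaviour shows up in the log and in the
failure, rather than as silent truncation.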

Dave

> Thanks
> 
-- 
Dr. David Alan Gilbert / dgilbert@...hat.com / Manchester, UK