[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4ix=zmQdb5sFKN-9wOZFnitHN0sSwHZJgQeaEM+=6+W1w@mail.gmail.com>
Date: Mon, 8 Feb 2021 15:36:35 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Kees Cook <keescook@...omium.org>
Cc: Jonathan Corbet <corbet@....net>, linux-cxl@...r.kernel.org,
Ben Widawsky <ben.widawsky@...el.com>,
Linux ACPI <linux-acpi@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-nvdimm <linux-nvdimm@...ts.01.org>,
Linux PCI <linux-pci@...r.kernel.org>,
Bjorn Helgaas <helgaas@...nel.org>,
Chris Browy <cbrowy@...ry-design.com>,
Ira Weiny <ira.weiny@...el.com>,
Jon Masters <jcm@...masters.org>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Rafael Wysocki <rafael.j.wysocki@...el.com>,
Randy Dunlap <rdunlap@...radead.org>,
Vishal Verma <vishal.l.verma@...el.com>,
daniel.lll@...baba-inc.com,
"John Groves (jgroves)" <jgroves@...ron.com>,
"Kelley, Sean V" <sean.v.kelley@...el.com>
Subject: Re: [PATCH 08/14] taint: add taint for direct hardware access
On Mon, Feb 8, 2021 at 2:09 PM Kees Cook <keescook@...omium.org> wrote:
>
> On Mon, Feb 08, 2021 at 02:00:33PM -0800, Dan Williams wrote:
> > [ add Jon Corbet as I'd expect him to be Cc'd on anything that
> > generically touches Documentation/ like this, and add Kees as the last
> > person who added a taint (tag you're it) ]
> >
> > Jon, Kees, are either of you willing to ack this concept?
> >
> > Top-posting to add more context for the below:
> >
> > This taint is proposed because it has implications for
> > CONFIG_LOCK_DOWN_KERNEL among other things. These CXL devices
> > implement memory like DDR would, but unlike DDR there are
> > administrative / configuration commands that demand kernel
> > coordination before they can be sent. The posture taken with this
> > taint is "guilty until proven innocent" for commands that have yet to
> > be explicitly allowed by the driver. This is different than NVME for
> > example where an errant vendor-defined command could destroy data on
> > the device, but there is no wider threat to system integrity. The
> > taint allows a pressure release valve for any and all commands to be
> > sent, but flagged with WARN_TAINT_ONCE if the driver has not
> > explicitly enabled it on an allowed list of known-good / kernel
> > coordinated commands.
> >
> > On Fri, Jan 29, 2021 at 4:25 PM Ben Widawsky <ben.widawsky@...el.com> wrote:
> > >
> > > For drivers that moderate access to the underlying hardware it is
> > > sometimes desirable to allow userspace to bypass restrictions. Once
> > > userspace has done this, the driver can no longer guarantee the sanctity
> > > of either the OS or the hardware. When in this state, it is helpful for
> > > kernel developers to be made aware (via this taint flag) of this fact
> > > for subsequent bug reports.
> > >
> > > Example usage:
> > > - Hardware xyzzy accepts 2 commands, waldo and fred.
> > > - The xyzzy driver provides an interface for using waldo, but not fred.
> > > - quux is convinced they really need the fred command.
> > > - xyzzy driver allows quux to frob hardware to initiate fred.
> > > - kernel gets tainted.
> > > - turns out fred command is borked, and scribbles over memory.
> > > - developers laugh while closing quux's subsequent bug report.
>
> But a taint flag only lasts for the current boot. If this is a drive, it
> could still be compromised after reboot. It sounds like this taint is
> really only for ephemeral things? "vendor shenanigans" is a pretty giant
> scope ...
>
That is true. This is more about preventing an ecosystem / cottage
industry of tooling built around bypassing the kernel. So the kernel
complains loudly and hopefully prevents vendor tooling from
propagating and instead directs that development effort back to the
native tooling. However for the rare "I know what I'm doing" cases,
this tainted kernel bypass lets some experimentation and debug happen,
but the kernel is transparent that when the capability ships in
production it needs to be a native implementation.
So it's less, "the system integrity is compromised" and more like
"you're bypassing the development process that ensures sanity for CXL
implementations that may take down a system if implemented
incorrectly". For example, NVME reset is a non-invent, CXL reset can
be like surprise removing DDR DIMM.
Should this be more tightly scoped to CXL? I had hoped to use this in
other places in LIBNVDIMM, but I'm ok to lose some generality for the
specific concerns that make CXL devices different than other PCI
endpoints.
Powered by blists - more mailing lists