linux-kernel - Re: [PATCH 2/3] cxl/mbox: Add GET_POISON

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220620115307.000037a2@Huawei.com>
Date:   Mon, 20 Jun 2022 11:53:07 +0100
From:   Jonathan Cameron <Jonathan.Cameron@...wei.com>
To:     Dan Williams <dan.j.williams@...el.com>
CC:     Alison Schofield <alison.schofield@...el.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        "Weiny, Ira" <ira.weiny@...el.com>,
        "Verma, Vishal L" <vishal.l.verma@...el.com>,
        Ben Widawsky <bwidawsk@...nel.org>,
        "Steven Rostedt" <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        "linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "a.manzanares@...sung.com" <a.manzanares@...sung.com>
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command
 support

On Fri, 17 Jun 2022 12:02:32 -0700
Dan Williams <dan.j.williams@...el.com> wrote:

> Alison Schofield wrote:
> > On Thu, Jun 16, 2022 at 12:43:34PM -0700, Davidlohr Bueso wrote:  
> > > On Tue, 14 Jun 2022, alison.schofield@...el.com wrote:
> > >   
> > > >From: Alison Schofield <alison.schofield@...el.com>
> > > >
> > > >CXL devices that support persistent memory maintain a list of locations
> > > >that are poisoned or result in poison if the addresses are accessed by
> > > >the host.
> > > >
> > > >Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > > >list as a set of  Media Error Records that include the source of the
> > > >error, the starting device physical address and length. The length is
> > > >the number of adjacent DPAs in the record and is in units of 64 bytes.
> > > >
> > > >Retrieve the list and log each Media Error Record as a trace event of
> > > >type cxl_poison_list.
> > > >
> > > >Signed-off-by: Alison Schofield <alison.schofield@...el.com>
> > > >---
> > > > drivers/cxl/cxlmem.h    | 43 +++++++++++++++++++++++
> > > > drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > > > 2 files changed, 118 insertions(+)
> > > >  
> > snip
> >   
> > > >+int cxl_mem_get_poison_list(struct device *dev)
> > > >+{
> > > >+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > >+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > >+	struct cxl_mbox_poison_payload_out *po;
> > > >+	struct cxl_mbox_poison_payload_in pi;
> > > >+	int nr_records = 0;
> > > >+	int rc, i;
> > > >+
> > > >+	if (range_len(&cxlds->pmem_range)) {
> > > >+		pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > > >+		pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));  
> > 
> > First off - you stopped at a bug here - that pi.length needs to be
> > in units of 64 bytes.  
> > > 
> > > Do you ever see this changing to not always use the full pmem DPA range
> > > but allow arbitrary ones? I also assume this is the reason why you don't
> > > check the range vs cxlds->ram_range to prevent any overlaps, no?
> > > 
> > > Thanks,
> > > Davidlohr  
> > 
> > David - Great question!
> > 
> > I'm headed in this direction -
> > 
> > cxl list --media-errors -m mem1
> > 	lists media errors for requested memdev
> > 
> > cxl list --media-errors -r region#
> > 	lists region errors with HPA addresses
> > 	(So here cxl tool will collect the poison for all the regions
> > 	 memdevs and do the DPA to HPA translation)
> > 
> > To answer your question, I wasn't thinking of limiting
> > the range within the memdev, but certainly could. And if we were
> > taking in ranges, those ranges would need to be checked.
> > 
> > $cxl list --media-errors -m mem1 --range-start=  --range-end|len=
> > 
> > Now, if I left the sysfs inteface as is, the driver will read the 
> > entire poison list for the memdev and then cxl tool will filter it
> > for the range requested. 
> > 
> > Or, maybe we should implement in libcxl (not sysfs), with memdev and
> > range options and only collect from the device the range requested.
> > 
> > Either one looks the same to the cxl tool user, but limiting the
> > range we send to the device would certainly cut down on unwanted
> > records being logged, retrieved, and examined.
> > 
> > I'd like to hear more from you and other community members.  
> 
> There is some history here that is relevant to this design. CXL Get
> Poison List builds on lessons learned from the ACPI "Address Range
> Scrub" mechanism that was deployed for ACPI described persistent memory
> platform (See ACPI 6.4 9.20.7.2 "Address Range Scrubbing (ARS)
> Overview"). In that case there was no expectation that the device
> maintained a cached and coherent (with incoming poison writes) copy of
> the media error list. CXL Get Poison List in comparison is meant to
> obviate the need to perform Scan Media in most scenarios, and it is
> lightweight enough that userspace need not have a mechanism to request
> errors by range, in my opinion.
> 
> One of the design warts of drivers/acpi/nfit/ that I want to get away
> from in the case of drivers/cxl/ is snooping the equivalent of ARS
> command results to populate a kernel list of poison addresses and
> instead put the onus on userspace to collect DPA events and optionally
> inform the kernel of the HPA impacts.

Can you give more info on why such an in kernel flow doesn't work
well for this usecase (particularly for volatile memory)? I don't
much like the flow differing from how we do DRAM patrol scrub based
handling today. I'm not sure I like the requirement for userspace
to be in the loop if we are dealing with volatile failures even if
the detection is async.

> For example, DAX filesystems will
> soon have the ability to do something useful with poison notifications
> [1], but that support is limited to synchronously consumed poison
> flagged by memory_failure(). When the cxl tool translates the poison
> list to HPA it needs an ABI to turn around and notify the filesystem
> about which blocks got clobbered.

I'm not clear why it makes sense to do the DPA to HPA conversion in
userspace. It should be a fairly trivial bit of code to do the reverse look
up in kernel and then we just bolt into existing infrastructure.

Jonathan

> 
> [1]: https://lore.kernel.org/all/20220616193157.2c2e963f3e7e38dfac554a28@linux-foundation.org/