Message-ID: <20220615171930.GA1523982@alison-desk>
Date:   Wed, 15 Jun 2022 10:19:30 -0700
From:   Alison Schofield <alison.schofield@...el.com>
To:     "Weiny, Ira" <ira.weiny@...el.com>
Cc:     "Williams, Dan J" <dan.j.williams@...el.com>,
        "Verma, Vishal L" <vishal.l.verma@...el.com>,
        Ben Widawsky <bwidawsk@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        "linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Wed, Jun 15, 2022 at 08:01:50AM -0700, Ira Weiny wrote:
> On Tue, Jun 14, 2022 at 10:07:52PM -0700, Alison Schofield wrote:
> > On Tue, Jun 14, 2022 at 08:22:41PM -0700, Ira Weiny wrote:
> > 
> 
> [snip]
> 
> > > > +
> > > > +	do {
> > > > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> > > > +				       sizeof(pi), po, cxlds->payload_size);
> > > > +		if (rc)
> > > > +			goto out;
> > > > +
> > > > +		if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> > > > +			time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> > > > +
> > > > +			dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> > > > +				&o_time);
> > > > +			rc = -ENXIO;
> > > > +			goto out;
> > > 
> > > > I guess the idea is that this return will trigger something else to clear the list,
> > > > rebuild the list, and perform a scan media request?
> > > 
> > Per CXL Spec 8.2.9.5.4.1: The poison list may be incomplete when the list
> > has overflowed. User can perform a Scan Media to try to clear and rebuild
> > the list, with no guarantee that the overflow will not recur.
> > 
> > So yes to what you are saying. This return value should indicate to
> > user space that a Scan Media should be issued. Issuing the Scan Media
> > to the device does lead the device to rebuild its list, as you say.
> > Also, when we get the Scan Media results, the device is able to report
> > partial results and tell the host to collect the error records, and
> > then restart the scan, get results again, and on and on until the scan
> > is complete.
> > 
> > Perhaps a clarification - there is not a logical pairing of Scan Media
> > followed by Get Poison List.  Scan Media followed by Get Scan Media
> > Results is the logical pairing. Get Poison List is getting a snapshot
> > of the poison list at a point in time. The device adds DPAs to the list
> > when the device detects poison, some devices run their own background
> > scans and add to the poison list, and then there are the user initiated
> > actions (Scan Media and Poison Inject) that can affect the list.
> > 
> > > I'm just wondering if this loop should continue to clear the list and then let
> > > something else do the scan media request?
> > 
> > It's not like the _MORE status where the device is telling the host to
> > come back and gather more. I think the action of failing, and letting
> > the user initiate a Scan Media, is the correct course here.
> 
> Fair enough.  But I guess I'm still confused by the spec.  The way I read it
> yesterday (and I could be wrong) was that the OS was supposed to read the
> entries to clear the list?  Is that not true?

I think - not true.

Get_Poison_List has no effect on the contents of the list itself.
Even with its MORE flag, it is not clearing any poison, it's just
telling the host that it had more records than could fit in one
device payload, so they will have to be delivered to the host in multiple
requests. I'd expect issuing multiple Get Poison List requests would
get the same results (unless, of course, the media was going bad quickly).
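
To make the MORE handling concrete, here is a minimal user-space sketch
against a simulated device. It is not the driver code from the patch:
struct fake_dev, fake_get_poison(), read_poison_list(), the flag values,
and PAYLOAD_RECORDS are all hypothetical names and numbers chosen for
illustration, and the device-side cursor is only a simulation detail. The
point it shows is that reading never changes the list, MORE only means
"ask again for the rest", and OVERFLOW makes the read fail with -ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not taken from the spec or the patch. */
#define POISON_FLAG_MORE     0x1
#define POISON_FLAG_OVERFLOW 0x2
#define PAYLOAD_RECORDS      4	/* hypothetical records per payload */

/* Hypothetical stand-in for a device holding a poison list. */
struct fake_dev {
	const uint64_t *list;	/* poisoned DPAs; never modified by reads */
	size_t count;
	size_t cursor;		/* simulation detail: start of next payload */
	int overflowed;
};

struct poison_out {
	uint8_t flags;
	size_t count;
	uint64_t dpa[PAYLOAD_RECORDS];
};

/* Simulated Get Poison List: hand back one payload's worth of records. */
static void fake_get_poison(struct fake_dev *dev, struct poison_out *po)
{
	size_t left, n;

	po->flags = 0;
	po->count = 0;
	if (dev->overflowed) {
		po->flags = POISON_FLAG_OVERFLOW;
		return;
	}

	left = dev->count - dev->cursor;
	n = left < PAYLOAD_RECORDS ? left : PAYLOAD_RECORDS;
	for (size_t i = 0; i < n; i++)
		po->dpa[i] = dev->list[dev->cursor + i];
	po->count = n;
	dev->cursor += n;

	if (dev->cursor < dev->count)
		po->flags |= POISON_FLAG_MORE;	/* more records than fit */
	else
		dev->cursor = 0;		/* pass done; list untouched */
}

/*
 * Collect the whole list. Returns the number of records the device
 * reported (at most 'cap' of them are stored in 'out'), or -ENXIO if
 * the device says its list has overflowed.
 */
static long read_poison_list(struct fake_dev *dev, uint64_t *out, size_t cap)
{
	struct poison_out po;
	size_t total = 0;

	assert(dev != NULL && out != NULL);
	do {
		fake_get_poison(dev, &po);
		if (po.flags & POISON_FLAG_OVERFLOW)
			return -ENXIO;	/* list may be incomplete */
		for (size_t i = 0; i < po.count; i++) {
			if (total < cap)
				out[total] = po.dpa[i];
			total++;
		}
	} while (po.flags & POISON_FLAG_MORE);

	return (long)total;
}
```

Calling read_poison_list() twice on the same fake device yields the same
records both times, which is the behavior described above. Setting
overflowed makes the call fail rather than return a partial list.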

Maybe you are conflating this with other commands: Scan Media and Clear Poison.

> 
> So the device will clear the list internally when Scan Media is run?

Spec says device 'rebuilds' the list. I might guess that's a clear and
start anew, but that's not the host's business. As a host, we wait for Scan
Media to complete before issuing Get Scan Media Results or Get Poison
List.
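
A rough sketch of that host-side sequencing, again against a simulated
device and not the real mailbox commands: the fake scan below completes
synchronously (a real Scan Media runs in the background and the host
waits for it), and struct fake_media, struct scan_results, fake_scan(),
scan_all(), and RESULT_SLOTS are hypothetical names invented for this
example. It illustrates the restart pattern: collect a partial set of
results, restart the scan where the device stopped, and repeat until the
scan covers the whole range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESULT_SLOTS 2	/* hypothetical: records buffered per scan */

/* Hypothetical stand-in for device media with some poisoned addresses. */
struct fake_media {
	const uint64_t *bad;	/* poisoned addresses, sorted ascending */
	size_t nbad;
	uint64_t size;		/* media spans [0, size) */
};

struct scan_results {
	size_t count;
	uint64_t dpa[RESULT_SLOTS];
	int stopped_early;	/* result buffer filled before the end */
	uint64_t restart;	/* address to resume scanning from */
};

/* Simulated Scan Media plus Get Scan Media Results over [start, size). */
static void fake_scan(const struct fake_media *m, uint64_t start,
		      struct scan_results *r)
{
	r->count = 0;
	r->stopped_early = 0;
	r->restart = m->size;

	for (size_t i = 0; i < m->nbad; i++) {
		if (m->bad[i] < start)
			continue;
		if (r->count == RESULT_SLOTS) {
			/* Out of room: report partial results. */
			r->stopped_early = 1;
			r->restart = m->bad[i];
			return;
		}
		r->dpa[r->count++] = m->bad[i];
	}
}

/*
 * Scan, collect results, and restart until the scan is complete.
 * Returns the number of records found (at most 'cap' are stored) and
 * reports how many scan passes it took through 'passes'.
 */
static size_t scan_all(const struct fake_media *m, uint64_t *out, size_t cap,
		       unsigned int *passes)
{
	struct scan_results r;
	uint64_t start = 0;
	size_t total = 0;

	assert(m != NULL && out != NULL && passes != NULL);
	*passes = 0;
	do {
		fake_scan(m, start, &r);	/* "wait" is implicit here */
		(*passes)++;
		for (size_t i = 0; i < r.count; i++) {
			if (total < cap)
				out[total] = r.dpa[i];
			total++;
		}
		start = r.restart;
	} while (r.stopped_early);

	return total;
}
```

With five poisoned addresses and room for two records per scan, the loop
needs three passes to gather everything.
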
> 
> At this point I'm just trying to understand, not necessarily objecting to the
> patch.

NP.  The questions are helpful!

> 
> Ira
> 
> > 
> > So, this response got kind of long-winded. As you can see, especially
> > if one looks in the spec as I know you are, there are additional
> > commands that need to be implemented to complete the ARS feature set.
> > And, of course, we'll offer user space tooling (NDCTL and libcxl).
> > 
