Message-ID: <6711842d88fa_2cee2946a@iweiny-mobl.notmuch>
Date: Thu, 17 Oct 2024 16:39:57 -0500
From: Ira Weiny <ira.weiny@...el.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>, <ira.weiny@...el.com>
CC: Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>, "Navneet
Singh" <navneet.singh@...el.com>, Jonathan Corbet <corbet@....net>, "Andrew
Morton" <akpm@...ux-foundation.org>, Dan Williams <dan.j.williams@...el.com>,
Davidlohr Bueso <dave@...olabs.net>, Alison Schofield
<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>,
<linux-btrfs@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <nvdimm@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 21/28] cxl/extent: Process DCD events and realize
region extents
Jonathan Cameron wrote:
> On Mon, 07 Oct 2024 18:16:27 -0500
> ira.weiny@...el.com wrote:
>
> > From: Navneet Singh <navneet.singh@...el.com>
> >
> > A dynamic capacity device (DCD) sends events to signal the host for
> > changes in the availability of Dynamic Capacity (DC) memory. These
> > events contain extents describing a DPA range and metadata for memory
> > to be added or removed. Events may be sent from the device at any time.
> >
> > Three types of events can be signaled: Add, Release, and Force Release.
> >
> > On add, the host may accept or reject the memory being offered. If no
> > region exists, or the extent is invalid, the extent should be rejected.
> > Add extent events may be grouped by a 'more' bit which indicates those
> > extents should be processed as a group.
> >
> > On remove, the host can delay the response until it is safely no longer
> > using the memory. If no region exists, the release can be sent
> > immediately. The host may also release extents (or partial extents) at
> > any time. Thus the 'more' bit grouping of release events is of less
> > value and can be ignored in favor of sending multiple release capacity
> > responses for groups of release events.
>
> True today. I think that would be an error for shared extents,
> though, as they need to be released in one go. We can deal with
> that when it matters.
>
>
> Mind you, the patch seems to try to handle the more bit anyway, so maybe
> just remove that discussion from this description?
It only handles the more bit on ADD responses, because on RELEASE the count is
always 1.
+ if (cxl_send_dc_response(mds, CXL_MBOX_OP_RELEASE_DC, &extent_list, 1))
+ dev_dbg(dev, "Failed to release [range 0x%016llx-0x%016llx]\n",
+ range->start, range->end);
For shared extents, a flag will need to be added to the extents, along with
additional logic to group them for checking use, etc.
I agree; we need to handle that later and get this basic support in first. For
now I think my comments are correct WRT sending release responses.
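To illustrate the release path, here is a minimal userspace C sketch (not driver code) of rule 1) from the patch description combined with one-extent release responses. It assumes a simplified extent record holding only a DPA start and length; `struct rel_extent`, `rel_overlaps()` and `rel_process()` are hypothetical names, and "sending" a response is modeled by recording the index of the released extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified extent: DPA range [start, start + len). */
struct rel_extent {
	uint64_t start;
	uint64_t len;
};

/* True if [a_start, a_start + a_len) and [b_start, b_start + b_len) intersect. */
static bool rel_overlaps(uint64_t a_start, uint64_t a_len,
			 uint64_t b_start, uint64_t b_len)
{
	return a_start < b_start + b_len && b_start < a_start + a_len;
}

/*
 * Rule 1): flag every accepted extent that overlaps the requested release
 * range; the whole extent is released.  Each flagged extent gets its own
 * response with a count of 1, so the "more" bit is never needed.
 * Writing the index to out[] stands in for sending that response.
 * Returns the number of responses "sent".
 */
static size_t rel_process(const struct rel_extent *acc, size_t nr,
			  uint64_t req_start, uint64_t req_len,
			  size_t *out, size_t out_cap)
{
	size_t sent = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!rel_overlaps(acc[i].start, acc[i].len, req_start, req_len))
			continue;
		assert(sent < out_cap);
		out[sent++] = i;
	}
	return sent;
}
```

Releasing whole extents rather than splitting them keeps the bookkeeping trivial, at the cost of giving back more memory than the device asked for.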
> >
> > Simplify extent tracking with the following restrictions.
> >
> > 1) Flag for removal any extent which overlaps a requested
> > release range.
> > 2) Refuse the offer of extents which overlap already accepted
> > memory ranges.
> > 3) Accept again a range which has already been accepted by the
> > host. Eating duplicates serves three purposes. First, this
> > simplifies the code if the device should get out of sync with
> > the host.
>
> Maybe scream about this a little. AFAIK that happening is a device
> bug.
Agreed, but because of the second purpose this is difficult to scream about,
since this situation can come up in normal operation. Here is the scenario:
1) Device has two DCD partitions active, A and B
2) Host crashes
3) Region X is created on A
4) Region Y is created on B
5) Region Y scans for existing extents
6) A new extent surfaces in region X while Y is scanning
7) The generation number changes due to the new extent in X
8) Region Y rescans for existing extents and sees duplicates.
These duplicates need to be ignored without signaling an error.
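A minimal userspace C sketch of rules 2) and 3) on the add path, assuming a flat array of accepted extents that never overlap each other. An exact duplicate is accepted again without error (the rescan case above), while any other overlap is refused. All `sketch_*` names are hypothetical and do not correspond to the driver's xarray-based tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified extent: DPA range [start, start + len). */
struct sketch_extent {
	uint64_t start;
	uint64_t len;
};

enum sketch_add_result {
	SKETCH_ADDED,		/* new, non-overlapping extent recorded */
	SKETCH_DUPLICATE,	/* exact duplicate: accept again, no error */
	SKETCH_REJECTED,	/* overlaps accepted memory: refuse the offer */
	SKETCH_NOSPACE,		/* tracking array full */
};

static bool sketch_overlaps(const struct sketch_extent *a,
			    const struct sketch_extent *b)
{
	return a->start < b->start + b->len && b->start < a->start + a->len;
}

/*
 * Rules 2) and 3): an exact duplicate of an accepted extent is accepted
 * again silently; any other overlap is rejected.  Because accepted extents
 * never overlap each other, an exact duplicate cannot overlap any other
 * entry, so checking for it first in each iteration is sufficient.
 */
static enum sketch_add_result sketch_add_extent(struct sketch_extent *set,
						size_t *nr, size_t cap,
						struct sketch_extent new_ext)
{
	assert(new_ext.len > 0);
	for (size_t i = 0; i < *nr; i++) {
		if (set[i].start == new_ext.start && set[i].len == new_ext.len)
			return SKETCH_DUPLICATE;
		if (sketch_overlaps(&set[i], &new_ext))
			return SKETCH_REJECTED;
	}
	if (*nr == cap)
		return SKETCH_NOSPACE;
	set[(*nr)++] = new_ext;
	return SKETCH_ADDED;
}
```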
>
> > And it should be safe to acknowledge the extent
> > again. Second, this simplifies the code to process existing
> > extents if the extent list should change while the extent
> > list is being read.
This is the 'normal' case.
> > Third, duplicates for a given region
> > which are seen during a race between the hardware surfacing
> > an extent and the cxl dax driver scanning for existing
> > extents will be ignored.
>
> This last one is a good justification.
I think the second justification is actually better than this one. Regardless,
this makes everything OK and should work.
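A rough sketch of the second justification: restart the scan when the extent list generation number changes, and let the duplicate check absorb entries already recorded in the earlier pass. The device is simulated; `struct gen_dev`, `gen_read_entry()`, `gen_contains()` and `gen_scan()` are hypothetical, and extents are identified by start DPA only to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical simulated device: an extent list (start DPAs only) plus a
 * generation number.  On the bump_after-th read a new extent "surfaces":
 * the generation changes and the list grows to nr_after_bump entries.
 */
struct gen_dev {
	const uint64_t *starts;
	size_t nr;
	size_t nr_after_bump;
	unsigned int gen;
	size_t bump_after;
	size_t reads;
};

static uint64_t gen_read_entry(struct gen_dev *d, size_t idx, unsigned int *gen)
{
	if (++d->reads == d->bump_after) {
		d->gen++;
		d->nr = d->nr_after_bump;
	}
	*gen = d->gen;
	return d->starts[idx];
}

static bool gen_contains(const uint64_t *v, size_t n, uint64_t x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

/*
 * Scan the whole list; if the generation changes mid-scan, start over.
 * seen[] persists across passes, so entries recorded before the restart
 * show up again as duplicates and are simply skipped (no error).
 */
static size_t gen_scan(struct gen_dev *d, uint64_t *seen, size_t cap,
		       unsigned int *restarts)
{
	size_t nr_seen = 0;
	bool changed;

	*restarts = 0;
	do {
		unsigned int start_gen = d->gen;
		unsigned int gen;

		changed = false;
		for (size_t i = 0; i < d->nr; i++) {
			uint64_t start = gen_read_entry(d, i, &gen);

			if (gen != start_gen) {
				changed = true;	/* list changed under us */
				(*restarts)++;
				break;
			}
			if (gen_contains(seen, nr_seen, start))
				continue;	/* duplicate: eat it */
			assert(nr_seen < cap);
			seen[nr_seen++] = start;
		}
	} while (changed);
	return nr_seen;
}
```

Because duplicates are harmless, the restart never has to undo work from the aborted pass.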
>
> >
> > NOTE: Processing existing extents is done in a later patch.
> >
> > Management of the region extent devices must be synchronized with
> > potential uses of the memory within the DAX layer. Create region extent
> > devices as children of the cxl_dax_region device such that the DAX
> > region driver can co-drive them and synchronize with the DAX layer.
> > Synchronization and management is handled in a subsequent patch.
> >
> > Tag support within the DAX layer is not yet supported. To maintain
> > compatibility legacy DAX/region processing only tags with a value of 0
> > are allowed. This defines existing DAX devices as having a 0 tag which
> > makes the most logical sense as a default.
> >
> > Process DCD events and create region devices.
> >
> > Signed-off-by: Navneet Singh <navneet.singh@...el.com>
> > Co-developed-by: Ira Weiny <ira.weiny@...el.com>
> > Signed-off-by: Ira Weiny <ira.weiny@...el.com>
> >
> A couple of minor comments from me.
I do appreciate the review.
[snip]
> >
> > +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> > + struct xarray *extent_array, int cnt)
> > +{
> > + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> > + struct cxl_mbox_dc_response *p;
> > + struct cxl_mbox_cmd mbox_cmd;
> > + struct cxl_extent *extent;
> > + unsigned long index;
> > + u32 pl_index;
> > + int rc;
> > +
> > + size_t pl_size = struct_size(p, extent_list, cnt);
> > + u32 max_extents = cnt;
> > +
> > + /* May have to use more bit on response. */
>
> I thought you argued in the patch description that it didn't matter if you
> didn't set it?
Only on RELEASE responses. ADD responses might need it depending on the
payload size and number of extents being added.
Sorry, that was not clear.
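For ADD, the split is driven by the mailbox payload size, as in the hunk that follows. A minimal sketch of that arithmetic, assuming a fixed-size header and fixed-size entries (the sizes are parameters here, not values taken from the spec); `SKETCH_MORE_FLAG`, `max_extents_per_payload()`, `nr_responses()` and `response_flags()` are hypothetical. The bit is set with `|=`: after `flags = 0`, an `&=` can never set it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CXL_DCD_EVENT_MORE. */
#define SKETCH_MORE_FLAG 0x1u

/* How many extent entries fit in one payload after the fixed header. */
static size_t max_extents_per_payload(size_t payload_size, size_t hdr_size,
				      size_t entry_size)
{
	assert(entry_size > 0 && payload_size > hdr_size);
	return (payload_size - hdr_size) / entry_size;
}

/* Number of mailbox commands needed to respond to cnt extents (ceiling). */
static size_t nr_responses(size_t cnt, size_t per_payload)
{
	assert(per_payload > 0);
	return (cnt + per_payload - 1) / per_payload;
}

/*
 * Flags for response i (0-based) out of n: all but the last set the more
 * bit.  OR is required; flags starts at zero and 0 & X is always 0.
 */
static uint8_t response_flags(size_t i, size_t n)
{
	uint8_t flags = 0;

	if (i + 1 < n)
		flags |= SKETCH_MORE_FLAG;
	return flags;
}
```

With 40 extents and room for 15 per payload, this yields three commands, the first two carrying the more bit.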
>
> > + if (pl_size > cxl_mbox->payload_size) {
> > + max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> > + sizeof(struct updated_extent_list);
> > + pl_size = struct_size(p, extent_list, max_extents);
> > + }
> > +
> > + struct cxl_mbox_dc_response *response __free(kfree) =
> > + kzalloc(pl_size, GFP_KERNEL);
> > + if (!response)
> > + return -ENOMEM;
> > +
> > + pl_index = 0;
> > + xa_for_each(extent_array, index, extent) {
> > +
> > + response->extent_list[pl_index].dpa_start = extent->start_dpa;
> > + response->extent_list[pl_index].length = extent->length;
> > + pl_index++;
> > + response->extent_list_size = cpu_to_le32(pl_index);
> > +
> > + if (pl_index == max_extents) {
> > + mbox_cmd = (struct cxl_mbox_cmd) {
> > + .opcode = opcode,
> > + .size_in = struct_size(response, extent_list,
> > + pl_index),
> > + .payload_in = response,
> > + };
> > +
> > + response->flags = 0;
> > + if (pl_index < cnt)
> > + response->flags &= CXL_DCD_EVENT_MORE;
> Covered in another branch of the thread.
Yep.
[snip]
>
> >
> > +/* See CXL 3.0 8.2.9.2.1.5 */
>
> Maybe update to 3.1? Otherwise patch reviewer needs to open two
> spec versions! In 3.1 it is 8.2.9.2.1.6
Yep, missed this one. Thanks,
Ira