Message-ID: <66bf5f78a2330_22328529480@iweiny-mobl.notmuch>
Date: Fri, 16 Aug 2024 09:17:28 -0500
From: Ira Weiny <ira.weiny@...el.com>
To: Ira Weiny <ira.weiny@...el.com>, Dave Jiang <dave.jiang@...el.com>, Fan Ni
<fan.ni@...sung.com>, Jonathan Cameron <Jonathan.Cameron@...wei.com>, Navneet
Singh <navneet.singh@...el.com>
CC: Dan Williams <dan.j.williams@...el.com>, Davidlohr Bueso
<dave@...olabs.net>, Alison Schofield <alison.schofield@...el.com>, "Vishal
Verma" <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
<linux-btrfs@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Chris Mason <clm@...com>, Josef Bacik
<josef@...icpanda.com>, David Sterba <dsterba@...e.com>, Johannes Thumshirn
<johannes.thumshirn@....com>, Petr Mladek <pmladek@...e.com>, Steven Rostedt
<rostedt@...dmis.org>, Jonathan Corbet <corbet@....net>, "open
list:DOCUMENTATION" <linux-doc@...r.kernel.org>, "Li, Ming"
<ming4.li@...el.com>, Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v2 00/25] DCD: Add support for Dynamic Capacity Devices
(DCD)
Please ignore this series __and__ the RESEND.
The series did not get sent properly. Something went wrong with my SMTP
server in the middle.
[PATCH v2 22/25] cxl/region: Read existing extents on region creation
CRITICAL: Error running /usr/bin/msmtp -i: msmtp: cannot locate host smtpauth.intel.com: No address associated with hostname
msmtp: could not send mail (account default from /home/iweiny/.msmtprc)
Then I used b4 --resend v2, but glossed over the fact that it was going
to do something very bad: send a very old version.
https://lore.kernel.org/all/20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com/
So please ignore that too. :-(
At this point I'm going to send v3.
<fingers crossed>
Ira
Ira Weiny wrote:
> A git tree of this series can be found here:
>
> https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-08-15
>
> This series requires the CXL memory notifier lock change:
>
> https://lore.kernel.org/all/20240814-fix-notifiers-v2-1-6bab38192c7c@intel.com/
>
> Background
> ==========
>
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows memory capacity within a region to change
> dynamically without the need for resetting the device, reconfiguring
> HDM decoders, or reconfiguring software DAX regions.
>
> One of the biggest use cases for Dynamic Capacity is to allow hosts to
> share memory dynamically within a data center without increasing the
> per-host attached memory.
>
> In the general flow for adding or removing memory, an orchestrator
> coordinates the use of the memory. Such a system generally has 5
> actors: the Orchestrator, the Fabric Manager, the Logical Device, the
> Host Kernel, and a Host User.
>
> Typical workflows are shown below.
>
> Orchestrator     FM        Device     Host Kernel     Host User
>
>     |             |           |            |              |
>     |-------------- Create region ----------------------->|
>     |             |           |            |              |
>     |             |           |            |<-- Create ---|
>     |             |           |            |    Region    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create --->|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |             |           |            |              |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |-- Remove -->|- Release->|- Release ->|              |   |
>     |  Capacity   |  Extent   |   Extent   |              |   |
>     |             |           |            |              |   |
>     |             |           |    (Release Ignored)      |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |             |- Release->|- Release ->|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Destroy ---|
>     |             |           |            |    Region    |
>     |             |           |            |              |
>
> Previous versions of this series[0] drew architectural comments, as
> well as confusion about the architecture caused by the organization of
> the patch series itself.
>
> This version reorders the patches to clarify the architecture. It also
> further streamlines extent handling.
>
> The series still requires the creation of regions and DAX devices to be
> synchronized with the Orchestrator and Fabric Manager. The host kernel
> will reject an add extent event if the region has not been created yet.
> It will also ignore a release if a DAX device has been created and is
> referencing the extent.
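The two synchronization rules above can be summarized as a small decision function. This is a hypothetical sketch, not code from the series: enum dc_response, struct host_state, handle_add() and handle_release() are illustrative names, and the real driver also declines an add when realizing the extent fails, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host decisions for Dynamic Capacity events (illustrative only). */
enum dc_response {
	DC_ACCEPT,	/* add: the extent is surfaced in the region */
	DC_REJECT,	/* add: no region exists yet, so do not accept */
	DC_RELEASE,	/* release: no DAX device uses the extent */
	DC_IGNORE,	/* release: extent still referenced; FM must release again */
};

/* Hypothetical, simplified view of the host state an event is checked against. */
struct host_state {
	bool region_exists;	/* region created before the add event arrived */
	unsigned int dax_refs;	/* DAX devices currently referencing the extent */
};

static enum dc_response handle_add(const struct host_state *s)
{
	assert(s != NULL);
	return s->region_exists ? DC_ACCEPT : DC_REJECT;
}

static enum dc_response handle_release(const struct host_state *s)
{
	assert(s != NULL);
	return s->dax_refs ? DC_IGNORE : DC_RELEASE;
}
```

An ignored release leaves the extent in place; per the flow diagram, the Fabric Manager has to issue the release again once the DAX device is gone.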
>
> These synchronizations are not anticipated to be an issue with real
> applications.
>
> To allow capacity to be added and removed, a new concept of a sparse
> DAX region is introduced. A sparse DAX region may have 0 or more bytes
> of available space. The total space depends on the number and size of
> the extents which have been added.
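Because a sparse DAX region's capacity is simply the sum of its extents, the available space can be sketched as below. This assumes non-overlapping extents stored as inclusive ranges modeled on the kernel's struct range and range_len(); sparse_region_size() is a hypothetical helper, not a function from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, modeled on the kernel's struct range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Length of an inclusive range, as the kernel's range_len() computes it. */
static uint64_t range_len(const struct range *r)
{
	assert(r->end >= r->start);
	return r->end - r->start + 1;
}

/* Hypothetical helper: bytes available in a sparse DAX region. */
static uint64_t sparse_region_size(const struct range *extents, size_t nr)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += range_len(&extents[i]);
	return total;	/* 0 when no extents have been added */
}
```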
>
> Initially, it is anticipated that users of the memory will carefully
> coordinate the surfacing of additional capacity with the creation of DAX
> devices which use that capacity. Therefore, the allocation of memory to
> DAX devices does not allow for specific associations between a DAX
> device and an extent. This keeps allocations very similar to existing
> DAX region behavior.
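Without extent affinity, a DAX device allocation can be served from free space in any extent, possibly spanning several. A hypothetical sketch assuming simple per-extent size/used counters filled in order; struct extent and sparse_alloc() are illustrative names, and the series' actual allocator is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-extent bookkeeping inside one sparse DAX region. */
struct extent {
	uint64_t size;	/* bytes surfaced by this extent */
	uint64_t used;	/* bytes already allocated to DAX devices */
};

/*
 * Hypothetical allocator: satisfy @size from free space in any extent,
 * filling extents in order, so one DAX device may span several extents.
 * Callers cannot request a particular extent. Returns false, changing
 * nothing, if the region does not have enough free space.
 */
static bool sparse_alloc(struct extent *ext, size_t nr, uint64_t size)
{
	uint64_t avail = 0;

	for (size_t i = 0; i < nr; i++) {
		assert(ext[i].used <= ext[i].size);
		avail += ext[i].size - ext[i].used;
	}
	if (size > avail)
		return false;

	for (size_t i = 0; i < nr && size > 0; i++) {
		uint64_t take = ext[i].size - ext[i].used;

		if (take > size)
			take = size;
		ext[i].used += take;
		size -= take;
	}
	return true;
}
```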
>
> Great care was taken to keep the extent tracking simple. Some xarrays
> needed to be added, but extra software objects were kept to a minimum.
>
> Region extents continue to be tracked as sub-devices of the DAX region.
> This ensures that region destruction cleans up all extent allocations
> properly.
>
> Due to these major changes, all review tags were removed from the larger
> patches. A few of the straightforward patches have kept their tags.
>
> In summary the major functionality of this series includes:
>
> - Getting the dynamic capacity (DC) configuration information from CXL
> devices
>
> - Configuring the DC partitions reported by hardware
>
> - Enhancing the CXL and DAX regions for dynamic capacity support
> a. Maintain a logical separation between hardware extents and
> software managed region extents. This provides an
> abstraction between the layers and should allow for
> interleaving in the future
>
> - Get hardware extent lists for endpoint decoders upon
> region creation.
>
> - Adjust extent/region memory available on the following events.
> a. Add capacity events
> b. Release capacity events
>
> - Host response for add capacity
> a. Do not accept the extent if the region does not exist
> or if an error occurs realizing the extent
> b. If the region does exist
> realize a DAX region extent with 1:1 mapping (no
> interleave yet)
> c. Support the more bit by processing a list of extents marked
> with the more bit together before setting up a response.
>
> - Host response for remove capacity
> a. If no DAX device references the extent, release the extent.
> b. If a reference does exist, ignore the request.
> (Require FM to issue release again.)
>
> - Modify DAX device creation/resize to account for extents within a
> sparse DAX region
>
> - Trace Dynamic Capacity events for debugging
>
> - Add cxl-test infrastructure to allow for faster unit testing
> (See new ndctl branch for cxl-dcd.sh test[1])
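The more-bit handling in item c of the add-capacity response can be pictured as an accumulator: extents whose event records carry the More flag are only queued, and the record without it triggers a single response covering the whole list. A hypothetical sketch; struct dc_batch, dc_add_event() and the fixed DC_MAX_PENDING limit are assumptions for illustration, and the per-extent accept/reject decision is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DC_MAX_PENDING 16	/* hypothetical batch limit for this sketch */

/* Hypothetical extent as reported by one add-capacity event record. */
struct dc_extent {
	uint64_t dpa_start;
	uint64_t len;
};

/* Hypothetical accumulator for extents that share one response. */
struct dc_batch {
	struct dc_extent pending[DC_MAX_PENDING];
	size_t nr;			/* extents queued in the open batch */
	size_t responses_sent;		/* responses issued so far */
	size_t last_response_nr;	/* extents covered by the last response */
};

/*
 * Queue the extent from one add-capacity event. While @more is set the
 * extent is only queued; the record with @more clear closes the batch,
 * and a single response then covers every queued extent.
 * Returns false, queuing nothing, if the batch is full.
 */
static bool dc_add_event(struct dc_batch *b, struct dc_extent ext, bool more)
{
	assert(b != NULL);
	if (b->nr == DC_MAX_PENDING)
		return false;
	b->pending[b->nr++] = ext;
	if (more)
		return true;

	/* A real host would build and send the add-capacity response here. */
	b->last_response_nr = b->nr;
	b->responses_sent++;
	b->nr = 0;
	return true;
}
```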
>
> Fan Ni's upstreamed QEMU DCD emulation was used for testing.
>
> Remaining work:
>
> 1) Integrate the QoS work from Dave Jiang
> 2) Interleave support
>
> Possible additional work depending on requirements:
>
> 1) Allow mapping to specific extents (perhaps based on
> label/tag)
> 2) Release extents when DAX devices are released if a release
> was previously seen from the device
> 3) Accept a new extent which extends (but overlaps) an existing
> extent(s)
> 4) Rework DAX device interfaces; memfd has been explored a bit
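Item 3 suggests that, for now, a new extent overlapping an existing one is not accepted. A minimal sketch of such an overlap test, assuming inclusive ranges like the kernel's struct range and mirroring the semantics of the range_overlaps() helper added by this series; extent_conflicts() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, modeled on the kernel's struct range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if the two inclusive ranges share at least one byte. */
static bool range_overlaps(const struct range *r1, const struct range *r2)
{
	return r1->start <= r2->end && r1->end >= r2->start;
}

/* Hypothetical check: does @cand overlap any already-accepted extent? */
static bool extent_conflicts(const struct range *cand,
			     const struct range *existing, size_t nr)
{
	assert(cand != NULL);
	for (size_t i = 0; i < nr; i++)
		if (range_overlaps(cand, &existing[i]))
			return true;
	return false;
}
```

Adjacent extents (one ending at N - 1, the next starting at N) do not overlap under this inclusive-range test.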
>
> [0] v1: https://lore.kernel.org/all/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com/
> [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-08-15
>
> ---
> Major changes:
> - Jonathan: support the more bit
> - djbw: Allow more than 1 region per DC partition
> - All: Address the many comments on the series.
> - iweiny: rebase
> - iweiny: Rework the series to make it easier to review and understand
> the flow
> - Link to v1: https://lore.kernel.org/r/20240324-dcd-type2-upstream-v1-0-b7b00d623625@intel.com
>
> ---
> Ira Weiny (13):
> range: Add range_overlaps()
> printk: Add print format (%par) for struct range
> dax: Document dax dev range tuple
> cxl/pci: Delay event buffer allocation
> cxl/region: Refactor common create region code
> cxl/events: Split event msgnum configuration from irq setup
> cxl/pci: Factor out interrupt policy check
> cxl/core: Return endpoint decoder information from region search
> dax/bus: Factor out dev dax resize logic
> dax/region: Create resources on sparse DAX regions
> cxl/region: Read existing extents on region creation
> tools/testing/cxl: Make event logs dynamic
> tools/testing/cxl: Add DC Regions to mock mem data
>
> Navneet Singh (12):
> cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
> cxl/mem: Read dynamic capacity configuration from the device
> cxl/core: Separate region mode from decoder mode
> cxl/region: Add dynamic capacity decoder and region modes
> cxl/hdm: Add dynamic capacity size support to endpoint decoders
> cxl/port: Add endpoint decoder DC mode support to sysfs
> cxl/mem: Expose DCD partition capabilities in sysfs
> cxl/region: Add sparse DAX region support
> cxl/mem: Configure dynamic capacity interrupts
> cxl/extent: Process DCD events and realize region extents
> cxl/region/extent: Expose region extent information in sysfs
> cxl/mem: Trace Dynamic capacity Event Record
>
> Documentation/ABI/testing/sysfs-bus-cxl | 68 ++-
> Documentation/core-api/printk-formats.rst | 14 +
> drivers/cxl/core/Makefile | 2 +-
> drivers/cxl/core/core.h | 33 +-
> drivers/cxl/core/extent.c | 467 ++++++++++++++
> drivers/cxl/core/hdm.c | 206 ++++++-
> drivers/cxl/core/mbox.c | 578 +++++++++++++++++-
> drivers/cxl/core/memdev.c | 101 ++-
> drivers/cxl/core/port.c | 13 +-
> drivers/cxl/core/region.c | 173 ++++--
> drivers/cxl/core/trace.h | 65 ++
> drivers/cxl/cxl.h | 122 +++-
> drivers/cxl/cxlmem.h | 128 +++-
> drivers/cxl/pci.c | 123 +++-
> drivers/dax/bus.c | 352 +++++++++--
> drivers/dax/bus.h | 4 +-
> drivers/dax/cxl.c | 73 ++-
> drivers/dax/dax-private.h | 39 +-
> drivers/dax/hmem/hmem.c | 2 +-
> drivers/dax/pmem.c | 2 +-
> fs/btrfs/ordered-data.c | 10 +-
> include/linux/cxl-event.h | 32 +
> include/linux/range.h | 7 +
> lib/vsprintf.c | 37 ++
> tools/testing/cxl/Kbuild | 3 +-
> tools/testing/cxl/test/mem.c | 981 ++++++++++++++++++++++++++----
> 26 files changed, 3327 insertions(+), 308 deletions(-)
> ---
> base-commit: 3cef9316df4cda21b5bf25e4230221b02050dfa1
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
>
> Best regards,
> --
> Ira Weiny <ira.weiny@...el.com>
>