lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 25 Mar 2024 12:24:02 -0700
From: fan <nifan.cxl@...il.com>
To: ira.weiny@...el.com
Cc: Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	Navneet Singh <navneet.singh@...el.com>,
	Dan Williams <dan.j.williams@...el.com>,
	Davidlohr Bueso <dave@...olabs.net>,
	Alison Schofield <alison.schofield@...el.com>,
	Vishal Verma <vishal.l.verma@...el.com>,
	linux-btrfs@...r.kernel.org, linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org, Chris Mason <clm@...com>,
	Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>
Subject: Re: [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD)

On Sun, Mar 24, 2024 at 04:18:03PM -0700, ira.weiny@...el.com wrote:
> A git tree of this series can be found here:
> 
> 	https://github.com/weiny2/linux-kernel/tree/dcd-2024-03-24
> 
> Pre-requisite:
> ==============
> 
> 	The locking introduced by Vishal for DAX regions:
> 	https://lore.kernel.org/all/20240124-vv-dax_abi-v7-1-20d16cb8d23d@intel.com/T/#u
> 
> Background
> ==========
> 
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows the memory capacity to change dynamically, without
> the need for resetting the device, reconfiguring HDM decoders, or
> reconfiguring software DAX regions.
> 
> One of the biggest use cases for Dynamic Capacity is to allow hosts to
> share memory dynamically within a data center without increasing the
> per-host attached memory.
> 
> The general flow for the addition or removal of memory is to have an
> orchestrator coordinate the use of the memory.  Generally there are 5
> actors in such a system, the Orchestrator, Fabric Manager, the Device
> the host sees, the Host Kernel, and a Host User.
> 
> Typical work flows are shown below.
> 
> Orchestrator      FM         Device       Host Kernel    Host User
> 
>     |             |           |            |              |
>     |-------------- Create region ----------------------->|
>     |             |           |            |              |
>     |             |           |            |<-- Create ---|
>     |             |           |            |    Region    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create --->|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |             |           |            |              |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Accept -|<- Accept  -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |-- Remove -->|- Release->|- Release ->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->|--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |<- Create ----|
>     |             |           |            |   DAX dev    |-- Use memory
>     |             |           |            |              |   |
>     |-- Remove -->|- Release->|- Release ->|              |   |
>     |  Capacity   |  Extent   |   Extent   |              |   |
>     |             |           |            |              |   |
>     |             |           |     (Release Ignored)     |   |
>     |             |           |            |              |   |
>     |             |           |            |<- Release ---| <-+
>     |             |           |            |   DAX dev    |
>     |<------------- Signal done --------------------------|
>     |             |           |            |              |
>     |             |- Release->|- Release ->|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<- Release-|<- Release -|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |<- Destroy ---|
>     |             |           |            |   Region     |
>     |             |           |            |              |
> 
> Previous RFCs of this series[0] resulted in significant architectural
> comments.  Previous versions allowed memory capacity to be accepted by
> the host regardless of the existence of a software region being mapped.
> 
> With this new patch set the order of the create region and DAX device
> creation must be synchronized with the Orchestrator adding/removing
> capacity.  The host kernel will reject an add extent event if the region
> is not created yet.  It will also ignore a release if the DAX device is
> created and referencing an extent.
> 
> Neither of these synchronizations are anticipated to be an issue with
> real applications.
> 
> In order to allow for capacity to be added and removed a new concept of
> a sparse DAX region is introduced.  A sparse DAX region may have 0 or
> more bytes of available space.  The total space depends on the number
> and size of the extents which have been added.
> 
> Initially it is anticipated that users of the memory will carefully
> coordinate the surfacing of additional capacity with the creation of DAX
> devices which use that capacity.  Therefore, the allocation of the
> memory to DAX devices does not allow for specific associations between
> DAX device and extent.  This keeps allocations very similar to existing
> DAX region behavior.
> 
> Great care was taken to greatly simplify extent tracking.  Specifically,
> in comparison to previous versions of the patch set, all extent tracking
> xarrays have been eliminated from the code.  In addition, most of the
> extra software objects and associated referenced counts have been
> eliminated.
> 
> In this version, extents are tracked purely as sub-devices of the
> region.  This ensures that the region destruction cleans up all extent
> allocations properly.  Device managed callbacks are wired to ensure any
> additional data required for DAX device references are handled
> correctly.
> 
> Due to these major changes I'm setting this new series to V1.
> 
> In summary the major functionality of this series includes:
> 
> - Getting the dynamic capacity (DC) configuration information from cxl
>   devices
> 
> - Configuring the DC regions reported by hardware
> 
> - Enhancing the CXL and DAX regions for dynamic capacity support
> 	a. Maintain a logical separation between hardware extents and
> 	   software managed region extents.  This provides an
> 	   abstraction between the layers and should allow for
> 	   interleaving in the future
> 
> - Get hardware extent lists for endpoint decoders upon
>   region creation.
> 
> - Adjust extent/region memory available on the following events.
>         a. Add capacity Events
> 	b. Release capacity events
> 
> - Host response for add capacity
> 	a. do not accept the extent if:
> 		If the region does not exist
> 		or an error occurs realizing the extent
> 	B. If the region does exist
> 		realize a DAX region extent with 1:1 mapping (no
> 		interleave yet)
> 
> - Host response for remove capacity
> 	a. If no DAX devices reference the extent release the extent
> 	b. If a reference does exist, ignore the request.
> 	   (Require FM to issue release again.)
> 
> - Modify DAX device creation/resize to account for extents within a
>   sparse DAX region
> 
> - Trace Dynamic Capacity events for debugging
> 
> - Add cxl-test infrastructure to allow for faster unit testing
>   (See new ndctl branch for cxl-dcd.sh test[1])
> 
> Fan Ni's latest v5 of Qemu DCD was used for testing.[2]
> 
> Remaining work:
> 
> 	1) Integrate the QoS work from Dave Jiang
> 	2) Interleave support
> 
> Possible additional work depending on requirements:
> 
> 	1) Allow mapping to specific extents (perhaps based on
> 	   label/tag)
> 	2) Release extents when DAX devices are released if a release
> 	   was previously seen from the device
> 	3) Accept a new extent which extends (but overlaps) an existing
> 	   extent(s)
> 
> [0] RFC v2: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com
> [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-03-22
> [2] https://lore.kernel.org/all/20240304194331.1586191-1-nifan.cxl@gmail.com/
> 
> ---
> Changes for v1:
> - iweiny: Largely new series
> - iweiny: Remove review tags due to the series being a major rework
> - iweiny: Fix authorship for Navneet patches
> - iweiny: Remove extent xarrays
> - iweiny: Remove kreferences, replace with 1 use count protected under dax_rwsem
> - iweiny: Mark all sysfs entries for the 6.10 June 2024 kernel
> - iweiny: Remove gotos
> - iweiny: Fix 0day issues
> - Jonathan Cameron: address comments
> - Navneet Singh: address comments
> - Dan Williams: address comments
> - Dave Jiang: address comments
> - Fan Ni: address comments
> - Jørgen Hansen: address comments
> - Link to RFC v2: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-0-f740c47e7916@intel.com
> 

Hi Ira,
Have not got a chance to check the code yet, but I noticed one thing
when testing with my DCD emulation code.
Currently, if we do partial release, it seems the whole extent will be
removed. Is it designed intentionally?

Fan

> ---
> Ira Weiny (12):
>       cxl/core: Simplify cxl_dpa_set_mode()
>       cxl/events: Factor out event msgnum configuration
>       cxl/pci: Delay event buffer allocation
>       cxl/pci: Factor out interrupt policy check
>       range: Add range_overlaps()
>       dax/bus: Factor out dev dax resize logic
>       dax: Document dax dev range tuple
>       dax/region: Prevent range mapping allocation on sparse regions
>       dax/region: Support DAX device creation on sparse DAX regions
>       tools/testing/cxl: Make event logs dynamic
>       tools/testing/cxl: Add DC Regions to mock mem data
>       tools/testing/cxl: Add Dynamic Capacity events
> 
> Navneet Singh (14):
>       cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
>       cxl/core: Separate region mode from decoder mode
>       cxl/mem: Read dynamic capacity configuration from the device
>       cxl/region: Add dynamic capacity decoder and region modes
>       cxl/port: Add Dynamic Capacity mode support to endpoint decoders
>       cxl/port: Add dynamic capacity size support to endpoint decoders
>       cxl/mem: Expose device dynamic capacity capabilities
>       cxl/region: Add Dynamic Capacity CXL region support
>       cxl/mem: Configure dynamic capacity interrupts
>       cxl/region: Read existing extents on region creation
>       cxl/extent: Realize extent devices
>       dax/region: Create extent resources on DAX region driver load
>       cxl/mem: Handle DCD add & release capacity events.
>       cxl/mem: Trace Dynamic capacity Event Record
> 
>  Documentation/ABI/testing/sysfs-bus-cxl |  60 ++-
>  drivers/cxl/core/Makefile               |   1 +
>  drivers/cxl/core/core.h                 |  10 +
>  drivers/cxl/core/extent.c               | 145 +++++
>  drivers/cxl/core/hdm.c                  | 254 +++++++--
>  drivers/cxl/core/mbox.c                 | 591 ++++++++++++++++++++-
>  drivers/cxl/core/memdev.c               |  76 +++
>  drivers/cxl/core/port.c                 |  19 +
>  drivers/cxl/core/region.c               | 334 +++++++++++-
>  drivers/cxl/core/trace.h                |  65 +++
>  drivers/cxl/cxl.h                       | 127 ++++-
>  drivers/cxl/cxlmem.h                    | 114 ++++
>  drivers/cxl/mem.c                       |  45 ++
>  drivers/cxl/pci.c                       | 122 +++--
>  drivers/dax/bus.c                       | 353 +++++++++---
>  drivers/dax/bus.h                       |   4 +-
>  drivers/dax/cxl.c                       | 127 ++++-
>  drivers/dax/dax-private.h               |  40 +-
>  drivers/dax/hmem/hmem.c                 |   2 +-
>  drivers/dax/pmem.c                      |   2 +-
>  fs/btrfs/ordered-data.c                 |  10 +-
>  include/linux/cxl-event.h               |  31 ++
>  include/linux/range.h                   |   7 +
>  tools/testing/cxl/Kbuild                |   1 +
>  tools/testing/cxl/test/mem.c            | 914 ++++++++++++++++++++++++++++----
>  25 files changed, 3152 insertions(+), 302 deletions(-)
> ---
> base-commit: dff54316795991e88a453a095a9322718a34034a
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
> 
> Best regards,
> -- 
> Ira Weiny <ira.weiny@...el.com>
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ