[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <469ac491-8733-986a-aaae-768ec28ebbec@amd.com>
Date: Wed, 15 Jan 2025 13:55:39 +0000
From: Alejandro Lucero Palau <alucerop@....com>
To: Dan Williams <dan.j.williams@...el.com>, Ira Weiny <ira.weiny@...el.com>,
Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Jonathan Corbet <corbet@....net>, Andrew Morton <akpm@...ux-foundation.org>,
Kees Cook <kees@...nel.org>, "Gustavo A. R. Silva" <gustavoars@...nel.org>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>, linux-cxl@...r.kernel.org,
linux-doc@...r.kernel.org, nvdimm@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org,
Li Ming <ming.li@...omail.com>
Subject: Re: [PATCH v8 02/21] cxl/mem: Read dynamic capacity configuration
from the device
On 1/15/25 02:35, Dan Williams wrote:
> Ira Weiny wrote:
>> Devices which optionally support Dynamic Capacity (DC) are configured
>> via mailbox commands. CXL 3.1 requires the host to issue the Get DC
>> Configuration command in order to properly configure DCDs. Without the
>> Get DC Configuration command DCD can't be supported.
>>
>> Implement the DC mailbox commands as specified in CXL 3.1 section
>> 8.2.9.9.9 (opcodes 48XXh) to read and store the DCD configuration
>> information. Disable DCD if DCD is not supported. Leverage the Get DC
>> Configuration command supported bit to indicate if DCD is supported.
>>
>> Linux has no use for the trailing fields of the Get Dynamic Capacity
>> Configuration Output Payload (Total number of supported extents, number
>> of available extents, total number of supported tags, and number of
>> available tags). Avoid defining those fields to use the more useful
>> dynamic C array.
>>
>> Based on an original patch by Navneet Singh.
>>
>> Cc: Li Ming <ming.li@...omail.com>
>> Cc: Kees Cook <kees@...nel.org>
>> Cc: Gustavo A. R. Silva <gustavoars@...nel.org>
>> Cc: linux-hardening@...r.kernel.org
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> Signed-off-by: Ira Weiny <ira.weiny@...el.com>
>>
>> ---
>> Changes:
>> [iweiny: fix EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify)]
>> [iweiny: limit variable scope in cxl_dev_dynamic_capacity_identify]
>> ---
>> drivers/cxl/core/mbox.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++-
>> drivers/cxl/cxlmem.h | 64 ++++++++++++++++++-
>> drivers/cxl/pci.c | 4 ++
>> 3 files changed, 232 insertions(+), 2 deletions(-)
>>
> [snipping the C code to do a data structure review first]
>
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index e8907c403edbd83c8a36b8d013c6bc3391207ee6..05a0718aea73b3b2a02c608bae198eac7c462523 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -403,6 +403,7 @@ enum cxl_devtype {
>> CXL_DEVTYPE_CLASSMEM,
>> };
>>
>> +#define CXL_MAX_DC_REGION 8
> Please no, lets not sign up to have the "which cxl 'region' concept are
> you referring to?" debate in perpetuity. "DPA partition", "DPA
> resource", "DPA capacity" anything but "region".
>
>
This next comment is not my main point to discuss in this email
(resources initialization is), but I seize it for giving my view in this
one.
Dan, you say later we (Linux) are not obligated to use "questionable
naming decisions of specifications", but we should not confuse people
either.
Maybe CXL_MAX_DC_HW_REGION would help here, for differentiating it from
the kernel software cxl region construct. I think we will need a CXL
kernel dictionary sooner or later ...
>> /**
>> * struct cxl_dpa_perf - DPA performance property entry
>> * @dpa_range: range for DPA address
>> @@ -434,6 +435,8 @@ struct cxl_dpa_perf {
>> * @dpa_res: Overall DPA resource tree for the device
>> * @pmem_res: Active Persistent memory capacity configuration
>> * @ram_res: Active Volatile memory capacity configuration
>> + * @dc_res: Active Dynamic Capacity memory configuration for each possible
>> + * region
>> * @serial: PCIe Device Serial Number
>> * @type: Generic Memory Class device or Vendor Specific Memory device
>> * @cxl_mbox: CXL mailbox context
>> @@ -449,11 +452,23 @@ struct cxl_dev_state {
>> struct resource dpa_res;
>> struct resource pmem_res;
>> struct resource ram_res;
>> + struct resource dc_res[CXL_MAX_DC_REGION];
> This is throwing off cargo-cult alarms. The named pmem_res and ram_res
> served us well up until the point where DPA partitions grew past 2 types
> at well defined locations. I like the array of resources idea, but that
> begs the question why not put all partition information into an array?
>
> This would also head off complications later on in this series where the
> DPA capacity reservation and allocation flows have "dc" sidecars bolted
> on rather than general semantics like "allocating from partition index N
> means that all partitions indices less than N need to be skipped and
> marked reserved".
I guess this is likely how you want to change the type2 resource
initialization issue and where I'm afraid these two patchsets are going
to collide at.
If that is the case, both are going to miss the next kernel cycle since
it means major changes, but let's discuss it without further delays for
the sake of implementing the accepted changes as soon as possible, and I
guess with a close sync between Ira and I.
BTW, in the case of the Type2, there are more things to discuss which I
do there.
Thank you
>> u64 serial;
>> enum cxl_devtype type;
>> struct cxl_mailbox cxl_mbox;
>> };
>>
>> +#define CXL_DC_REGION_STRLEN 8
>> +struct cxl_dc_region_info {
>> + u64 base;
>> + u64 decode_len;
>> + u64 len;
> Duplicating partition information in multiple places, like
> mds->dc_region[X].base and cxlds->dc_res[X].start, feels like an
> RFC-quality decision for expediency that needs to reconciled on the way
> to upstream.
>
>> + u64 blk_size;
>> + u32 dsmad_handle;
>> + u8 flags;
>> + u8 name[CXL_DC_REGION_STRLEN];
> No, lets not entertain:
>
> printk("%s\n", mds->dc_region[index].name);
>
> ...when:
>
> printk("dc%d\n", index);
>
> ...will do.
>
>> +};
>> +
>> static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
>> {
>> return dev_get_drvdata(cxl_mbox->host);
>> @@ -473,7 +488,9 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
>> * @dcd_cmds: List of DCD commands implemented by memory device
>> * @enabled_cmds: Hardware commands found enabled in CEL.
>> * @exclusive_cmds: Commands that are kernel-internal only
>> - * @total_bytes: sum of all possible capacities
>> + * @total_bytes: length of all possible capacities
>> + * @static_bytes: length of possible static RAM and PMEM partitions
>> + * @dynamic_bytes: length of possible DC partitions (DC Regions)
>> * @volatile_only_bytes: hard volatile capacity
>> * @persistent_only_bytes: hard persistent capacity
> I have regrets that cxl_memdev_state permanently carries runtime storage
> for init time variables, lets not continue down that path with DCD
> enabling.
>
>> * @partition_align_bytes: alignment size for partition-able capacity
>> @@ -483,6 +500,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
>> * @next_persistent_bytes: persistent capacity change pending device reset
>> * @ram_perf: performance data entry matched to RAM partition
>> * @pmem_perf: performance data entry matched to PMEM partition
>> + * @nr_dc_region: number of DC regions implemented in the memory device
>> + * @dc_region: array containing info about the DC regions
>> * @event: event log driver state
>> * @poison: poison driver state info
>> * @security: security driver state info
>> @@ -499,6 +518,8 @@ struct cxl_memdev_state {
>> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
>> DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
>> u64 total_bytes;
>> + u64 static_bytes;
>> + u64 dynamic_bytes;
>> u64 volatile_only_bytes;
>> u64 persistent_only_bytes;
>> u64 partition_align_bytes;
>> @@ -510,6 +531,9 @@ struct cxl_memdev_state {
>> struct cxl_dpa_perf ram_perf;
>> struct cxl_dpa_perf pmem_perf;
>>
>> + u8 nr_dc_region;
>> + struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION];
> DPA capacity is a generic CXL.mem concern and partition information is
> contained cxl_dev_state. Lets find a way to not need partially redundant
> data structures across in cxl_memdev_state and cxl_dev_state.
>
> DCD introduces the concept of "decode size vs usable capacity" into the
> partition information, but I see no reason to conceptually tie that to
> only DCD. Fabio's memory hole patches show that there is already a
> memory-hole concept in the CXL arena. DCD is just saying "be prepared for
> the concept of DPA partitions with memory holes at the end".
>
>> +
>> struct cxl_event_state event;
>> struct cxl_poison_state poison;
>> struct cxl_security_state security;
>> @@ -708,6 +732,32 @@ struct cxl_mbox_set_partition_info {
>>
>> #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
>>
>> +/* See CXL 3.1 Table 8-163 get dynamic capacity config Input Payload */
>> +struct cxl_mbox_get_dc_config_in {
>> + u8 region_count;
>> + u8 start_region_index;
>> +} __packed;
>> +
>> +/* See CXL 3.1 Table 8-164 get dynamic capacity config Output Payload */
>> +struct cxl_mbox_get_dc_config_out {
>> + u8 avail_region_count;
>> + u8 regions_returned;
>> + u8 rsvd[6];
>> + /* See CXL 3.1 Table 8-165 */
>> + struct cxl_dc_region_config {
>> + __le64 region_base;
>> + __le64 region_decode_length;
>> + __le64 region_length;
>> + __le64 region_block_size;
>> + __le32 region_dsmad_handle;
>> + u8 flags;
>> + u8 rsvd[3];
>> + } __packed region[] __counted_by(regions_returned);
> Yes, the spec unfortunately uses "region" for this partition info
> payload. This would be a good place to say "CXL spec calls this 'region'
> but Linux calls it 'partition' not to be confused with the Linux 'struct
> cxl_region' or all the other usages of 'region' in the specification".
>
> Linux is not obligated to follow the questionable naming decisions of
> specifications.
>
>> + /* Trailing fields unused */
>> +} __packed;
>> +#define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0)
>> +#define CXL_DCD_BLOCK_LINE_SIZE 0x40
>> +
>> /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */
>> struct cxl_mbox_set_timestamp_in {
>> __le64 timestamp;
>> @@ -831,6 +881,7 @@ enum {
>> int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
>> struct cxl_mbox_cmd *cmd);
>> int cxl_dev_state_identify(struct cxl_memdev_state *mds);
>> +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds);
>> int cxl_await_media_ready(struct cxl_dev_state *cxlds);
>> int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
>> int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
>> @@ -844,6 +895,17 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> enum cxl_event_log_type type,
>> enum cxl_event_type event_type,
>> const uuid_t *uuid, union cxl_event *evt);
>> +
>> +static inline bool cxl_dcd_supported(struct cxl_memdev_state *mds)
>> +{
>> + return test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
>> +}
>> +
>> +static inline void cxl_disable_dcd(struct cxl_memdev_state *mds)
>> +{
>> + clear_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
>> +}
> This hunk is out of place, and per the last patch, I think it can just be
> a flag that does not need a helper.
>
Powered by blists - more mailing lists