lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7fc0b153-9eea-af2c-cd42-c66a2d4087bc@amd.com>
Date: Wed, 15 Jan 2025 16:01:54 +0000
From: Alejandro Lucero Palau <alucerop@....com>
To: Dan Williams <dan.j.williams@...el.com>, alejandro.lucero-palau@....com,
 linux-cxl@...r.kernel.org, netdev@...r.kernel.org, martin.habets@...inx.com,
 edward.cree@....com, davem@...emloft.net, kuba@...nel.org,
 pabeni@...hat.com, edumazet@...gle.com, dave.jiang@...el.com
Subject: Re: [PATCH v8 01/27] cxl: add type2 device basic support


On 1/14/25 22:52, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
>> On 1/8/25 14:32, Alejandro Lucero Palau wrote:
>>> On 1/8/25 01:33, Dan Williams wrote:
>>>> Dan Williams wrote:
>>>>> alejandro.lucero-palau@ wrote:
>>>>>> From: Alejandro Lucero <alucerop@....com>
>>>>>>
>>>>>> Differentiate CXL memory expanders (type 3) from CXL device
>>>>>> accelerators
>>>>>> (type 2) with a new function for initializing cxl_dev_state.
>>>>>>
>>>>>> Create accessors to cxl_dev_state to be used by accel drivers.
>>>>>>
>>>>>> Based on previous work by Dan Williams [1]
>>>>>>
>>>>>> Link: [1]
>>>>>> https://lore.kernel.org/linux-cxl/168592160379.1948938.12863272903570476312.stgit@dwillia2-xfh.jf.intel.com/
>>>>>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>>>>>> Co-developed-by: Dan Williams <dan.j.williams@...el.com>
>>>>>> Reviewed-by: Dave Jiang <dave.jiang@...el.com>
>>>>>> Reviewed-by: Fan Ni <fan.ni@...sung.com>
>>>>> This patch causes
>>>> Whoops, forgot to complete this thought. Someting in this series causes:
>>>>
>>>> depmod: ERROR: Cycle detected: ecdh_generic
>>>> depmod: ERROR: Cycle detected: tpm
>>>> depmod: ERROR: Cycle detected: cxl_mock -> cxl_core -> cxl_mock
>>>> depmod: ERROR: Cycle detected: encrypted_keys
>>>> depmod: ERROR: Found 2 modules in dependency cycles!
>>>>
>>>> I think the non CXL ones are false likely triggered by the CXL causing
>>>> depmod to exit early.
>>>>
>>>> Given cxl-test is unfamiliar territory to many submitters I always offer
>>>> to fix up the breakage. I came up with the below incremental patch to
>>>> fold in that also addresses my other feedback.
>>>>
>>>> Now the depmod error is something Alison saw too, and while I can also
>>>> see it on patch1 if I do:
>>>>
>>>> - apply whole series
>>>> - build => see the error
>>>> - rollback patch1
>>>> - build => see the error
>>>>
>>>> ...a subsequent build the error goes away, so I think that transient
>>>> behavior is a quirk of how cxl-test is built, but some later patch in
>>>> that series makes the failure permanent.
>>>>
>>>> In any event I figured that out after creating the below fixup and
>>>> realizing that it does not fix the cxl-test build issue:
>>>
>>> Ok. but it is a good way of showing what you had in your mind about
>>> the suggested changes.
>>>
>>> I'll use it for v10.
>>>
>>> Thanks
>>>
>> Hi Dan,
>>
>>
>> There's a problem with this approach and it is the need of the driver
>> having access to internal cxl structs like cxl_dev_state.
> Apologies for stepping away for a few days, there was a chance to get
> the long pending DAX page reference count series into v6.14 that I
> needed to devote some review cycles.


No worries.


>> Your patch does not cover it but for an accel driver that struct needs
>> to be allocated before using the new cxl_dev_state_init.
> Why does the cxl core need to wrap malloc on behalf of the driver?
>
>> I think we reached an agreement in initial discussions about avoiding
>> this need through an API for accel drivers indirectly doing whatever is
>> needed regarding internal CXL structs. Initially it was stated this
>> being necessary for avoiding drivers doing wrong things but Jonathan
>> pointed out the main concern being changing those internal structs in
>> the future could benefit from this approach. Whatever the reason, that
>> was the assumption.
> I think there is a benefit from a driver being able to do someting like:
>
> struct my_cxl_accelerator_context {
>      ...
>      struct cxl_dev_state cxlds;
>      ...
> };
>
> Even if the rule is that direct consumption of 'struct cxl_dev_state'
> outside of the cxl core is unwanted.
>
> C does not make this easy, so it is either make the definition of
> 'struct cxl_dev_state' public to CXL accelerator drivers so that they
> know the size, or add an allocation API that takes in the extra size
> that accelerator needs to allocate the core CXL context.
>
> Unless and until we run into a real life problem of accelerator drivers
> misusing 'struct cxl_dev_state' I think I prefer the explicit approach
> of make the data structure embeddable and only require the core to do
> the initialization.


Ok then. I'll make cxl_dev_state public in v10 and do the allocation by 
the driver.


>> I could add a function for accel drivers doing the allocation as with
>> current v9 code, and then using your changes for having common code.
> Let me go look at what you have there, but the design principle of the
> CXL core is a library and enabling (but not requiring) users to have a container_of()
> relationship between the core context and their local context feels the
> most maintainable at this point.
>
>> Also, I completely agree with merging the serial and dvsec
>> initializations through arguments to cxl_dev_state_init, but we need the
>> cxl_set_resource function for accel drivers. The current code for adding
>> resources with memdev is relying on mbox commands, and although we could
>> change that code for supporting accel drivers without an mbox, I would
>> say the function/code added is simple enough for not requiring that
>> effort. Note my goal is for an accel device without an mbox, but we will
>> see devices with one in the future, so I bet for leaving any change
>> there to that moment.
> ...but the way it was adding those resources was just wrong. This also
> collides with some of the changes proposed for DCD partition management.
> I needs a rethink in terms of a data structure and API to describe the
> DPA address map of a CXL.mem capable device to the core and the core
> should not be hard-coding a memory-expander-class device definition to
> that layout.


I think you say is wrong because you did not look at patch 8 where the 
resources are requested based on the parent resource (dpa).


After seeing Ira's DCD patchset regarding the resources allocation 
changes, and your comment there, I think I know that you have in mind. 
But for Type2 the way resources information is obtained changes, at 
least for the case of one without mailbox. In our case we are hardcoding 
the resource definitions, although in the future (and likely other 
drivers) we will use an internal/firmware path for obtaining the 
information. So we have two cases:


1) accel driver with mailbox: an additional API function should allow 
such accel driver to obtain the info or trigger the resource allocation 
based on that command.

2) accel driver without mailbox: a function for allocating the resources 
based on hardcoded or driver-dependent dynamically-obtained data.


The current patchset is supporting the second one, and with the linked 
use case, the first one should be delayed until some accel driver 
requires it.


I can adapt the current API for using the resource array exposed by 
DCD's patches, and useĀ  add_dpa_res function instead of current patchset 
code.


> I am imagining something similar to the way that resource range
> resources are transmitted to platform device registration.


I guess you mean to haveĀ  static/dynamic resource array with __init flag 
for freeing the data after initialization, a CXL core function for 
processing the array and calling add_dpa_res and aware of DCD patch 
needs, and the resources linked to the device and released automatically 
when unloading the driver.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ