Message-Id: <0D3A81E2-C99A-491D-AB66-FC6005E38667@gmail.com>
Date: Tue, 31 Jan 2023 07:19:58 -0800
From: Shesha Sreenivasamurthy <sheshas@...il.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: linux-kernel@...r.kernel.org, linux-cxl@...r.kernel.org
Subject: Re: Hot ADD using CXL1.1 host
On Mon, Jan 30, 2023 at 2:00 PM Dan Williams <dan.j.williams@...el.com> wrote:
>
> Hi Shesha, Linux email expectations are to not top post, i.e. respond
> inline, like below:
>
> Shesha Sreenivasamurthy wrote:
>> The re-configuration does not reset the device. It does re-program the PCIe
>> DVSEC for CXL Device registers (Section 8.1.3, CXL 2.0 spec, p. 258;
>> DVSEC vendor ID 0x1E98, DVSEC ID 0x0).
>> “So you need to dynamically recreate the region, especially if your step 10
>> above resets the device.”
>> Do you mean the DAX region ?
>
> No, I mean the CXL region.
>
>> If so, I can if the system stays up. After a few seconds the system
>> crashes. Can the crash be because of a mismatch between the DVSEC
>> information and what the kernel was told by the BIOS during boot (some
>> ACPI tables?)
>
> My concern is that the platform memory decode configuration is not
> prepared for the CXL device to claim more than what was originally
> programmed in the CXL DVSEC range registers. One of the platform
> firmware updates for CXL 2.0 was the creation of the CFMWS (CXL Fixed
> Memory Window Structure) in the ACPI CEDT (CXL Early Discovery Table).
> That structure indicates which platform address ranges decode to which
> CXL host bridges. Those windows are defined in platform specific
> registers (not enumerated to the OS). If the window is only 8GB then
> the endpoint device cannot decode more. You would need to reboot to get
> the BIOS to allocate more host address space for CXL.
>
> The expectation for newer platforms is that platform firmware define
> CFMWS such that there is spare capacity in the address map for the OS to
> dynamically map more CXL.
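
To make the window constraint above concrete, here is a minimal sketch of the containment check it implies: an endpoint range can only decode if it lies entirely inside a CFMWS window. The `Window` class, the `fits` helper, and the base address are hypothetical illustrations, not values from this platform or from any kernel interface.

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass(frozen=True)
class Window:
    """A host address window, e.g. one CFMWS entry (illustrative only)."""
    base: int  # host physical address where the window starts
    size: int  # window length in bytes

    @property
    def end(self) -> int:
        # Exclusive end address of the window.
        return self.base + self.size


def fits(window: Window, base: int, size: int) -> bool:
    """Return True if [base, base + size) lies entirely inside the window."""
    return window.base <= base and base + size <= window.end


# Hypothetical 8 GiB window; the base address is made up for the example.
window = Window(base=0x10_0000_0000, size=8 * GIB)

fits_8 = fits(window, window.base, 8 * GIB)    # same size as the window
fits_16 = fits(window, window.base, 16 * GIB)  # device re-configured to claim more
```

With an 8GB window, the 8GB request fits and the 16GB request does not, which is the case where a reboot would be needed for the BIOS to allocate more host address space.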
There seems to be some instability when using DAX. When the system is
given all of the device memory by booting with efi=nosoftreserve,
stressapptest (https://github.com/stressapptest/stressapptest) runs for
an extended period of time. However, when the system is booted without
efi=nosoftreserve and the special-purpose memory is assigned to
system-ram using daxctl, the system crashes after some time (20-30
minutes). Are there any known instabilities when using DAX?
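
For comparing the two boot configurations, a small sketch like the one below can list the "Soft Reserved" ranges from /proc/iomem-style text. Without efi=nosoftreserve, special-purpose memory is expected to show up under that label; with the option, the same range should appear as ordinary System RAM. The `soft_reserved_ranges` helper and the `SAMPLE` text are hypothetical; the addresses are made up and are not from the system described here.

```python
import re

# Matches lines such as "280000000-47fffffff : Soft Reserved",
# including indented child entries.
LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def soft_reserved_ranges(iomem_text):
    """Return (start, size_in_bytes) for each "Soft Reserved" entry."""
    ranges = []
    for line in iomem_text.splitlines():
        m = LINE.match(line)
        if m and m.group(3) == "Soft Reserved":
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)  # /proc/iomem end addresses are inclusive
            ranges.append((start, end - start + 1))
    return ranges


# Made-up sample in /proc/iomem format, for illustration only.
SAMPLE = """\
00001000-0009ffff : System RAM
100000000-27fffffff : System RAM
280000000-47fffffff : Soft Reserved
  280000000-47fffffff : dax0.0
"""

found = soft_reserved_ranges(SAMPLE)
```

On a live system the text would come from reading /proc/iomem as root, since unprivileged reads show the addresses as zeros.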