Message-ID: <39697f68-9dc8-7692-7210-b75cce32c6ce@amd.com>
Date: Mon, 31 Jul 2023 15:09:08 -0500
From: "Limonciello, Mario" <mario.limonciello@....com>
To: August Wikerfors <git@...ustwikerfors.se>,
Keith Busch <kbusch@...nel.org>
Cc: axboe@...com, hch@....de, sagi@...mberg.me,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
nilskruse97@...il.com, David.Chang@....com
Subject: Re: [PATCH] nvme: Don't fail to resume if NSIDs change
On 7/31/2023 2:54 PM, August Wikerfors wrote:
> On 2023-07-31 21:10, Keith Busch wrote:
>> On Mon, Jul 31, 2023 at 01:51:03PM -0500, Mario Limonciello wrote:
>>> Samsung PM9B1 has problems after resume because NSID has changed.
>>> This has been reported in the past on OEM varieties of PM9B1 parts
>>> and fixed by firmware updates on 'some' of those parts.
>>>
>>> However this same issue also happens on 'retail' PM9B1 parts which
>>> Samsung has not released firmware updates for.
>>>
>>> As the check has been relaxed at startup for multiple disks with
>>> duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
>>> probe due to duplicate IDs for single-ported PCIe devices") also
>>> relax the check that runs on resume for NSIDs and mark them bogus
>>> if this occurs on resume.
>>
>> How could the driver tell the difference between the device needing a
>> quirk compared to a rapid delete-create-attach namespace sequence?
>> Proceeding with the namespace now may get dirty writes intended for the
>> previous namespace, corrupting the new one.
>>
>> The commit you mentioned tries to constrain allowing duplication where
>> we can reasonably assume the quirk is needed. If we need to do similar
>> for this condition, one possible constraint might be that the device
>> doesn't report OACS bit 3 (Namespace Management).
>
> It looks like that would work for the PM9B1:
>> $ sudo nvme id-ctrl -H /dev/nvme0
>> [...]
>> oacs : 0x17
>> [10:10] : 0 Lockdown Command and Feature Not Supported
>> [9:9] : 0 Get LBA Status Capability Not Supported
>> [8:8] : 0 Doorbell Buffer Config Not Supported
>> [7:7] : 0 Virtualization Management Not Supported
>> [6:6] : 0 NVMe-MI Send and Receive Not Supported
>> [5:5] : 0 Directives Not Supported
>> [4:4] : 0x1 Device Self-test Supported
>> [3:3] : 0 NS Management and Attachment Not Supported
>> [2:2] : 0x1 FW Commit and Download Supported
>> [1:1] : 0x1 Format NVM Supported
>> [0:0] : 0x1 Security Send and Receive Supported
>
> Regards,
> August Wikerfors
So is it reasonable to just add a check for
ctrl->oacs & NVME_CTRL_OACS_NS_MNGT_SUPP
in the same error handling path as this patch?
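
To make the proposed constraint concrete, here is a minimal user-space sketch, not the actual patch. It assumes the only input is the OACS field from Identify Controller. The names mock_ctrl, OACS_NS_MNGT_SUPP and ns_id_change_tolerable() are hypothetical stand-ins; the bit value mirrors OACS bit 3 (Namespace Management) as discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for OACS bit 3 (Namespace Management and Attachment). */
#define OACS_NS_MNGT_SUPP (1u << 3)

/* Hypothetical stand-in for the single controller field this check reads. */
struct mock_ctrl {
	uint16_t oacs;	/* Optional Admin Command Support, from Identify Controller */
};

/*
 * Hypothetical helper: returns true when a changed NSID seen on resume
 * cannot be explained by a delete-create-attach sequence, because the
 * controller does not advertise namespace management at all. Only in
 * that case would relaxing the check be considered.
 */
static bool ns_id_change_tolerable(const struct mock_ctrl *ctrl)
{
	return !(ctrl->oacs & OACS_NS_MNGT_SUPP);
}
```

With the PM9B1 value quoted above (oacs = 0x17, binary 1 0111), bit 3 is clear, so the helper returns true. A controller reporting bit 3 set would keep the existing strict behavior.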