Message-Id: <DC5UKZ9F6CQZ.2NDFY4S322T2G@cknow.org>
Date: Mon, 18 Aug 2025 22:48:48 +0200
From: "Diederik de Haas" <didi.debian@...ow.org>
To: "Keith Busch" <kbusch@...nel.org>
Cc: <linux-nvme@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
"Diederik de Haas" <didi.debian@...ow.org>
Subject: Re: [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
Hi,
First of all: thanks for taking the time to answer my questions :)
On Mon Aug 18, 2025 at 8:58 PM CEST, Keith Busch wrote:
> On Sat, Aug 16, 2025 at 04:11:00PM +0200, Diederik de Haas wrote:
>> On Sat Aug 16, 2025 at 3:20 PM CEST, Keith Busch wrote:
>>
>> > If you want to see what the driver is reacting to, you can check the
>> > subnqn from command line:
>> >
>> > # nvme id-ctrl /dev/nvme0 | grep subnqn
>> >
>> > It'll probably be all zeros. The field has been required by spec, but
>> > the driver tolerates ones that don't implement it.
>>
>> root@...opi-r5s:~# nvme id-ctrl /dev/nvme0 | grep subnqn
>> subnqn :
>>
>> So it seems to be just empty?
>
> Yes, it's interpreted as a string. All 0's would be an empty string.
Ah yes, makes sense.
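To convince myself, here is a minimal Python sketch of that decoding. The
function name and constants are my own; the offset and length are what I
understand the NVMe spec to use for SUBNQN in the Identify Controller data
(bytes 768 to 1023), so treat them as an assumption, and this is not how
nvme-cli itself is implemented.

```python
# Assumed layout: SUBNQN occupies bytes 768..1023 of the 4096-byte
# Identify Controller data and holds a NUL-terminated UTF-8 string.
SUBNQN_OFFSET = 768
SUBNQN_LEN = 256


def decode_subnqn(id_ctrl: bytes) -> str:
    """Return the SUBNQN field as a string (hypothetical helper)."""
    raw = id_ctrl[SUBNQN_OFFSET:SUBNQN_OFFSET + SUBNQN_LEN]
    # Everything from the first NUL byte onwards is padding.
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


# A controller that leaves the field unimplemented reports all zeros,
# which decodes to an empty string.
empty = decode_subnqn(bytes(4096))
```

So an all-zero field and the empty "subnqn :" output are the same thing.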
>> The other kernel warning is this:
>>
>> nvme nvme0: using unchecked data buffer
>>
>> The SUBNQN message appears every time, this one appears often, but not
>> always.
>
> That one means you've sent a user space passthrough command to a device
> that doesn't support SGL DMA. Without that, the nvme protocol uses
> implicitly sized DMA that the driver can't be sure is accurate. The user
> could theoretically provide a short buffer that can corrupt memory if
> done by accident, or be used as an attack vector if done by malicious
> software.
>
> This is also not something to worry about unless you run malicious or
> buggy software.
I would be surprised if I were running malicious software, but pretty
much all software has bugs, so that's of course possible.
(I run Debian Testing or Unstable on pretty much all my devices)
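As far as I understand the risk, the size of the transfer comes from the
command itself and not from the buffer that user space handed over. A toy
Python sketch of that mismatch for a read command, using hypothetical
helper names and only the fact that the NLB field is zero-based (this is
an illustration, not the driver's actual check):

```python
def implied_read_bytes(nlb: int, lba_size: int) -> int:
    """Bytes a read command asks the device to transfer.

    NLB is zero-based, so nlb=0 means one logical block.
    """
    return (nlb + 1) * lba_size


def buffer_is_short(buf_len: int, nlb: int, lba_size: int) -> bool:
    """True if the supplied buffer is smaller than the implied transfer."""
    return buf_len < implied_read_bytes(nlb, lba_size)


# 8 blocks of 512 bytes imply a 4096-byte transfer, so a 512-byte
# buffer would be too small for this command.
short = buffer_is_short(512, nlb=7, lba_size=512)
```

Without SGL there is no per-descriptor length for the device to compare
against, which I take to be why the driver calls the buffer "unchecked".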
I thought it was a HW problem, as the warning seemed to disappear from my
PC when I removed the NVMe drive from it, and it appeared again on my
NanoPi R5S once I put the drive in that device.
I say "seemed" because I just found out it happened on my PC as well
(with a Samsung 960 PRO 1TB) on this boot, but not on the 20 boots prior.
I uninstalled the 3 programs on the R5S that showed up the most around
the warning message, but the warning is still there.
Would 'dyndbg' be helpful to determine what program is buggy?
>> When researching this/these issues, I discovered the nvme-cli package
>> (with the nvme command) and via its manpage I found this command:
>>
>> nvme get-feature /dev/nvme0 -f 3
>>
>> I didn't even know NVMe's had namespaces, but this didn't look good:
>>
>> The namespace or the format of that namespace is invalid(0x200b)
>>
>> ... without actually understanding what it means and/or what its
>> consequences are. It could be harmless and/or normal though.
>
> The feature you're requesting is the LBA range, which is namespace
> scoped. You need to specify a namespace id, either by opening the
> namespace's block device (/dev/nvme0n1) instead of the admin handle
> (/dev/nvme0), or you can manually specify the namespace with parameters
> "--namespace-id=1" or just "-n1".
Adding "-n1" does show normal (AFAICT) output. It's all zeros though.
And now the error message makes sense too :-)
The nvme-cli man page could/should have a better (i.e. working) example,
but that's not a kernel problem.
Thanks for your help and reassurances :-)
Cheers,
Diederik