Message-ID: <CH2PR19MB4024FEB7A178841BD7EB85F9A0849@CH2PR19MB4024.namprd19.prod.outlook.com>
Date:   Tue, 26 Oct 2021 04:44:32 +0000
From:   Li Chen <lchen@...arella.com>
To:     Keith Busch <kbusch@...nel.org>
CC:     Bjorn Helgaas <helgaas@...nel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>, "kw@...ux.com" <kw@...ux.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tom Joseph <tjoseph@...ence.com>, Jens Axboe <axboe@...com>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>
Subject: RE: [EXT] Re: nvme may get timeout from dd when using different
 non-prefetch mmio outbound/ranges

Hi, Keith

> -----Original Message-----
> From: Keith Busch [mailto:kbusch@...nel.org]
> Sent: Tuesday, October 26, 2021 12:16 PM
> To: Li Chen
> Cc: Bjorn Helgaas; linux-pci@...r.kernel.org; Lorenzo Pieralisi; Rob Herring;
> kw@...ux.com; Bjorn Helgaas; linux-kernel@...r.kernel.org; Tom Joseph; Jens
> Axboe; Christoph Hellwig; Sagi Grimberg; linux-nvme@...ts.infradead.org
> Subject: Re: [EXT] Re: nvme may get timeout from dd when using different non-
> prefetch mmio outbound/ranges
> 
> On Tue, Oct 26, 2021 at 03:40:54AM +0000, Li Chen wrote:
> > My nvme is " 05:00.0 Non-Volatile memory controller: Samsung Electronics Co
> Ltd NVMe SSD Controller 980". From its datasheet,
> https://urldefense.com/v3/__https://s3.ap-northeast-
> 2.amazonaws.com/global.semi.static/Samsung_NVMe_SSD_980_Data_Sheet_R
> ev.1.1.pdf__;!!PeEy7nZLVv0!3MU3LdTWuzON9JMUkq29zwJM4d7g7wKtkiZszTu-
> PVepWchI_uLHpQGgdR_LEZM$ , it says nothing about CMB/SQEs, so I'm not sure.
> Is there other ways/tools(like nvme-cli) to query?
> 
> The driver will export a sysfs property for it if it is supported:
> 
>   # cat /sys/class/nvme/nvme0/cmb
> 
> If the file doesn't exist, then /dev/nvme0 doesn't have the capability.
> 

# ls /sys/class/nvme/nvme0/
address            model              rescan_controller  subsysnqn
cntlid             numa_node          reset_controller   subsystem
dev                nvme0n1            serial             transport
device             power              sqsize             uevent
firmware_rev       queue_count        state

There is no cmb file, so my NVMe does not have a CMB.
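
For anyone who wants to script the same check, here is a minimal sketch. It assumes the controller is nvme0 and that the cmb attribute appears under /sys/class/nvme/nvme0/ as Keith described; the has_cmb helper and the LISTED constant are names made up for this illustration, and LISTED is simply the ls output above copied into a string.

```python
from pathlib import Path


def has_cmb(ctrl_dir: str = "/sys/class/nvme/nvme0") -> bool:
    """Return True if the controller's sysfs directory contains a 'cmb' file.

    Assumption: the driver only creates this attribute when the controller
    advertises a Controller Memory Buffer, as stated earlier in the thread.
    """
    return (Path(ctrl_dir) / "cmb").exists()


# Attribute names copied verbatim from the 'ls' output in this mail.
LISTED = """
address            model              rescan_controller  subsysnqn
cntlid             numa_node          reset_controller   subsystem
dev                nvme0n1            serial             transport
device             power              sqsize             uevent
firmware_rev       queue_count        state
""".split()


if __name__ == "__main__":
    # Check the pasted listing, then the live system (if it has an nvme0).
    print("cmb in pasted listing:", "cmb" in LISTED)
    print("cmb on this machine:  ", has_cmb())
```

On a machine without /sys/class/nvme/nvme0 the second check simply reports False, so it cannot tell "no CMB" apart from "no such controller".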


> > > > I don't know how to interpret "ranges".  Can you supply the dmesg and
> > > > "lspci -vvs 0000:05:00.0" output both ways, e.g.,
> > > >
> > > >   pci_bus 0000:00: root bus resource [mem 0x7f800000-0xefffffff window]
> > > >   pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
> > > >   pci 0000:05:00.0: [vvvv:dddd] type 00 class 0x...
> > > >   pci 0000:05:00.0: reg 0x10: [mem 0x.....000-0x.....fff ...]
> > > >
> > > > > Question:
> > > > > 1.  Why dd can cause nvme timeout? Is there more debug ways?
> > >
> > > That means the nvme controller didn't provide a response to a posted
> > > command within the driver's latency tolerance.
> >
> > FYI, with the help of pci bridger's vendor, they find something interesting:
> "From catc log, I saw some memory read pkts sent from SSD card, but its memory
> range is within the memory range of switch down port. So, switch down port will
> replay UR pkt. It seems not normal." and "Why SSD card send out some memory
> pkts which memory address is within switch down port's memory range. If so,
> switch will response UR pkts". I also don't understand how can this happen?
> 
> I think we can safely assume you're not attempting peer-to-peer, so that
> behavior as described shouldn't be happening. It sounds like the memory
> windows may be incorrect. The dmesg may help to show if something appears
> wrong.

The dmesg has been pasted at https://marc.info/?l=linux-pci&m=163522281024680&w=3

Yes, peer-to-peer cannot happen; the NVMe is my only endpoint:

# lspci -i /usr/share/misc/pci.ids
00:00.0 PCI bridge: Cadence Design Systems, Inc. Device 0100
01:00.0 PCI bridge: Pericom Semiconductor Device c016 (rev 07)
02:02.0 PCI bridge: Pericom Semiconductor Device c016 (rev 06)
02:04.0 PCI bridge: Pericom Semiconductor Device c016 (rev 06)
02:06.0 PCI bridge: Pericom Semiconductor Device c016 (rev 06)
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980
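
To make the vendor's observation concrete, here is a simplified sketch of the routing decision a switch downstream port makes for a memory request arriving from the endpoint side. It is a model for discussion only: it ignores the prefetchable window, ACS, and everything else in a real bridge, and the window and addresses are hypothetical values, not taken from my dmesg. Window and classify_request_from_endpoint are names invented for this example.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """A bridge memory window; both bounds are inclusive bus addresses."""
    start: int
    end: int

    def contains(self, addr: int) -> bool:
        return self.start <= addr <= self.end


def classify_request_from_endpoint(addr: int, downstream_window: Window) -> str:
    """Simplified model of a downstream port handling a request from below.

    An address inside the port's own window is treated as belonging to the
    downstream side, so it is not forwarded upstream and the port answers
    with Unsupported Request (UR). Anything else is forwarded toward the
    root complex.
    """
    if downstream_window.contains(addr):
        return "UR"
    return "forwarded upstream"


if __name__ == "__main__":
    # Hypothetical window and DMA addresses, chosen only for illustration.
    window = Window(0x7F800000, 0x7F8FFFFF)
    for dma_addr in (0x7F800100, 0x10000000):
        print(hex(dma_addr), "->", classify_request_from_endpoint(dma_addr, window))
```

If the addresses the driver hands to the SSD for DMA overlap the window programmed into the switch port, this model gives the UR replies seen in the catc log, which would fit Keith's suspicion that the memory windows are wrong.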

