lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CH2PR19MB4024B89E31335658709A0CB0A0849@CH2PR19MB4024.namprd19.prod.outlook.com>
Date:   Tue, 26 Oct 2021 04:33:47 +0000
From:   Li Chen <lchen@...arella.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
CC:     "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>, "kw@...ux.com" <kw@...ux.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tom Joseph <tjoseph@...ence.com>,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>
Subject: RE: [EXT] Re: nvme may get timeout from dd when using different
 non-prefetch mmio outbound/ranges

Hi, Bjorn

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@...nel.org]
> Sent: Monday, October 25, 2021 11:48 PM
> To: Li Chen
> Cc: linux-pci@...r.kernel.org; Lorenzo Pieralisi; Rob Herring; kw@...ux.com;
> Bjorn Helgaas; linux-kernel@...r.kernel.org; Tom Joseph; Keith Busch; Jens
> Axboe; Christoph Hellwig; Sagi Grimberg; linux-nvme@...ts.infradead.org
> Subject: [EXT] Re: nvme may get timeout from dd when using different non-
> prefetch mmio outbound/ranges
> 
> [+cc Tom (Cadence maintainer), NVMe folks]
> 
> On Fri, Oct 22, 2021 at 10:08:20AM +0000, Li Chen wrote:
> > Hi, all
> >
> > I found my nvme may get timeout with simply dd, the pcie controller
> > is cadence, and the pcie-controller platform I use is
> > pcie-cadence-host.c, kenrel version is 5.10.32:
> 
> Given that nvme is pretty widely used, I tend to suspect the cadence
> driver or DTS here, but I added both cadence and nvme folks to the CC:
> list.

Thanks!

> 
> > # dd if=/dev/urandom of=/dev/nvme0n1 bs=4k count=1024000
> > [   41.913484][  T274] urandom_read: 2 callbacks suppressed
> > [   41.913490][  T274] random: dd: uninitialized urandom read (4096 bytes read)
> > [   41.926130][  T274] random: dd: uninitialized urandom read (4096 bytes read)
> > [   41.933348][  T274] random: dd: uninitialized urandom read (4096 bytes read)
> > [   47.651842][    C0] random: crng init done
> > [   47.655963][    C0] random: 2 urandom warning(s) missed due to ratelimiting
> > [   81.448128][   T64] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x2010
> > [   81.481139][    T7] nvme 0000:05:00.0: enabling bus mastering
> > [   81.486946][    T7] nvme 0000:05:00.0: saving config space at offset 0x0
> (reading 0xa809144d)
> > [   81.495517][    T7] nvme 0000:05:00.0: saving config space at offset 0x4
> (reading 0x20100006)
> > [   81.504091][    T7] nvme 0000:05:00.0: saving config space at offset 0x8
> (reading 0x1080200)
> > [   81.512571][    T7] nvme 0000:05:00.0: saving config space at offset 0xc
> (reading 0x0)
> > [   81.520527][    T7] nvme 0000:05:00.0: saving config space at offset 0x10
> (reading 0x8000004)
> > [   81.529094][    T7] nvme 0000:05:00.0: saving config space at offset 0x14
> (reading 0x0)
> > [   81.537138][    T7] nvme 0000:05:00.0: saving config space at offset 0x18
> (reading 0x0)
> > [   81.545186][    T7] nvme 0000:05:00.0: saving config space at offset 0x1c
> (reading 0x0)
> > [   81.553252][    T7] nvme 0000:05:00.0: saving config space at offset 0x20
> (reading 0x0)
> > [   81.561296][    T7] nvme 0000:05:00.0: saving config space at offset 0x24
> (reading 0x0)
> > [   81.569340][    T7] nvme 0000:05:00.0: saving config space at offset 0x28
> (reading 0x0)
> > [   81.577384][    T7] nvme 0000:05:00.0: saving config space at offset 0x2c
> (reading 0xa801144d)
> > [   81.586038][    T7] nvme 0000:05:00.0: saving config space at offset 0x30
> (reading 0x0)
> > [   81.594081][    T7] nvme 0000:05:00.0: saving config space at offset 0x34
> (reading 0x40)
> > [   81.602217][    T7] nvme 0000:05:00.0: saving config space at offset 0x38
> (reading 0x0)
> > [   81.610266][    T7] nvme 0000:05:00.0: saving config space at offset 0x3c
> (reading 0x12c)
> > [   81.634065][    T7] nvme nvme0: Shutdown timeout set to 8 seconds
> > [   81.674332][    T7] nvme nvme0: 1/0/0 default/read/poll queues
> > [  112.168136][  T256] nvme nvme0: I/O 12 QID 1 timeout, disable controller
> > [  112.193145][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 600656 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.205220][  T256] Buffer I/O error on dev nvme0n1, logical block 75082, lost
> async page write
> > [  112.213978][  T256] Buffer I/O error on dev nvme0n1, logical block 75083, lost
> async page write
> > [  112.222727][  T256] Buffer I/O error on dev nvme0n1, logical block 75084, lost
> async page write
> > [  112.231474][  T256] Buffer I/O error on dev nvme0n1, logical block 75085, lost
> async page write
> > [  112.240220][  T256] Buffer I/O error on dev nvme0n1, logical block 75086, lost
> async page write
> > [  112.248966][  T256] Buffer I/O error on dev nvme0n1, logical block 75087, lost
> async page write
> > [  112.257719][  T256] Buffer I/O error on dev nvme0n1, logical block 75088, lost
> async page write
> > [  112.266467][  T256] Buffer I/O error on dev nvme0n1, logical block 75089, lost
> async page write
> > [  112.275213][  T256] Buffer I/O error on dev nvme0n1, logical block 75090, lost
> async page write
> > [  112.283959][  T256] Buffer I/O error on dev nvme0n1, logical block 75091, lost
> async page write
> > [  112.293554][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 601672 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.306559][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 602688 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.319525][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 603704 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.332501][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 604720 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.345466][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 605736 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.358427][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 606752 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.371386][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 607768 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.384346][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 608784 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.397315][  T256] blk_update_request: I/O error, dev nvme0n1, sector
> 609800 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
> > [  112.459313][    T7] nvme nvme0: failed to mark controller live state
> > [  112.465758][    T7] nvme nvme0: Removing after probe failure status: -19
> > dd: error writing '/dev/nvme0n1': No space left on device
> > 112200+0 records in
> > 112199+0 records out
> > 459567104 bytes (438.3MB) copied, 70.573266 seconds, 6.2MB/s
> > # [  112.935768][    T7] nvme nvme0: failed to set APST feature (-19)
> >
> >
> >
> >
> > Here is the dts I used:
> > pciec: pcie-controller@...0000000 {
> >                                 compatible = "cdns,cdns-pcie-host";
> > 		device_type = "pci";
> > 		#address-cells = <3>;
> > 		#size-cells = <2>;
> > 		bus-range = <0 5>;
> > 		linux,pci-domain = <0>;
> > 		cdns,no-bar-match-nbits = <38>;
> > 		vendor-id = <0x17cd>;
> > 		device-id = <0x0100>;
> > 		reg-names = "reg", "cfg";
> > 		reg = <0x20 0x40000000 0x0 0x10000000>,
> > 		      <0x20 0x00000000 0x0 0x00001000>;	/* RC only */
> >
> > 		/*
> > 		 * type: 0x00000000 cfg space
> > 		 * type: 0x01000000 IO
> > 		 * type: 0x02000000 32bit mem space No prefetch
> > 		 * type: 0x03000000 64bit mem space No prefetch
> > 		 * type: 0x43000000 64bit mem space prefetch
> > 		 * The First 16MB from BUS_DEV_FUNC=0:0:0 for cfg space
> > 		 * <0x00000000 0x00 0x00000000 0x20 0x00000000 0x00
> 0x01000000>, CFG_SPACE
> > 		*/
> > 		ranges = <0x01000000 0x00 0x00000000 0x20 0x00100000 0x00
> 0x00100000>,
> > 			 <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00
> 0x08000000>;
> >
> > 		#interrupt-cells = <0x1>;
> > 		interrupt-map-mask = <0x00 0x0 0x0 0x7>;
> > 		interrupt-map = <0x0 0x0 0x0 0x1 &gic 0 229 0x4>,
> > 				<0x0 0x0 0x0 0x2 &gic 0 230 0x4>,
> > 				<0x0 0x0 0x0 0x3 &gic 0 231 0x4>,
> > 				<0x0 0x0 0x0 0x4 &gic 0 232 0x4>;
> > 		phys = <&pcie_phy>;
> > 		phy-names="pcie-phy";
> > 		status = "ok";
> > 	};
> >
> >
> > After some digging, I find if I change the controller's range
> > property from
> >
> > <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x08000000> into
> > <0x02000000 0x00 0x00400000 0x20 0x00400000 0x00 0x08000000>,
> >
> > then dd will success without timeout. IIUC, range here
> > is only for non-prefetch 32bit mmio, but dd will use dma (maybe cpu
> > will send cmd to nvme controller via mmio?).
> 
> I don't know how to interpret "ranges".  Can you supply the dmesg and
> "lspci -vvs 0000:05:00.0" output both ways, e.g.,
> 
>   pci_bus 0000:00: root bus resource [mem 0x7f800000-0xefffffff window]
>   pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
>   pci 0000:05:00.0: [vvvv:dddd] type 00 class 0x...
>   pci 0000:05:00.0: reg 0x10: [mem 0x.....000-0x.....fff ...]
> 

From the source code:

static int devm_of_pci_get_host_bridge_resources
{
	...
	err = of_pci_range_parser_init(&parser, dev_node);
	...
	for_each_of_pci_range(&parser, &range) {
		/* Read next ranges element */
		if ((range.flags & IORESOURCE_TYPE_BITS) == IORESOURCE_IO)
			range_type = "IO";
		else if ((range.flags & IORESOURCE_TYPE_BITS) == IORESOURCE_MEM)
			range_type = "MEM";
		else
			range_type = "err";
		dev_info(dev, "  %6s %#012llx..%#012llx -> %#012llx\n",
			 range_type, range.cpu_addr,
			 range.cpu_addr + range.size - 1, range.pci_addr);

		/*
		 * If we failed translation or got a zero-sized region
		 * then skip this range
		 */
		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
			continue;

		err = of_pci_range_to_resource(&range, dev_node, &tmp_res);
		...
}

So, it simply translates mmio mapping.



Take
ranges = <0x01000000 0x00 0x00000000 0x20 0x00100000 0x00 0x00100000>,
			 <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x04000000>;
for example (this ranges will make nvme timeout):

lspci output:
# lspci -i /usr/share/misc/pci.ids -vvs 0000:05:00.0
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 44
        Region 0: Memory at 2008000000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [b0] MSI-X: Enable- Count=13 Masked-
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: nvme
        Kernel modules: nvme

	


/proc/iomem:

01200000-ffffffff : System RAM
  01280000-022affff : Kernel code
  022b0000-0295ffff : reserved
  02960000-040cffff : Kernel data
  05280000-0528ffff : reserved
  41cc0000-422c0fff : reserved
  422c1000-4232afff : reserved
  4232d000-667bbfff : reserved
  667bc000-667bcfff : reserved
  667bd000-667c0fff : reserved
  667c1000-ffffffff : reserved
2000000000-2000000fff : cfg
2008000000-200bffffff : pcie-controller@...0000000
  2008000000-20081fffff : PCI Bus 0000:01
    2008000000-20080fffff : PCI Bus 0000:02
      2008000000-20080fffff : PCI Bus 0000:05
        2008000000-2008003fff : 0000:05:00.0
          2008000000-2008003fff : nvme
    2008100000-200817ffff : 0000:01:00.0
...

From dmesg output:

# dmesg | grep pci
[    0.000000] Kernel command line: console=ttyS0 ubi.mtd=lnx root=ubi0:rootfs rw rootfstype=ubifs init=/linuxrc pci=nomsi pcie_pme=nomsi nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pci=nommconf nvme_core.default_ps_max_latency_us=0
[    2.431732] cdns-pcie 2040000000.pcie-controller: host bridge /pcie-controller@...0000000 ranges:
[    2.442055] cdns-pcie 2040000000.pcie-controller: Parsing ranges property...
[    2.450010] cdns-pcie 2040000000.pcie-controller:       IO 0x2000100000..0x20001fffff -> 0x0000000000
[    2.460215] cdns-pcie 2040000000.pcie-controller:      MEM 0x2008000000..0x200bffffff -> 0x0008000000
[    2.523271] cdns-pcie 2040000000.pcie-controller: Link up
[    2.530647] cdns-pcie 2040000000.pcie-controller: PCI host bridge to bus 0000:00
[    2.538885] pci_bus 0000:00: root bus resource [bus 00-05]
[    2.545194] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
[    2.552280] pci_bus 0000:00: root bus resource [mem 0x2008000000-0x200bffffff] (bus address [0x08000000-0x0bffffff])
[    2.563651] pci_bus 0000:00: scanning bus
[    2.568753] pci 0000:00:00.0: [17cd:0100] type 01 class 0x060400
[    2.576196] pci 0000:00:00.0: supports D1
[    2.581006] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    2.587667] pci 0000:00:00.0: PME# disabled
[    2.597799] pci_bus 0000:00: fixups for bus
[    2.602800] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[    2.610308] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.619160] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[    2.627525] pci_bus 0000:01: scanning bus
[    2.632550] pci 0000:01:00.0: [12d8:c016] type 01 class 0x060400
[    2.639425] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x0007ffff]
[    2.647250] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    2.654165] pci 0000:01:00.0: PME# disabled
[    2.664011] pci_bus 0000:01: fixups for bus
[    2.669012] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 0
[    2.676521] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.685370] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 1
[    2.693768] pci_bus 0000:02: scanning bus
[    2.699101] pci 0000:02:02.0: [12d8:c016] type 01 class 0x060400
[    2.706740] pci 0000:02:02.0: PME# supported from D0 D3hot D3cold
[    2.713653] pci 0000:02:02.0: PME# disabled
[    2.720417] pci 0000:02:04.0: [12d8:c016] type 01 class 0x060400
[    2.728032] pci 0000:02:04.0: PME# supported from D0 D3hot D3cold
[    2.734945] pci 0000:02:04.0: PME# disabled
[    2.741684] pci 0000:02:06.0: [12d8:c016] type 01 class 0x060400
[    2.749288] pci 0000:02:06.0: PME# supported from D0 D3hot D3cold
[    2.756201] pci 0000:02:06.0: PME# disabled
[    2.765483] pci_bus 0000:02: fixups for bus
[    2.770482] pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 0
[    2.777991] pci 0000:02:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.786839] pci 0000:02:04.0: scanning [bus 00-00] behind bridge, pass 0
[    2.794337] pci 0000:02:04.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.803184] pci 0000:02:06.0: scanning [bus 00-00] behind bridge, pass 0
[    2.810691] pci 0000:02:06.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.819539] pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 1
[    2.827948] pci_bus 0000:03: scanning bus
[    2.836100] pci_bus 0000:03: fixups for bus
[    2.841085] pci_bus 0000:03: bus scan returning with max=03
[    2.847467] pci_bus 0000:03: busn_res: [bus 03-05] end is updated to 03
[    2.854911] pci 0000:02:04.0: scanning [bus 00-00] behind bridge, pass 1
[    2.863151] pci_bus 0000:04: scanning bus
[    2.871319] pci_bus 0000:04: fixups for bus
[    2.876303] pci_bus 0000:04: bus scan returning with max=04
[    2.882683] pci_bus 0000:04: busn_res: [bus 04-05] end is updated to 04
[    2.890151] pci 0000:02:06.0: scanning [bus 00-00] behind bridge, pass 1
[    2.898383] pci_bus 0000:05: scanning bus
[    2.903362] pci 0000:05:00.0: [144d:a809] type 00 class 0x010802
[    2.910266] pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[    2.923188] pci_bus 0000:05: fixups for bus
[    2.928174] pci_bus 0000:05: bus scan returning with max=05
[    2.934553] pci_bus 0000:05: busn_res: [bus 05] end is updated to 05
[    2.941724] pci_bus 0000:02: bus scan returning with max=05
[    2.948101] pci_bus 0000:02: busn_res: [bus 02-05] end is updated to 05
[    2.955557] pci_bus 0000:01: bus scan returning with max=05
[    2.961983] pci_bus 0000:01: busn_res: [bus 01-05] end is updated to 05
[    2.969413] pci_bus 0000:00: bus scan returning with max=05
[    2.975843] pci 0000:00:00.0: BAR 8: assigned [mem 0x2008000000-0x20081fffff]
[    2.983843] pci 0000:01:00.0: BAR 8: assigned [mem 0x2008000000-0x20080fffff]
[    2.991795] pci 0000:01:00.0: BAR 0: assigned [mem 0x2008100000-0x200817ffff]
[    2.999801] pci 0000:02:06.0: BAR 8: assigned [mem 0x2008000000-0x20080fffff]
[    3.007763] pci 0000:02:02.0: PCI bridge to [bus 03]
[    3.013622] pci 0000:02:04.0: PCI bridge to [bus 04]
[    3.019487] pci 0000:05:00.0: BAR 0: assigned [mem 0x2008000000-0x2008003fff 64bit]
[    3.028012] pci 0000:02:06.0: PCI bridge to [bus 05]
[    3.033792] pci 0000:02:06.0:   bridge window [mem 0x2008000000-0x20080fffff]
[    3.041767] pci 0000:01:00.0: PCI bridge to [bus 02-05]
[    3.047808] pci 0000:01:00.0:   bridge window [mem 0x2008000000-0x20080fffff]
[    3.055782] pci 0000:00:00.0: PCI bridge to [bus 01-05]
[    3.061822] pci 0000:00:00.0:   bridge window [mem 0x2008000000-0x20081fffff]
[    4.184289]     pcie_pme=nomsi


Also take
ranges = <0x01000000 0x00 0x00000000 0x20 0x00100000 0x00 0x00100000>,
			 <0x02000000 0x00 0x08000000 0x20 0x08000000 0x00 0x04000000>;
for example (this ranges will not make nvme timeout):

lspci:
# lspci -i /usr/share/misc/pci.ids -vvs 0000:05:00.0
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 44
        Region 0: Memory at 2000400000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [b0] MSI-X: Enable- Count=13 Masked-
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: nvme
        Kernel modules: nvme

/proc/iomem:
01200000-ffffffff : System RAM
  01280000-022affff : Kernel code
  022b0000-0295ffff : reserved
  02960000-040cffff : Kernel data
  05280000-0528ffff : reserved
  41cc0000-422c0fff : reserved
  422c1000-4232afff : reserved
  4232d000-667bbfff : reserved
  667bc000-667bcfff : reserved
  667bd000-667c0fff : reserved
  667c1000-ffffffff : reserved
2000000000-2000000fff : cfg
2000400000-20083fffff : pcie-controller@...0000000
  2000400000-20005fffff : PCI Bus 0000:01
    2000400000-20004fffff : PCI Bus 0000:02
      2000400000-20004fffff : PCI Bus 0000:05
        2000400000-2000403fff : 0000:05:00.0
    2000500000-200057ffff : 0000:01:00.0
...

Dmesg:
# dmesg | grep pci
[    0.000000] Kernel command line: console=ttyS0 ubi.mtd=lnx root=ubi0:rootfs rw rootfstype=ubifs init=/linuxrc pci=nomsi pcie_pme=nomsi nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pci=nommconf nvme_core.default_ps_max_latency_us=0
[    2.436357] cdns-pcie 2040000000.pcie-controller: host bridge /pcie-controller@...0000000 ranges:
[    2.446129] cdns-pcie 2040000000.pcie-controller: Parsing ranges property...
[    2.454141] cdns-pcie 2040000000.pcie-controller:       IO 0x2000100000..0x20001fffff -> 0x0000000000
[    2.464350] cdns-pcie 2040000000.pcie-controller:      MEM 0x2000400000..0x20083fffff -> 0x0000400000
[    2.527395] cdns-pcie 2040000000.pcie-controller: Link up
[    2.534745] cdns-pcie 2040000000.pcie-controller: PCI host bridge to bus 0000:00
[    2.543036] pci_bus 0000:00: root bus resource [bus 00-05]
[    2.549345] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
[    2.556431] pci_bus 0000:00: root bus resource [mem 0x2000400000-0x20083fffff] (bus address [0x00400000-0x083fffff])
[    2.567802] pci_bus 0000:00: scanning bus
[    2.572823] pci 0000:00:00.0: [17cd:0100] type 01 class 0x060400
[    2.580290] pci 0000:00:00.0: supports D1
[    2.585099] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    2.591709] pci 0000:00:00.0: PME# disabled
[    2.601802] pci_bus 0000:00: fixups for bus
[    2.606801] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[    2.614310] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.623161] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[    2.631551] pci_bus 0000:01: scanning bus
[    2.636563] pci 0000:01:00.0: [12d8:c016] type 01 class 0x060400
[    2.643460] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x0007ffff]
[    2.651289] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    2.658202] pci 0000:01:00.0: PME# disabled
[    2.667966] pci_bus 0000:01: fixups for bus
[    2.673047] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 0
[    2.680556] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.689405] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 1
[    2.697804] pci_bus 0000:02: scanning bus
[    2.703079] pci 0000:02:02.0: [12d8:c016] type 01 class 0x060400
[    2.710742] pci 0000:02:02.0: PME# supported from D0 D3hot D3cold
[    2.717657] pci 0000:02:02.0: PME# disabled
[    2.724466] pci 0000:02:04.0: [12d8:c016] type 01 class 0x060400
[    2.732078] pci 0000:02:04.0: PME# supported from D0 D3hot D3cold
[    2.739037] pci 0000:02:04.0: PME# disabled
[    2.745734] pci 0000:02:06.0: [12d8:c016] type 01 class 0x060400
[    2.753344] pci 0000:02:06.0: PME# supported from D0 D3hot D3cold
[    2.760257] pci 0000:02:06.0: PME# disabled
[    2.769510] pci_bus 0000:02: fixups for bus
[    2.774532] pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 0
[    2.782043] pci 0000:02:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.790892] pci 0000:02:04.0: scanning [bus 00-00] behind bridge, pass 0
[    2.798494] pci 0000:02:04.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.807347] pci 0000:02:06.0: scanning [bus 00-00] behind bridge, pass 0
[    2.814855] pci 0000:02:06.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    2.823701] pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 1
[    2.832098] pci_bus 0000:03: scanning bus
[    2.840343] pci_bus 0000:03: fixups for bus
[    2.845328] pci_bus 0000:03: bus scan returning with max=03
[    2.851708] pci_bus 0000:03: busn_res: [bus 03-05] end is updated to 03
[    2.859152] pci 0000:02:04.0: scanning [bus 00-00] behind bridge, pass 1
[    2.867404] pci_bus 0000:04: scanning bus
[    2.875551] pci_bus 0000:04: fixups for bus
[    2.880535] pci_bus 0000:04: bus scan returning with max=04
[    2.886914] pci_bus 0000:04: busn_res: [bus 04-05] end is updated to 04
[    2.894359] pci 0000:02:06.0: scanning [bus 00-00] behind bridge, pass 1
[    2.902637] pci_bus 0000:05: scanning bus
[    2.907632] pci 0000:05:00.0: [144d:a809] type 00 class 0x010802
[    2.914536] pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[    2.927440] pci_bus 0000:05: fixups for bus
[    2.932426] pci_bus 0000:05: bus scan returning with max=05
[    2.938807] pci_bus 0000:05: busn_res: [bus 05] end is updated to 05
[    2.946025] pci_bus 0000:02: bus scan returning with max=05
[    2.952403] pci_bus 0000:02: busn_res: [bus 02-05] end is updated to 05
[    2.959834] pci_bus 0000:01: bus scan returning with max=05
[    2.966233] pci_bus 0000:01: busn_res: [bus 01-05] end is updated to 05
[    2.973666] pci_bus 0000:00: bus scan returning with max=05
[    2.980097] pci 0000:00:00.0: BAR 8: assigned [mem 0x2000400000-0x20005fffff]
[    2.988097] pci 0000:01:00.0: BAR 8: assigned [mem 0x2000400000-0x20004fffff]
[    2.996046] pci 0000:01:00.0: BAR 0: assigned [mem 0x2000500000-0x200057ffff]
[    3.004092] pci 0000:02:06.0: BAR 8: assigned [mem 0x2000400000-0x20004fffff]
[    3.012055] pci 0000:02:02.0: PCI bridge to [bus 03]
[    3.017867] pci 0000:02:04.0: PCI bridge to [bus 04]
[    3.023709] pci 0000:05:00.0: BAR 0: assigned [mem 0x2000400000-0x2000403fff 64bit]
[    3.032253] pci 0000:02:06.0: PCI bridge to [bus 05]
[    3.038071] pci 0000:02:06.0:   bridge window [mem 0x2000400000-0x20004fffff]
[    3.046086] pci 0000:01:00.0: PCI bridge to [bus 02-05]
[    3.052127] pci 0000:01:00.0:   bridge window [mem 0x2000400000-0x20004fffff]
[    3.060100] pci 0000:00:00.0: PCI bridge to [bus 01-05]
[    3.066140] pci 0000:00:00.0:   bridge window [mem 0x2000400000-0x20005fffff]
[    4.181033]     pcie_pme=nomsi
[   44.419264] nvme nvme0: pci function 0000:05:00.0
[   44.425541] pci 0000:00:00.0: enabling device (0000 -> 0002)
[   44.432217] pci 0000:00:00.0: enabling bus mastering
[   44.438183] pci 0000:01:00.0: enabling device (0000 -> 0002)
[   44.444720] pci 0000:01:00.0: enabling bus mastering
[   44.453373] pci 0000:02:06.0: enabling device (0000 -> 0002)
[   44.459951] pci 0000:02:06.0: enabling bus mastering


[firstlove@...lchen tmp]$ diff no-timeout.lspci timeout.lspci 
8c8
<         Region 0: Memory at 2000400000 (64-bit, non-prefetchable) [size=16K]
---
>         Region 0: Memory at 2008000000 (64-bit, non-prefetchable) [size=16K]

> > Question:
> > 1.  Why dd can cause nvme timeout? Is there more debug ways?
> > 2. How can this mmio range affect nvme timeout?
> >
> > Regards,
> > Li
> >
> >
> **************************************************************
> ********
> > This email and attachments contain Ambarella Proprietary and/or Confidential
> Information and is intended solely for the use of the individual(s) to whom it is
> addressed. Any unauthorized review, use, disclosure, distribute, copy, or print is
> prohibited. If you are not an intended recipient, please contact the sender by
> reply email and destroy all copies of the original message. Thank you.
> 
> ##############################################################
> ########
> This EXTERNAL email has been scanned by Proofpoint Email Protect service.

Regards,
Li

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ