lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y36UJu4Ho54KBaHF@google.com>
Date:   Wed, 23 Nov 2022 21:44:06 +0000
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Vidya Sagar <vidyas@...dia.com>,
        "Kenneth R. Crudup" <kenny@...ix.com>, bhelgaas@...gle.com,
        lorenzo.pieralisi@....com, hkallweit1@...il.com,
        wangxiongfeng2@...wei.com, mika.westerberg@...ux.intel.com,
        kai.heng.feng@...onical.com, chris.packham@...iedtelesis.co.nz,
        yangyicong@...ilicon.com, treding@...dia.com, jonathanh@...dia.com,
        abhsahu@...dia.com, sagupta@...dia.com, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org, kthota@...dia.com,
        mmaddireddy@...dia.com, sagar.tv@...il.com,
        Ricky Wu <ricky_wu@...ltek.com>,
        Rajat Jain <rajatja@...gle.com>,
        Prasad Malisetty <quic_pmaliset@...cinc.com>,
        Victor Ding <victording@...gle.com>
Subject: Re: [PATCH V1] PCI/ASPM: Save/restore L1SS Capability for
 suspend/resume

Hi,

not sure this is the best thread to reply to, but we are also
observing suspend issues with the same Kioxia NVMe on a platform
with a Qualcomm sc7280 SoC. The system runs a v5.15 downstream
kernel which includes most (post v5.15) ASPM patches from
upstream.

There are two issues with the Kioxia NVMe:

1. Power consumption is high unless a LTR_L1.2_THRESHOLD of 0ns
   is configured (related dubious patch: [1])

2. The system often hangs on resume unless a longer delay is
   added in the suspend pass. QC engineers say that the NVMe is
   taking so much time to settle in L1ss.

Other NVMe models don't exhibit power or suspend issues on this
platform, except for one model which also needs a shorter
delay during suspend, otherwise the system will hang
occasionally upon resume.

The second issue could possibly be 'fixed' with a quirk for
the Kioxia NVMe model, though it seems the issue is not seen on
all platforms, apparently the delay is not needed on Kenny's
system.

I'm currently a bit at a loss with the first issue. The patch
mentioned above claims that the (no-)snoop latencies are
involved, which may or may not be true. In this thread I saw
Kenny posting 'lspci' output from his (now) working system.
I noticed max (no-)snoop values of 3145728ns, which seems to
be some sort of default (programmed) max. On my system these
values are 0ns, which is the default value of the registers.
I tried to set these to 3146us from the kernel to see if that
makes a difference, but could only successfully update the max
snoop latency, but not non-snoop (maybe this can be only done
at early initialization time?). With just the max snoop latency
set to 3146us power consumption of the NVMe remains high.

The output of lspci from my system is attached.

In this thread it was mentioned that possibly a BIOS update
fixed the issue Kenny was seeing. What kind of values is
the BIOS supposed to adjust (I'm a PCI n00b)?

Any suggestions about what else to try?

Thanks

m.

[1] https://patchwork.kernel.org/project/linux-arm-msm/patch/1663315719-21563-1-git-send-email-quic_krichai@quicinc.com/

---

0001:00:00.0 PCI bridge: Qualcomm Device 010b (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 183
	IOMMU group: 0
	Region 0: Memory at 40700000 (32-bit, non-prefetchable) [size=4K]
	Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
	I/O behind bridge: 00001000-00001fff [size=4K]
	Memory behind bridge: 40300000-404fffff [size=2M]
	Prefetchable memory behind bridge: 0000000040500000-00000000406fffff [size=2M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit+
		Address: 00000000fffff000  Data: 0000
		Masking: fffffffe  Pending: 00000000
	Capabilities: [70] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
			ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 128 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x2 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
		SltCap:	AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise+
			Slot #0, PowerLimit 0.000W; Interlock+ NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd Off, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCap: CRSVisible-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP+ LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd-
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
		RootCmd: CERptEn- NFERptEn- FERptEn-
		RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
			 FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
		ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
	Capabilities: [148 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [168 v1] Transaction Processing Hints
		No steering table available
	Capabilities: [1fc v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=70us PortTPowerOnTime=0us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			   T_CommonMode=70us LTR1.2_Threshold=86016ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: pcieport

0001:01:00.0 Non-Volatile memory controller: KIOXIA Corporation Device 0001 (prog-if 02 [NVM Express])
	Subsystem: KIOXIA Corporation Device 0001
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 182
	IOMMU group: 0
	Region 0: Memory at 40300000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <32us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x2 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [150 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [260 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [400 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
			  PortCommonModeRestoreTime=60us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=86016ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: nvme


On Tue, Apr 12, 2022 at 05:50:47PM -0500, Bjorn Helgaas wrote:
> [+cc Ricky for rtsx_pci ASPM behavior, Rajat, Prasad for L1 SS stuff,
> Victor for interest in disabling ASPM during save/restore]
> 
> On Wed, Feb 16, 2022 at 06:41:39PM +0530, Vidya Sagar wrote:
> > On 2/16/2022 11:30 AM, Kenneth R. Crudup wrote:
> > > On Wed, 16 Feb 2022, Vidya Sagar wrote:
> > > 
> > > > I see that the ASPM-L1 state of Realtek NIC which was in
> > > > disabled state before hibernate got enabled after hibernate.
> > > 
> > > That's actually my SD-Card reader; there's a good chance the BIOS
> > > does "something" to it at boot time, as it's possible to boot from
> > > SD-Card on my laptop.
> > > 
> > > > This patch doesn't do anything to LnkCtl register which has
> > > > control for ASPM L1 state.
> > > 
> > > > Could you please check why ASPM L1 got enabled post hibernation?
> > > 
> > > I wouldn't know how to do that; if you're still interested in that
> > > let me know what to do to determine that.
> 
> > I would like Bjorn to take a call on it.
> > At this point, there are contradictions in observations.
> 
> Remind me what contradictions you see?  I know Kenny saw NVMe errors
> on a kernel that included 4257f7e008ea ("PCI/ASPM: Save/restore L1SS
> Capability for suspend/resume") in December 2020 [1], and that he did
> *not* see those errors on 4257f7e008ea in February 2022 [2].  Is that
> what you mean?
> 
> > Just to summarize,
> > - The root ports in your laptop don't have support for L1SS
> > - With the same old code base with which the errors were observed plus my
> > patch on top of it, I see that ASPM-L1 state getting enabled for one of the
> > endpoints (Realtek SD-Card reader) after system comes out of hibernation
> > even though ASPM-L1 was disabled before the system enter into hibernation.
> > No errors are reported now.
> 
> I assume you refer to [2], where on 4257f7e008ea ("PCI/ASPM:
> Save/restore L1SS Capability for suspend/resume"), Kenny saw ASPM L1
> disabled before hibernate and enabled afterwards:
> 
>   --- pre-hibernate
>   +++ post-hibernate
>     00:1d.7 PCI bridge [0604]: Intel [8086:34b7]
>       Bus: primary=00, secondary=58, subordinate=58
> 	LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>     58:00.0 RTS525A PCI Express Card Reader [10ec:525a]
>   -     LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
>   -             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>   +     LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
>   +             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> 
> Per PCIe r6.0, sec 7.5.3.7, "ASPM L1 must be enabled by software in
> the Upstream component on a Link prior to enabling ASPM L1 in the
> Downstream component on that Link," so this definitely seems broken,
> but wouldn't explain the NVMe issue.
> 
> The PCI core (pcie_config_aspm_link()) always enables L1 in the
> upstream component before the downstream one, but 58:00.0 uses the
> rtsx_pci driver, which does a lot of its own ASPM fiddling, so my
> guess is that it's doing something wrong here.
> 
> > - With the linux-next top of the tree plus my patch, no change in the ASPM
> > states and no errors also reported.
> 
> I don't know which report this refers to.
> 
> > This points to BIOS being buggy (both old and new with new one being less
> > problematic)
> 
> I agree that a BIOS change between [1] and [2] seems plausible, but I
> don't think we can prove that yet.  I'm slightly queasy because while
> Kenny may have updated his BIOS, most people will not have.
> 
> I think we should try this patch again with some changes and maybe
> some debug logging:
> 
>   - I wonder if we should integrate the LTR, L1 SS, and Link Control
>     ASPM restore instead of having them spread around through
>     pci_restore_ltr_state(), pci_restore_aspm_l1ss_state(), and
>     pci_restore_pcie_state().  Maybe a new pci_restore_aspm() that
>     would be called from pci_restore_pcie_state()?
> 
>   - For L1 PM Substates configuration, sec 5.5.4 says that both ports
>     must be configured while ASPM L1 is disabled, but I don't think we
>     currently guarantee this: we restore all the upstream component
>     state first, and we don't know the ASPM state of the downstream
>     one.  Maybe we need to:
> 
>       * When restoring upstream component,
>           + disable its ASPM
> 
>       * When restoring downstream component,
>           + disable its ASPM
> 	  + restore upstream component's LTR, L1SS
> 	  + restore downstream component's LTR, L1SS
> 	  + restore upstream component's ASPM
> 	  + restore downstream component's ASPM
> 
>       This seems pretty messy, but seems like what the spec requires.
> 
>     - Add some pci_dbg() logging of all these save/restore values to
>       help debug any issues.
> 
> Bjorn
> 
> [1] https://lore.kernel.org/r/20201228040513.GA611645@bjorn-Precision-5520
> [2] https://lore.kernel.org/r/3ca14a7-b726-8430-fe61-a3ac183a1088@panix.com
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ