Date:	Tue, 25 Aug 2015 00:03:15 -0400
From:	Alex Deucher <alexdeucher@...il.com>
To:	Jiang Liu <jiang.liu@...ux.intel.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Alexander Holler <holler@...oftware.de>,
	Mark Rustad <mark.d.rustad@...el.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Tony Luck <tony.luck@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Bugfix] x86, irq: Fix a regression caused by commit b5dc8e6c21e7

On Thu, Aug 13, 2015 at 6:13 PM, Alex Deucher <alexdeucher@...il.com> wrote:
> On Thu, Aug 13, 2015 at 4:15 PM, Alex Deucher <alexdeucher@...il.com> wrote:
>> On Thu, Aug 13, 2015 at 3:46 PM, Alex Deucher <alexdeucher@...il.com> wrote:
>>> On Mon, Aug 10, 2015 at 9:06 PM, Jiang Liu <jiang.liu@...ux.intel.com> wrote:
>>>> On 2015/8/10 23:00, Alex Deucher wrote:
>>>>> On Sun, Aug 9, 2015 at 4:15 AM, Jiang Liu <jiang.liu@...ux.intel.com> wrote:
>>>>>> Alex Deucher, Mark Rustad and Alexander Holler reported a regression
>>>>>> with the latest v4.2-rc4 kernel, which breaks some SATA controllers.
>>>>>> With multi-MSI capable SATA controllers, only the first port works,
>>>>>> all other ports time out when executing SATA commands. This regression
>>>>>> bisects to 52f518a3a7c2 ("x86/MSI: Use hierarchical irqdomains to manage
>>>>>> MSI interrupts"), but that is not the root cause; it just triggers a bug
>>>>>> caused by b5dc8e6c21e7 ("x86/irq: Use hierarchical irqdomain to manage
>>>>>> CPU interrupt vectors").
>>>>>>
>>>>>> With this patch applied, the affected SATA controllers work as expected.
>>>>>
>>>>> Yes, this fixes the SATA regression:
>>>>> Tested-by: Alex Deucher <alexander.deucher@....com>
>>>>>
>>>>> I'm not sure if it's related to this patch or not (I haven't bisected
>>>>> it independently yet), but MSIs don't seem to work on GPUs.  See the
>>>>> line for amdgpu.  This is just after loading the driver.
>>>> Hi Alex,
>>>>         This patch only affects multiple-MSI, and it seems that your
>>>> gpu only uses one MSI interrupt, so it may not be related to this patch.
>>>> And this seems like a sort of interrupt storm.
>>>>>   52:   16579895   16579562   16580988   16583443  IR-PCI-MSI 524288-edge      amdgpu
>>>>
>>>> Does disabling interrupt remapping make any difference?
>>>
>>> Nope.  Still going crazy:
>>>   46:    4769660    4769130    4775899    4784657   PCI-MSI 524288-edge      amdgpu
>>>
>>>
>>>> Does disabling MSI make any difference?
>>>
>>> If I set pci=nomsi, the sata controllers time out.  If I disable MSIs
>>> just for the gpu, I don't get any interrupts:
>>>   25:          0          0          0          0  IR-IO-APIC 0-fasteoi   amdgpu
>>>
>>
>> Strangely, it only seems to affect certain boards.  E.g., this card works fine:
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R9 260 OEM] (prog-if 00
>> [VGA controller])
>>     Subsystem: Diamond Multimedia Systems Device 2329
>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> <TAbort+ <MAbort- >SERR- <PERR- INTx-
>>     Latency: 0, Cache Line Size: 64 bytes
>>     Interrupt: pin A routed to IRQ 52
>>     Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>>     Region 2: Memory at d0000000 (64-bit, prefetchable) [size=8M]
>>     Region 4: I/O ports at e000 [size=256]
>>     Region 5: Memory at ff600000 (32-bit, non-prefetchable) [size=256K]
>>     Expansion ROM at ff640000 [disabled] [size=128K]
>>     Capabilities: [48] Vendor Specific Information: Len=08 <?>
>>     Capabilities: [50] Power Management version 3
>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
>> PME(D0-,D1+,D2+,D3hot+,D3cold-)
>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>     Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>>         DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s
>> <4us, L1 unlimited
>>             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>         DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>             MaxPayload 256 bytes, MaxReadReq 512 bytes
>>         DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>         LnkCap:    Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
>> Latency L0s <64ns, L1 <1us
>>             ClockPM- Surprise- LLActRep- BwNot-
>>         LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>         LnkSta:    Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
>> DLActive- BWMgmt- ABWMgmt-
>>         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
>> OBFF Not Supported
>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
>> OBFF Disabled
>>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>>              Transmit Margin: Normal Operating Range,
>> EnterModifiedCompliance- ComplianceSOS-
>>              Compliance De-emphasis: -6dB
>>         LnkSta2: Current De-emphasis Level: -6dB,
>> EqualizationComplete+, EqualizationPhase1+
>>              EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>>     Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>         Address: 00000000fee00000  Data: 0000
>>     Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>>     Capabilities: [150 v2] Advanced Error Reporting
>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>         CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>         AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>>     Capabilities: [270 v1] #19
>>     Capabilities: [2b0 v1] Address Translation Service (ATS)
>>         ATSCap:    Invalidate Queue Depth: 00
>>         ATSCtl:    Enable+, Smallest Translation Unit: 00
>>     Capabilities: [2c0 v1] #13
>>     Capabilities: [2d0 v1] #1b
>>     Kernel driver in use: amdgpu
>>
>> This one does not:
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Device 6939 (prog-if 00 [VGA controller])
>>     Subsystem: Gigabyte Technology Co., Ltd Device 229d
>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> <TAbort+ <MAbort- >SERR- <PERR- INTx-
>>     Latency: 0, Cache Line Size: 64 bytes
>>     Interrupt: pin A routed to IRQ 52
>>     Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>>     Region 2: Memory at d0000000 (64-bit, prefetchable) [size=2M]
>>     Region 4: I/O ports at e000 [size=256]
>>     Region 5: Memory at ff600000 (32-bit, non-prefetchable) [size=256K]
>>     Expansion ROM at ff640000 [disabled] [size=128K]
>>     Capabilities: [48] Vendor Specific Information: Len=08 <?>
>>     Capabilities: [50] Power Management version 3
>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
>> PME(D0-,D1+,D2+,D3hot+,D3cold+)
>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>     Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>>         DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s
>> <4us, L1 unlimited
>>             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>         DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>             MaxPayload 256 bytes, MaxReadReq 512 bytes
>>         DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>         LnkCap:    Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
>> Latency L0s <64ns, L1 <1us
>>             ClockPM- Surprise- LLActRep- BwNot-
>>         LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>         LnkSta:    Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
>> DLActive- BWMgmt- ABWMgmt-
>>         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
>> OBFF Not Supported
>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
>> OBFF Disabled
>>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>>              Transmit Margin: Normal Operating Range,
>> EnterModifiedCompliance- ComplianceSOS-
>>              Compliance De-emphasis: -6dB
>>         LnkSta2: Current De-emphasis Level: -6dB,
>> EqualizationComplete+, EqualizationPhase1+
>>              EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>>     Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>         Address: 00000000fee00000  Data: 0000
>>     Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>>     Capabilities: [150 v2] Advanced Error Reporting
>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>         CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>         AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>>     Capabilities: [200 v1] #15
>>     Capabilities: [270 v1] #19
>>     Capabilities: [2b0 v1] Address Translation Service (ATS)
>>         ATSCap:    Invalidate Queue Depth: 00
>>         ATSCtl:    Enable+, Smallest Translation Unit: 00
>>     Capabilities: [2c0 v1] #13
>>     Capabilities: [2d0 v1] #1b
>>     Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
>>         ARICap:    MFVC- ACS-, Next Function: 1
>>         ARICtl:    MFVC- ACS-, Function Group: 0
>>     Kernel driver in use: amdgpu
>>
>> Any ideas?  I'll see if I can find the time to bisect this.
>
> I attempted to bisect this; however, the regression happened prior to
> my driver being merged upstream:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=099bfbfc7fbbe22356c02f0caf709ac32e1126ea
> So I can't easily bisect it further without backporting the driver to
> each commit before that.  This may take a while...

Just a heads up, this ended up being an alignment issue in the driver
and was not a regression.

Alex

>
> Alex
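
For anyone chasing a similar symptom: the /proc/interrupts rows quoted above list one counter per CPU, so a storm shows up as a row whose total climbs very quickly between two reads. The following is only a minimal sketch in Python of that comparison. The function names parse_interrupts and busiest are hypothetical helpers invented for illustration, not part of any kernel tool, and the parser assumes the row layout seen in this thread (label, per-CPU counts, then a free-form description).

```python
import re


def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description).

    Hypothetical helper: assumes rows shaped like the ones quoted in
    this thread, i.e. "52:  <count per CPU ...>  IR-PCI-MSI ... amdgpu".
    """
    result = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+):\s+(.*)", line)
        if not m:
            continue  # the "CPU0 CPU1 ..." header row has no "label:" prefix
        irq, rest = m.groups()
        fields = rest.split()
        counts = []
        for field in fields:
            if not field.isdigit():
                break  # first non-numeric field starts the description
            counts.append(int(field))
        result[irq] = (sum(counts), " ".join(fields[len(counts):]))
    return result


def busiest(before, after, interval_s):
    """Return (interrupts per second, irq, description), highest rate first."""
    rates = []
    for irq, (total, desc) in after.items():
        previous = before.get(irq, (0, ""))[0]
        rates.append(((total - previous) / interval_s, irq, desc))
    return sorted(rates, reverse=True)
```

To use it, read /proc/interrupts twice a known number of seconds apart, pass both texts through parse_interrupts, and hand the two results to busiest; an IRQ that dominates the top of the list while the device is idle is the one worth a closer look.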
