linux-kernel - Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset

Message-ID: <3436e410-5396-e40e-ab55-3e5a9b1f090a@kernel.org>
Date:   Fri, 15 Sep 2023 16:00:52 +0900
From:   Damien Le Moal <dlemoal@...nel.org>
To:     David Gow <david@...idgow.net>,
        Niklas Cassel <Niklas.Cassel@....com>,
        Bagas Sanjaya <bagasdotme@...il.com>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        patenteng <dimitar@...kalov.co.uk>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>,
        Linux IDE and libata <linux-ide@...r.kernel.org>,
        Linux PCI <linux-pci@...r.kernel.org>
Subject: Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe
 SATA to Constantly Reset

On 9/15/23 15:54, David Gow wrote:
> On 2023/09/15 at 13:41, Damien Le Moal wrote:
>> On 9/15/23 12:22, David Gow wrote:
>>> On 2023/09/13 at 23:12, Niklas Cassel wrote:
>>>> On Wed, Sep 13, 2023 at 06:25:31PM +0700, Bagas Sanjaya wrote:
>>>>> Hi,
>>>>>
>>>>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>>>
>>>>>> After upgrading to 6.5.2 from 6.4.12 I keep getting the following kernel messages around three times per second:
>>>>>>
>>>>>> [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>>> [ 9683.270399] ata16.00: configured for UDMA/66
>>>>>>
>>>>>> So I've tracked the offending device:
>>>>>>
>>>>>> ll /sys/class/ata_port/ata16
>>>>>> lrwxrwxrwx 1 root root 0 Sep 10 21:51 /sys/class/ata_port/ata16 -> ../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16
>>>>>>
>>>>>> cat /sys/bus/pci/devices/0000:0a:00.0/uevent
>>>>>> DRIVER=ahci
>>>>>> PCI_CLASS=10601
>>>>>> PCI_ID=1B4B:9130
>>>>>> PCI_SUBSYS_ID=1043:8438
>>>>>> PCI_SLOT_NAME=0000:0a:00.0
>>>>>> MODALIAS=pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01
>>>>>>
>>>>>> lspci | grep 0a:00.0
>>>>>> 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
>>>>>>
>>>>>> I am not using the 88SE9128, so I have no way of knowing whether it works or not. It may simply be getting reset a couple of times per second or it may not function at all.
>>>>>
>>>>> See Bugzilla for the full thread.
>>>>>
>>>>> patenteng: I have asked you to bisect this regression. Any conclusion?
>>>>>
>>>>> Anyway, I'm adding this regression to regzbot:
>>>>>
>>>>> #regzbot: introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217902
>>>>
>>>> Hello Bagas, patenteng,
>>>>
>>>>
>>>> FYI, the prints:
>>>> [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>> [ 9683.270399] ata16.00: configured for UDMA/66
>>>>
>>>> just show that the ATA error handler has been invoked.
>>>> There was no reset performed.
>>>>
>>>> If there was a reset, you would have seen something like:
>>>> [    1.441326] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>>> [    1.541250] ata8.00: configured for UDMA/133
>>>> [    1.541411] ata8: hard resetting link
>>>>
>>>>
>>>> Could you please try this patch and see if it improves things for you:
>>>> https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
>>>>
>>>
>>> FWIW, I'm seeing a very similar issue both in 6.5.2 and in git master
>>> [aed8aee11130 ("Merge tag 'pmdomain-v6.6-rc1' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm") with that
>>> patch applied.
>>>
>>>
>>> The log is similar (the last two lines repeat several times a second):
>>> [    0.369632] ata14: SATA max UDMA/133 abar m2048@...7c10000 port 0xf7c10480 irq 33
>>> [    0.683693] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [    1.031662] ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
>>> [    1.031852] ata14.00: configured for UDMA/66
>>> [    1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [    1.414505] ata14.00: configured for UDMA/66
>>> [    1.744094] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [    1.744368] ata14.00: configured for UDMA/66
>>> [    2.073916] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [    2.074276] ata14.00: configured for UDMA/66
>>>
>>>
>>> lspci shows:
>>> 09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
>>>           Subsystem: Gigabyte Technology Co., Ltd Device b000
>>>           Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>           Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>           Latency: 0, Cache Line Size: 64 bytes
>>>           Interrupt: pin A routed to IRQ 33
>>>           Region 0: I/O ports at b050 [size=8]
>>>           Region 1: I/O ports at b040 [size=4]
>>>           Region 2: I/O ports at b030 [size=8]
>>>           Region 3: I/O ports at b020 [size=4]
>>>           Region 4: I/O ports at b000 [size=32]
>>>           Region 5: Memory at f7c10000 (32-bit, non-prefetchable) [size=2K]
>>>           Expansion ROM at f7c00000 [disabled] [size=64K]
>>>           Capabilities: <access denied>
>>>           Kernel driver in use: ahci
>>>
>>> The controller in question lives on a Gigabyte Z87X-UD5H-CF motherboard.
>>> I'm using the controller for several drives, and it's working, it's just
>>> spammy. (At worst, there's some performance hitching, but that might
>>> just be journald rotating logs as they fill up with the message).
>>>
>>> I haven't had a chance to bisect yet (this is a slightly awkward machine
>>> for me to install test kernels on), but can also confirm it worked with
>>> 6.4.12.
>>>
>>> Hopefully that's useful. I'll get back to you if I manage to bisect it.
>>
>> Bisect will definitely be welcome. But first, please try adding the patch that
>> Niklas mentioned above:
>>
>> https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
>>
>> If that fixes the issue, we know the culprit :)
>>
> 
> 
> Sorry: I wasn't clear. I did try with that patch (applied on top of 
> torvalds/master), and the issue remained.
> 
> I've started bisecting, but fear it'll take a while.

OK. Thanks.

> 
> Thanks,
> -- David
> 

-- 
Damien Le Moal
Western Digital Research
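
The SStatus values in the log lines quoted in this thread (113 for the
Marvell ports, 123 in Niklas's example) can be decoded by hand. The sketch
below is not part of the original message. It assumes the standard SATA
SStatus field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8)
and that the kernel prints the value in hexadecimal. The function name
decode_sstatus and the description strings are made up for illustration.

```python
# Illustrative decoder for the "SStatus" value shown in libata link messages.
# Assumption: standard SATA SStatus layout (DET 3:0, SPD 7:4, IPM 11:8),
# with the value printed in hex by the kernel.

DET = {
    0x0: "no device detected, no PHY communication",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "1.5 Gbps",
    0x2: "3.0 Gbps",
    0x3: "6.0 Gbps",
}
IPM = {
    0x0: "device not present or no communication",
    0x1: "active",
    0x2: "partial power management state",
    0x6: "slumber power management state",
    0x8: "devsleep power management state",
}


def decode_sstatus(value_hex: str) -> dict:
    """Split a hex SStatus string such as "113" into its three fields."""
    value = int(value_hex, 16)
    det = value & 0xF          # bits 3:0, device detection
    spd = (value >> 4) & 0xF   # bits 7:4, negotiated link speed
    ipm = (value >> 8) & 0xF   # bits 11:8, interface power management
    return {
        "det": DET.get(det, f"reserved ({det:#x})"),
        "speed": SPD.get(spd, f"reserved ({spd:#x})"),
        "ipm": IPM.get(ipm, f"reserved ({ipm:#x})"),
    }


if __name__ == "__main__":
    for raw in ("113", "123"):
        print(raw, decode_sstatus(raw))
```

Under these assumptions, 113 decodes to an established link at 1.5 Gbps in
the active state and 123 to the same state at 3.0 Gbps, which agrees with
the "SATA link up" speeds printed next to each value in the logs.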