linux-kernel - Re: Re: [PATCH] ahci: libahci: clear pending interrupt status

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230814041720.76214-1-Chloe_Chen@asmedia.com.tw>
Date:   Mon, 14 Aug 2023 12:17:20 +0800
From:   Szuying Chen <chensiying21@...il.com>
To:     dlemoal@...nel.org, linux-ide@...r.kernel.org,
        linux-kernel@...r.kernel.org
Cc:     Jesse1_Chang@...edia.com.tw, Richard_Hsu@...edia.com.tw,
        Szuying Chen <Chloe_Chen@...edia.com.tw>
Subject: Re: Re: [PATCH] ahci: libahci: clear pending interrupt status

On 8/10/23 17:59, Damien Le Moal wrote: 
>On 8/10/23 18:31, Szuying Chen wrote:
>>> On 8/10/23 14:12, Damien Le Moal wrote: 
>>>    On 8/10/23 14:05, Szuying Chen wrote:
>>>    > When ISR handle interface fatal error with error recovery after clear PxIS
>>>    > and PxIE. Another FIS(SDB FIS with err) that set PxIS.IFS to 1 is recevied
>>>    > during error recovery, which causing the HBA not issue any new commands
>>>    > after cmd.ST set 1.
>>>
>>>    This is not very clear. If there was a fatal error, the drive should be in
>>>    error state and no other SDB FIS can be received as the drive does absolutely
>>>    nothing while in error state (it only waits for a read log 10h command to be>
>>>    issued to get it out of error state). So if you are seeing 2 SDB FIS with
>>>    errors one after the other, you have a buggy device...
>>>
>>>    However, I may be misunderstanding your issue here. Could you provide more
>>>    details and a dmesg output example of the issue ?
>> 
>> According to AHCI 1.3.1 specification ch6.1.9, when an R_ERR is received
>> on an H2D Data FIS in normal operation, the HBA sets PxIS.IFS to 1
>> (fatal error) and halts operation. Referring to SATA 3.0 specification we
>> know the device will halt queued command processing and transmit SDB FIS to
>> host with ERR bit in Status field set to one(set PxIS.TFES to 1).
>
>Sure, but that SBD FIS should be completely ignored by the adapter since it
>stopped operation. If you see it, then it means that the handling of the first
>error was incomplete.
>
The SDB FIS  happens between pxIS.IE was cleared and PxCMD.ST not cleared
to 0. Because the PxCMD.ST still setting, so the SDB FIS can't ignored.  

>> In our case, the ISR handles fatal errors(PxIS.IFS) and enters error 
>> recovery after cleaning up PxIS and PxIE. Then a SDB FIS is received 
>> with interrupt bit(PxIS.TFES) set to 1. According to AHCI 1.3.1 
>> specification ch6.2.2, HBA can't issue(cmd.ST set to 1) any new commands
>> under PxIS.TFES alive during error recovery.
>
>But how come you see a new command being issued ? This entire situation should
>result in a port reset with the first error. I do not see how this is possible.
>Are you saying that the port reset is not cleaning things up properly ? Could
>you share the dmesg output of this case ?
>
>
dmesg:
[  654.244958] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  654.460522] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  656.180375] ata7.00: exception Emask 0x10 SAct 0x7c0 SErr 0x200100 
action 0x6 frozen 
[  656.180384] ata7.00: irq_stat 0x08000000, interface fatal error 
[  656.180386] ata7: SError: { UnrecovData BadCRC } 
[  656.180389] ata7.00: failed command: READ FPDMA QUEUED 
[  656.180390] ata7.00: cmd 60/00:30:28:a9:9c/0a:00:01:00:00/40 tag 6 ncq 
dma 1310720 in 
	res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) 
[  656.180395] ata7.00: status: { DRDY } 
[  656.180397] ata7.00: failed command: READ FPDMA QUEUED 
[  656.180398] ata7.00: cmd 60/00:38:28:b3:9c/0a:00:01:00:00/40 tag 7 ncq 
dma 1310720 in 
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) 
[  656.180402] ata7.00: status: { DRDY } 
[  656.180403] ata7.00: failed command: READ FPDMA QUEUED 
[  656.180404] ata7.00: cmd 60/00:40:28:bd:9c/0a:00:01:00:00/40 tag 8 ncq 
dma 1310720 in
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) 
[  656.180408] ata7.00: status: { DRDY } 
[  656.180410] ata7.00: failed command: READ FPDMA QUEUED 
[  656.180411] ata7.00: cmd 60/00:48:28:c7:9c/0a:00:01:00:00/40 tag 9 ncq 
dma 1310720 in 
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) 
[  656.180414] ata7.00: status: { DRDY } 
[  656.180416] ata7.00: failed command: READ FPDMA QUEUED 
[  656.180417] ata7.00: cmd 60/00:50:28:d1:9c/0a:00:01:00:00/40 tag 10 ncq 
dma 1310720 in 
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) 
[  656.180421] ata7.00: status: { DRDY } 
[  656.180423] ata7: hard resetting link 
[  659.210328] amd_iommu_report_page_fault: 48 callbacks suppressed 
[  659.210333] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.210345] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.210352] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.811069] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.811080] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.811086] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.827669] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.827676] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.827680] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  659.878319] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.426805] amd_iommu_report_page_fault: 248 callbacks suppressed 
[  664.426809] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.426819] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.426825] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.676294] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.676305] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.676313] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.693501] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.693513] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.693521] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  664.711480] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000] 
[  666.180398] ata7: softreset failed (1st FIS failed) 
[  666.180405] ata7: hard resetting link 
[  670.054987] amd_iommu_report_page_fault: 99 callbacks suppressed
[  670.054992] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000]
[  670.055003] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000]
[  670.055010] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0019 address=0x0 flags=0x0000]


dmesg (add clear pending Interrupt):
[  120.236847] ahci_error_intr: interface fatal error
[  120.236854] ahci_error_intr: freeze (set PxIE to 0)
[  120.265268] ata7: limiting SATA link speed to 3.0 Gbps
[  120.265275] ata7.00: exception Emask 0x10 SAct 0x3f0 SErr 0x400000 
action 0x6 frozen
[  120.265279] ata7.00: irq_stat 0x08000000, interface fatal error
[  120.265281] ata7: SError: { Handshk }
[  120.265285] ata7.00: failed command: WRITE FPDMA QUEUED
[  120.265287] ata7.00: cmd 61/00:20:a8:b9:50/0a:00:00:00:00/40 tag 4 ncq 
dma 1310720 ou
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  120.265295] ata7.00: status: { DRDY }
[  120.265297] ata7.00: failed command: WRITE FPDMA QUEUED
[  120.265299] ata7.00: cmd 61/20:28:a8:c3:50/03:00:00:00:00/40 tag 5 ncq 
dma 409600 out
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  120.265305] ata7.00: status: { DRDY }
[  120.265307] ata7.00: failed command: WRITE FPDMA QUEUED
[  120.265309] ata7.00: cmd 61/00:30:c8:c6:50/0a:00:00:00:00/40 tag 6 ncq 
dma 1310720 ou
	res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  120.265314] ata7.00: status: { DRDY }
[  120.265316] ata7.00: failed command: WRITE FPDMA QUEUED
[  120.265318] ata7.00: cmd 61/00:38:c8:d0:50/0a:00:00:00:00/40 tag 7 ncq 
dma 1310720 ou
	res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  120.265324] ata7.00: status: { DRDY }
[  120.265325] ata7.00: failed command: WRITE FPDMA QUEUED
[  120.265327] ata7.00: cmd 61/00:40:c8:da:50/0a:00:00:00:00/40 tag 8 ncq 
dma 1310720 ou
	res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  120.265333] ata7.00: status: { DRDY }
[  120.265334] ata7.00: failed command: WRITE FPDMA QUEUED
[  120.265336] ata7.00: cmd 61/00:48:c8:e4:50/0a:00:00:00:00/40 tag 9 ncq 
dma 1310720 ou
	res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  120.265342] ata7.00: status: { DRDY }
[  120.265346] ata7: hard resetting link
[  120.265348] ata7: ahci_hardreset *****
[  120.265352] ahci 0000:0a:00.0: ahci_stop_engine: PORT_IRQ_STAT 0x40000008
PORT_CMD 0x0004c617
[  120.265356] ahci 0000:0a:00.0: ahci_stop_engine: setting HBA to idle 
(setting PxCMD.ST to 0) 
[  120.937522] ahci 0000:0a:00.0: ahci_start_engine: check 
PORT_SCR_ERR 0x00000000, and to clear
[  120.937534] ahci 0000:0a:00.0: ahci_start_engine: check 
PORT_IRQ_STAT 0x40000008, and to clear 
[  120.937544] ata7: ahci_softreset *****
[  120.937550] ahci 0000:0a:00.0: ahci_stop_engine: setting HBA to idle  
(setting PxCMD.ST to 0) 
[  120.937555] ahci 0000:0a:00.0: ahci_start_engine: check 
PORT_SCR_ERR 0x00000000, and to clear
[  120.937559] ahci 0000:0a:00.0: ahci_start_engine: check 
PORT_IRQ_STAT 0x00000000, and to clear 
[  121.097231] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  121.113493] ata7.00: configured for UDMA/133
[  121.113570] ata7: EH complete

I have printk PxIS & PxCMD values to check pending interrupt.
It SBD FIS with ERR to setting PxIS.TFES and PxIS.SDBS bit. 

>>
>>>    >
>>>    > Signed-off-by: Szuying Chen <Chloe_Chen@...edia.com.tw>
>>>    > ---
>>>    >  drivers/ata/libahci.c | 12 ++++++++++++
>>>    >  1 file changed, 12 insertions(+)
>>>    >
>>>    > diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
>>>    > index 06aec35f88f2..0ae51fd95d46 100644
>>>    > --- a/drivers/ata/libahci.c
>>>    > +++ b/drivers/ata/libahci.c
>>>    > @@ -679,9 +679,21 @@ static int ahci_scr_write(struct ata_link *link, unsigned int sc_reg, u32 val)
>>>    >
>>>    >  void ahci_start_engine(struct ata_port *ap)
>>>    >  {
>>>    > +     struct ahci_host_priv *hpriv = ap->host->private_data;
>>>    >       void __iomem *port_mmio = ahci_port_base(ap);
>>>    >       u32 tmp;
>>>    >
>>>    > +     /* clear SError */
>>>    > +     tmp = readl(port_mmio + PORT_SCR_ERR);
>>>    > +     writel(tmp, port_mmio + PORT_SCR_ERR);
>>>    > +
>>>    > +     /* clear port IRQ */
>>>    > +     tmp = readl(port_mmio + PORT_IRQ_STAT);
>>>    > +     if (tmp)
>>>    > +             writel(tmp, port_mmio + PORT_IRQ_STAT);
>>>    > +
>>>    > +     writel(1 << ap->port_no, hpriv->mmio + PORT_IRQ_STAT);
>>>    > +
>>>    >       /* start DMA */
>>>    >       tmp = readl(port_mmio + PORT_CMD);
>>>    >       tmp |= PORT_CMD_START;
>>>    > --
>>>    > 2.39.2
>>>   >
>>>
>>>   --
>>>   Damien Le Moal
>>>    Western Digital Research
>>>
>> Thanks.
>>
>
>-- 
>Damien Le Moal
>Western Digital Research