lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 15 Mar 2007 20:00:47 +0800
From:	"Conke Hu" <conke.hu@...il.com>
To:	"Tejun Heo" <htejun@...il.com>
Cc:	"Jeff Garzik" <jeff@...zik.org>, Alan <alan@...rguk.ukuu.org.uk>,
	"Andrew Morton" <akpm@...l.org>,
	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

On 3/14/07, Tejun Heo <htejun@...il.com> wrote:
> Hello,
>
> Conke Hu wrote:
> >    When there is no media in SATA CD/DVD drive or media is not ready,
> > AHCI controller fails to execute the ATAPI commands TEST_UNIT_READY,
> > READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR. But ATI SB600
> > SATA controller sets SERR_INTERNAL bit in the error register at the
> > same time, which is not necessary. This patch is just to ignore the
> > INTERNAL ERROR in such case. Without this patch, ahci error handler
> > will report many errors as below:
>
> Whoa, SERR_INTERNAL on ATAPI check condition?  Just for fun, here's what
> the spec says about SERR_INTERNAL.
>

When media is not ready, command TEST_UNIT_READY fails with ahci irq
status == 0x40000001(IRQ_TF_ERROR) and serror == SERR_INTERNEL, then
ahci error handler calls atapi_eh_request_sense() and sets
ATA_QCFLAG_SENSE_VALID. Command REQUEST_SENSE executes successfully
and atapi_qc_complete() sets result = SAM_STAT_CHECK_CONDITION, and
now the whole TEST_UNIT_READY request is done and returns .


> E  Internal error: The host bus adapter experienced an internal error
> that caused the operation to fail and may have put the host bus adapter
> into an error state. Host software should reset the interface before
> re-trying the operation. If the condition persists, the host bus adapter
> may suffer from a design issue rendering it incompatible with the
> attached device.
>

Yes, I saw this too :) and I am contacting the hardware engineers to
check if there is any hardware bug.
But, even though this were a hardware bug and could be fixed, we would
still need this patch since many SB600 boards have already come into
the market and those ASICs can never be fixed :(
So, if no errors in this patch, could Jeff please apply it ASAP?


> Anyways thanks for fixing this.  Just a few comments.
>
> > --- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig    2007-03-25
> > 20:57:31.000000000 +0800
> > +++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c    2007-03-25
> > 21:03:54.000000000 +0800
> > @@ -80,6 +80,7 @@ enum {
> >     board_ahci_pi        = 1,
> >     board_ahci_vt8251    = 2,
> >     board_ahci_ign_iferr    = 3,
> > +    board_ahci_ati    = 4,
> >
> >     /* global controller registers */
> >     HOST_CAP        = 0x00, /* host capabilities */
> > @@ -168,6 +169,7 @@ enum {
> >     AHCI_FLAG_NO_NCQ        = (1 << 24),
> >     AHCI_FLAG_IGN_IRQ_IF_ERR    = (1 << 25), /* ignore IRQ_IF_ERR */
> >     AHCI_FLAG_HONOR_PI        = (1 << 26), /* honor PORTS_IMPL */
> > +    AHCI_FLAG_TF_ERR_FIX    = (1 << 27), /* ignore INTERNAL ERROR on
> > IRQ_TF_ERROR */
>
> Can we use board_ahci_ign_interr and AHCI_FLAG_IGN_SERR_INTERNAL to keep
> it more consistent with the other IGN flag?
>
> > -    { PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
> > +    { PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
> >     { PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */
>
> 4381 isn't affected while 4380 is?

I never see such an ID, and plan to remove 0x4381.
The patch which added the PCI IDs was not sent out by myself. I
checked all SB600 boards, and not found any 0x4381 controller, only
0x4380 instead. In fact, SB600 RAID and Non-RAID share the same PCI
device ID, only with class code different.


>
> Hmmm... Okay, this is weird.  I'm feeling very strong deja vu.
>
> Well, I must be getting alzheimer.  I did almost the same patch a month
> ago and was waiting for verification to properly submit the patch.
>
>   http://thread.gmane.org/gmane.linux.ide/16049/focus=16437
>
> Anyways, Conke Hu, can you please take a look at my patch from a month
> ago?  It's almost identical but SERR_INTERNAL is always ignored on both
> SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?
>

I just read your patch. Another difference is that my patch ignores
SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
other cases, I think, we'd better not ignore the SERR_INTERNEL. Right?
The following is some detail:
// your patch:
+	if (ap->flags & AHCI_FLAG_IGN_SERR_INTERNAL)
+		serr &= ~SERR_INTERNAL;

// mine:
-       if (irq_stat & PORT_IRQ_TF_ERR)
+       if (irq_stat & PORT_IRQ_TF_ERR) {
               err_mask |= AC_ERR_DEV;
+
+               /* some controllers set INTERNAL ERROR on ATAPI
IRQ_TF_ERROR, ignore it */
+               if ((serror & SERR_INTERNAL) &&
+                    (ap->flags & AHCI_FLAG_TF_ERR_FIX) &&
+                     qc && qc->dev->class == ATA_DEV_ATAPI) {
+                       serror &= ~SERR_INTERNAL;
+               }
+       }

Tejun, you do me a great favor, thank you so much! for your previous
help, too :)

Conke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ