linux-kernel - Re: [PATCH] sata_nv: fix nmi intr or system hanging in rhel4u6 adma.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <47C43E27.5070700@garzik.org>
Date:	Tue, 26 Feb 2008 11:28:23 -0500
From:	Jeff Garzik <jeff@...zik.org>
To:	Robert Hancock <hancockr@...w.ca>, Kuan Luo <kluo@...dia.com>
CC:	linux-kernel <linux-kernel@...r.kernel.org>,
	Tejun Heo <htejun@...il.com>, Peer Chen <pchen@...dia.com>
Subject: Re: [PATCH] sata_nv: fix nmi intr or system hanging in rhel4u6 adma.

Robert Hancock wrote:
> 
> 
> Kuan Luo wrote:
>> Hi, robert
>> One customer reported that their system received a nmi interrupt after
>> issuing "dd if=/dev/sdb of=/dev/null" on a defective disk in rhel4u6.
>> I tested it and found  that my system hung both in rhel4u6(2.6.9-67) and
>> 2.6.24-rc7.
>> The patch can work well,  but I am not sure if the patch has other
>> potential effect on adma.
>> I attached a  file in case of lines breaked.
>>
>> The below info comes from Gunther Mayer to reproduce the issue.
>> "
>> used a Seagate ST3500841NS 3.AE for my test; probably other seagate 
>> drives are also capable of creating media errors with the new hdparm-8.1:
>> - compile hdparm-8.1 - hdparm -- yes-i-know-what-i-am-doing 
>> --make-bad-sector 60000 /dev/sdb
>> Unfortunately this does not succeed for nvidia sata controller (timeouts
>> et al.), but it worked fine on AHCI machine (e.g. FSC R640).
>> When I insert this newly created defective disk in Ultra 20, it 
>> reboots within seconds after issueing "dd if=/dev/sdb of=/dev/null". "
>>
>> Signed-off-by: kluo@...dia.com
>>
>> ---
>>  
>> drivers/ata/sata_nv.c |    5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
>> index ed5473b..e824260 100644
>> --- a/drivers/ata/sata_nv.c
>> +++ b/drivers/ata/sata_nv.c
>> @@ -837,9 +837,10 @@ static void nv_adma_tf_read(struct ata_port *ap,
>> struct ata_taskfile *tf)
>>         all shortly be aborted anyway. We assume that NCQ commands
>> are not
>>         issued via passthrough, which is the only way that switching
>> into
>>         ADMA mode could abort outstanding commands. */
>> -    nv_adma_register_mode(ap);
>> +    struct nv_adma_port_priv *pp = ap->private_data;
>>  
>> -    ata_tf_read(ap, tf);
>> +    if (pp->flags & NV_ADMA_PORT_REGISTER_MODE)
>> +        ata_tf_read(ap, tf);
>>  }
>>  
>>  static unsigned int nv_adma_tf_to_cpb(struct ata_taskfile *tf, __le16
>> *cpb)
> 
> This is basically avoiding switching into register mode, right? I don't 
> think this is a very good solution as the point of the tf_read function 
> is that it's supposed to read the taskfile provided by the drive to 
> diagnose the error, so not doing this isn't a good thing.

Agree with this analysis -- if ->tf_read() is being called, then 
obviously the core wants a current copy of the device's ATA registers.

It is not a good solution to simply avoiding returning meaningful data, 
because -- as Robert notes -- we need tf_read for analysis.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/