lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 23 Jan 2007 03:44:13 +0100
From:	Björn Steinbrink <B.Steinbrink@....de>
To:	Robert Hancock <hancockr@...w.ca>
Cc:	Jeff Garzik <jeff@...zik.org>, Chr <chunkeey@....de>,
	Alistair John Strachan <s0348365@....ed.ac.uk>,
	linux-kernel@...r.kernel.org, htejun@...il.com,
	jens.axboe@...cle.com, lwalton@...l.com, pomac@...or.com
Subject: Re: SATA exceptions with 2.6.20-rc5

On 2007.01.22 19:24:22 -0600, Robert Hancock wrote:
> Björn Steinbrink wrote:
> >>>Running a kernel with the return statement replace by a line that prints
> >>>the irq_stat instead.
> >>>
> >>>Currently I'm seeing lots of 0x10 on ata1 and 0x0 on ata2.
> >>40 minutes stress test now and no exception yet. What's interesting is
> >>that ata1 saw exactly one interrupt with irq_stat 0x0, all others that
> >>might have get dropped are as above.
> >>I'll keep it running for some time and will then re-enable the return
> >>statement to see if there's a relation between the irq_stat 0x0 and the
> >>exception.
> >
> >No, doesn't seem to be related, did get 2 exceptions, but no irq_stat
> >0x0 for ata1. Syslog/dmesg has nothing new either, still the same
> >pattern of dismissed irq_stats.
> 
> I've finally managed to reproduce this problem on my box, by doing:
> 
> watch --interval=0.1 /sbin/hdparm -I /dev/sda
> 
> on one drive and then running bonnie++ on /dev/sdb connected to the 
> other port on the same controller device. Usually within a few minutes 
> one of the IDENTIFY commands would time out in the same way you guys 
> have been seeing.
> 
> Through some various trials and tribulations, the only conclusion I can 
> come to is that this controller really doesn't like that 
> NV_INT_STATUS_CK804 register being looked at in ADMA mode. I tried 
> adding some debug code to the qc_issue function that would check to see 
> if the BUSY flag in altstatus went high or that register showed an 
> interrupt within a certain time afterwards, however that really seemed 
> to hose things, the system wouldn't even boot.

Hm, I don't think it is unhappy about looking at NV_INT_STATUS_CK804.
I'm running 2.6.20-rc5 with the INT_DEV check removed for 8 hours now
without a single problem and that should still look at
NV_INT_STATUS_CK804, right?
I just noticed that my last email might not have been clear enough. The
exceptions happened when I re-enabled the return statement in addition
to the debug message. Without the INT_DEV check, it is completely fine
AFAICT.

> Try out this patch, it just calls the ata_host_intr function where 
> appropriate without using nv_host_intr which looks at the 
> NV_INT_STATUS_CK804 register. This is what the original ADMA patch from 
> Mr. Mysterious NVIDIA Person did, I'm guessing there may be a reason for 
> that. With this patch I can get through a whole bonnie++ run with the 
> repeated IDENTIFY requests running without seeing the error.

I'll see if I can schedule a test run for tomorrow, I currently need
this box.

Thanks,
Björn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ