lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 27 Aug 2007 10:28:09 +0100
From:	"Dermot Bradley" <dermot.bradley@...-mobile.com>
To:	"Alistair John Strachan" <alistair@...zero.co.uk>,
	"Alan Cox" <alan@...rguk.ukuu.org.uk>
Cc:	<linux-kernel@...r.kernel.org>
Subject: RE: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives

> FWIW, I've got the HDMI version of this board and I have exactly the
same
> problem (even with the newest BIOS) if nmi_watchdog is not set to
zero.
> Try booting with nmi_watchdog=0 (default on x86-64, I think) and see
if
> these go away.
>
> I guess the APIC has some difficulties handling NMIs.

On Friday I had rebooted the box (remotely) after adding "noapic" to the
boot options and the machine didn't come back up - came in this morning
to see a kernel panic after the reboot to do with nmi_watchdog.

This morning I've removed the "noapic" option and instead set
"nmi_watchdog=0" and its booted and been up and running for 30 minutes
with no APCI whereas these happened every thing previously during boot.

So nmi_watch does indeed seem to related to the root cause of the
problem. 

Thanks for the help Alistair! One other point you may be able to help
with - this is the first time I've used a dual core processor and I
expected that /proc/interrupts would should interrupts distributed
between both cores whereas they actually seem to be mainly handled by
the 1st core:

           CPU0       CPU1
  0:        251          0   IO-APIC-edge      timer
  1:       2208         11   IO-APIC-edge      i8042
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 16:       5291          3   IO-APIC-fasteoi   ehci_hcd:usb1, eth0
 17:     223026         13   IO-APIC-fasteoi   ahci
 18:          0          1   IO-APIC-fasteoi   ohci_hcd:usb2
 19:          0        126   IO-APIC-fasteoi   ohci_hcd:usb3,
ohci_hcd:usb5
 20:          0          2   IO-APIC-fasteoi   ohci_hcd:usb4,
ohci_hcd:usb6
 21:     188591          1   IO-APIC-fasteoi   HFC-multi
NMI:          0          0
LOC:    3036393    1527288
ERR:          0
MIS:          0

Is this to be expected for dual core systems?

> I get the feeling this problem is independent of the APIC errors, and
I
> don't see it here

Yeah, I thought as much. After disabling NMI Watchdog I didn't see these
NCQ messages initially but after running a few bonnie runs to load the
disks I'm now getting them again.

> As Alan said, it's very possibly just the drive not properly
supporting
> NCQ.

Yeah that seems the case - last Friday these errors seemed to be
occurring every 1-2 hours or so even when I wasn't using the machine.

I'm currently rebuilding the kernel with the following patch to disable
NCQ for these drives:

--- linux-2.6.22.5/drivers/ata/libata-core.c    2007-08-23
00:23:54.000000000 +0100
+++ linux-2.6.22.5.new/drivers/ata/libata-core.c        2007-08-27
10:25:16.000000000 +0100
@@ -3788,6 +3788,7 @@
        /* NCQ is broken */
        { "Maxtor 6L250S0",     "BANC1G10",     ATA_HORKAGE_NONCQ },
        { "Maxtor 6B200M0",     "BANC1B10",     ATA_HORKAGE_NONCQ },
+        { "WDC WD800BEVS-07",   NULL,          ATA_HORKAGE_NONCQ },
        /* NCQ hard hangs device under heavier load, needs hard power
cycle */
        { "Maxtor 6B250S0",     "BANC1B70",     ATA_HORKAGE_NONCQ },
        /* Blacklist entries taken from Silicon Image 3124/3132




Stirk, Lamont & Associates Ltd.
Registered Address: Thomas Andrews House, Queens Road, Belfast,  BT3 9DU
Registered in Northern Ireland, Number: NI 47983. VAT Number: 832 2778 22

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists