Message-Id: <20071101164046.462f40f0.akpm@linux-foundation.org>
Date:	Thu, 1 Nov 2007 16:40:46 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Max Krasnyansky <maxk@...lcomm.com>
Cc:	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org
Subject: Re: Strange freezes (seems like SATA related)

On Mon, 29 Oct 2007 09:54:27 -0700
Max Krasnyansky <maxk@...lcomm.com> wrote:

> A couple of HP xw9300 machines (dual Opterons) started freezing up.
> We're running 2.6.22.1 on them. The freezes are somewhat weird: the VGA console is alive
> (I can switch vts, etc) but everything else is dead (network, etc).
> Unfortunately SYSRQ was not enabled and I could not get backtraces and stuff.
> 
> Hooked up serial console and the only error that shows up is this.
> 
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Descriptor sense data with sense descriptors (in hex):
> end_request: I/O error, dev sda, sector 8388695
> Buffer I/O error on device sda1, logical block 1048579
> lost page write due to I/O error on sda1
> sd 0:0:0:0: [sda] Write Protect is off
> 
> I see a bunch of those and then the box just sits there spewing this periodically
> 
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> 
> SMART selftest on the drive passed without errors.
> 
> Here is what this machine looks like
> 
> ...

So this happens on more than one machine?

The kernel shouldn't freeze, so even if both machines have magically
identical hardware faults, there's a kernel bug there somewhere.

I guess it would be useful to test a 2.6.23 kernel if possible.  We've seen
a very large number of reports like this one in recent months (many of
which have not been responded to, btw) and perhaps someone has done
something about them.
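
For anyone reading the quoted log, the "cmd" line of the timed-out command
can be decoded by hand. The sketch below is only an illustration and is not
part of the original report. It assumes the usual libata layout for that
line (command/feature:nsect:lbal:lbam:lbah, then the five HOB bytes, then
the device register) and LBA28 addressing with 512-byte sectors; the
function name decode_cmd is made up for this example. Under those
assumptions the first failing command is a WRITE DMA (0xca) of 8 sectors
whose start LBA equals the sector named in the end_request line.

```python
import re

# Assumed field order of the libata "cmd" line (not stated in the thread):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:...:hob_lbah/device
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_cmd(line):
    """Decode an LBA28 taskfile from a libata 'cmd' log line (hypothetical helper)."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd taskfile found")
    regs = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    command, _feature, nsect, lbal, lbam, lbah = regs[:6]
    device = regs[11]
    # LBA28: the low nibble of the device register holds LBA bits 27..24.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # In LBA28 commands a sector count of 0 means 256 sectors.
    sectors = nsect if nsect != 0 else 256
    return {
        "command": command,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
    }


if __name__ == "__main__":
    first = "ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out"
    print(decode_cmd(first))
```

The decoded LBA and byte count for the first command agree with the
"sector 8388695" and "data 4096 out" values in the same log, which is a
reasonable sanity check on the assumed field order.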
