Date:	Thu, 1 Jul 2010 09:28:24 -0500
From:	scameron@...rdog.cce.hp.com
To:	zhanglinbao@...il.com
Cc:	randy.dunlap@...cle.com, bob_zhang2004@....com, axboe@...nel.dk,
	mike.miller@...com, iss_storagedev@...com,
	linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
	james.bottomley@...senpartnership.com, scameron@...rdog.cce.hp.com,
	mikem@...rdog.cce.hp.com
Subject: Re: cciss: WARNING/BUG in do_cciss_intr (it's back)

Bob Zhang wrote:

> Hi all,
> 
> I want to know the final result.
> have you fixed this bug ?  if yes, how to fix ?
> Now , I am using 2.6.32.12-7 from sles11SP1(ia64) , I still happened
> this problem.
> 
> 
> Any comments are welcome .
> 
> another point ,
> >> Randy,
> >> I think this is a different bug than the one you reported previously.
> >> Please open a new bugzilla.
> >
> > I think it's the same one. The first warning that now triggers is:
> >
> Could you give me the previous one link ?
> 
> attachment is the booting information and eror.

( See: http://lkml.org/lkml/2009/2/4/342 for a bit more context )

and Jens Axboe wrote, back in Feb of 2009:

> I think it's the same one. The first warning that now triggers is:
> 
> WARNING: at drivers/block/cciss.c:225 
> 
> which is
> 
>         if (WARN_ON(hlist_unhashed(&c->list)))
> removeQ(), this is where we would have crashed before due to trying to
> remove a command from a list it didn't belong to. And then we crash
> right after in the interrupt handler. So I'm pretty sure this is 100%
> the same bug.
> 

I did not see a similar error in the log file you provided.
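
For readers unfamiliar with the check Jens quotes, here is a minimal
userspace sketch of the idea, not the actual cciss code. The hlist helpers
are simplified stand-ins for the kernel's, and struct command and
remove_from_queue() are hypothetical names used only for illustration. It
shows why testing hlist_unhashed() first turns a completion for a command
that was never queued into a warning rather than a corrupted list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A node that is on no list has a NULL pprev. */
static bool hlist_unhashed(const struct hlist_node *n)
{
	return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink the node and mark it as unhashed again. */
static void hlist_del_init(struct hlist_node *n)
{
	if (hlist_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical command structure; only the list linkage matters here. */
struct command {
	int tag;
	struct hlist_node list;
};

/*
 * Hypothetical analogue of removeQ(): refuse to unlink a command that is
 * not on any queue, e.g. a stale completion left over from a previous
 * kernel. Returns true if the command was actually removed.
 */
static bool remove_from_queue(struct command *c)
{
	if (hlist_unhashed(&c->list)) {
		fprintf(stderr, "WARNING: command %d was never queued\n", c->tag);
		return false;
	}
	hlist_del_init(&c->list);
	return true;
}
```

In this sketch a command added with hlist_add_head() is removed normally,
while a command that was never added only triggers the warning and is
otherwise ignored.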

The above problem appeared to be triggered by the reset_devices path (e.g.
kdump) picking up completions from the previous kernel, because the device
was not actually being reset.  None of the Smart Arrays since the P600 can be
reset by the PCI power management method.  Some of them can be reset by using
the "doorbell" register, and a patch for hpsa to do this has been implemented:

http://marc.info/?l=linux-scsi&m=127671403229420&w=2

which is one patch in a series of patches to hpsa.

I am currently working on a similar series of patches for cciss.

However, this won't help the P400, P400i, E500, P800, and P700m, which cannot
be reset by either method.  Also, while the 6402 and 6404 can be reset,
doing so is inadvisable because they share a battery-backed cache module,
hence this patch to hpsa:

http://marc.info/?l=linux-scsi&m=127671403029407&w=2
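
The controller cases described above can be summarized as a small lookup
table. This is only an illustrative sketch: the enum, the table, and
lookup_reset_support() are hypothetical names and do not come from the hpsa
or cciss patches, and the table covers only the models named in this
message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification, based only on the models named above. */
enum reset_support {
	RESET_UNKNOWN,     /* model not mentioned in this message */
	RESET_NONE,        /* neither PCI power management nor doorbell works */
	RESET_INADVISABLE, /* resettable, but shares a battery-backed cache module */
};

struct board_reset_info {
	const char *model;
	enum reset_support support;
};

static const struct board_reset_info reset_table[] = {
	{ "P400",  RESET_NONE },
	{ "P400i", RESET_NONE },
	{ "E500",  RESET_NONE },
	{ "P800",  RESET_NONE },
	{ "P700m", RESET_NONE },
	{ "6402",  RESET_INADVISABLE },
	{ "6404",  RESET_INADVISABLE },
};

/* Exact, case-sensitive match on the model name. */
static enum reset_support lookup_reset_support(const char *model)
{
	size_t i;

	for (i = 0; i < sizeof(reset_table) / sizeof(reset_table[0]); i++) {
		if (strcmp(reset_table[i].model, model) == 0)
			return reset_table[i].support;
	}
	return RESET_UNKNOWN;
}
```

A real driver would key such a table on PCI board IDs rather than on model
name strings; strings are used here only to keep the sketch readable.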

See also: https://bugzilla.redhat.com/show_bug.cgi?id=609522
and https://bugzilla.redhat.com/show_bug.cgi?id=598681
(you need an account to see those, I think.)

-- steve
 
