Message-ID: <C1438B59050E1B4C9482FF3266AD6BA32D8DA54B00@gretna.indigovision.com>
Date:	Tue, 6 Sep 2011 13:19:44 +0100
From:	Bruce Stenning <b.stenning@...igovision.com>
To:	Tejun Heo <htejun@...il.com>
CC:	Mark Lord <kernel@...savvy.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-ide@...r.kernel.org" <linux-ide@...r.kernel.org>,
	Jeff Garzik <jgarzik@...ox.com>
Subject: RE: sata_mv port lockup on hotplug (kernel 2.6.38.2)

> Can you please add some debug printk's to scsi_schedule_eh() and see
> whether scsi_eh_wakeup() is invoked from there?  It seems likely that
> the problem is caused by race conditions around
> SHOST_[CANCEL_]RECOVERY flags.

I did manage to reproduce the lockup again yesterday with a slightly
different mix of tracing, including adding tracing to scsi_eh_wakeup()
and scsi_schedule_eh().  It looks like the EH is being scheduled, but
the EH thread goes immediately back to sleep and doesn't wake up:

ata4: EH complete
Waking error handler thread
scsi_eh_wakeup: succeeded
scsi_schedule_eh: succeeded
scsi_restart_operations: waking up host to restart
Error handler scsi_eh_3 sleeping

Is it attempting to wake the scsi_eh_3 thread while scsi_error_handler
is still processing an EH, which then calls scsi_restart_operations and
puts the scsi_eh_3 thread back to sleep again?
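
One way to reason about the "woken, then straight back to sleep" pattern
in the trace above is to model the check the EH thread makes each time it
wakes. The following is only a user-space sketch, not kernel code: the
field names are borrowed from struct Scsi_Host, but the exact sleep
condition is an assumption about what scsi_error_handler() tests in
kernels of this era and should be verified against the tree in use. The
struct host_model type and eh_should_sleep() are hypothetical names.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the per-host counters that the
 * EH thread is assumed to inspect on every wakeup.
 */
struct host_model {
    int host_failed;       /* commands that have failed and await EH */
    int host_busy;         /* commands currently outstanding on the host */
    int host_eh_scheduled; /* explicit EH requests (e.g. scsi_schedule_eh) */
};

/*
 * Assumed sleep test: the thread sleeps if there is nothing to do, or if
 * some outstanding commands have not yet failed or completed.
 */
static bool eh_should_sleep(const struct host_model *h)
{
    return (h->host_failed == 0 && h->host_eh_scheduled == 0) ||
           h->host_failed != h->host_busy;
}
```

Under this assumed condition, a host with host_eh_scheduled = 1 and no
commands in flight would run recovery, whereas the same request with
host_busy = 1 and host_failed = 0 would put the thread back to sleep even
though the wakeup itself "succeeded". Whether that is what happens here
is exactly what more tracing of those counters would have to confirm.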

Some time after the lockup, there was further tracing relating to SCSI
operations timing out, but the port remained unresponsive.  The unit
is not entirely stable in this state: our application software was
no longer able to strobe softdog, so the unit rebooted.  Enough was
still running for the serial console to stay responsive until the
reboot, however.

Thanks,

Bruce.

