Message-ID: <20090904172316.GA6076@clala-laptop>
Date:	Fri, 4 Sep 2009 10:23:16 -0700
From:	Chaitanya Lala <clala@...erbed.com>
To:	tj@...nel.org
Cc:	clala@...erbed.com, rbecker@...erbed.com,
	linux-kernel@...r.kernel.org
Subject: Disk failure behavior

Hi,

I am using a back-port of libata from ~2.6.20 on a 2.6.9
Red Hat kernel. The system has hot-pluggable SATA disks
(using AHCI). The problem I am facing is that certain disk
failures bring the system into a weird state: the system tries
to reset the disk but fails, and finally prints the message
"reset failed, giving up".

At this point the port is left in a frozen state and
the interrupts from the port are masked. If this disk is now
pulled out and a healthy disk is inserted, the new disk's
insertion does not raise any event/notification/interrupt.
In fact, the only way to get the disk to work at this point is
to reboot.

Below is the snippet of code I am referring to, from v2.6.20
(file drivers/ata/libata-eh.c, function ata_eh_recover):
 
	/* reset */
	if (ehc->i.action & ATA_EH_RESET_MASK) {
		ata_eh_freeze_port(ap);

		rc = ata_eh_reset(ap, ata_port_nr_vacant(ap), prereset,
				  softreset, hardreset, postreset);
		if (rc) {
			ata_port_printk(ap, KERN_ERR,
					"reset failed, giving up\n");
			goto out; 
		}    

		ata_eh_thaw_port(ap);
	} 

A possible work-around is to thaw the port before going to "out",
which re-enables the port's interrupts on the failure path.
I understand that this would also allow further interrupts from the old disk,
but I am willing to live with that if it helps to detect the new device.

	/* reset */
	if (ehc->i.action & ATA_EH_RESET_MASK) {
		ata_eh_freeze_port(ap);

		rc = ata_eh_reset(ap, ata_port_nr_vacant(ap), prereset,
				  softreset, hardreset, postreset);
		if (rc) {
			ata_port_printk(ap, KERN_ERR,
					"reset failed, giving up\n");
+			ata_eh_thaw_port(ap);
			goto out; 
		}    

		ata_eh_thaw_port(ap);
	} 
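
To make the state I am describing easier to reason about, here is a toy
user-space model of it. This is only a sketch and not kernel code: the
names toy_port, toy_freeze, toy_thaw, toy_hotplug_irq, toy_recover and
toy_new_disk_seen_after_failed_reset are all made up for illustration,
and the model assumes that a frozen port simply drops hotplug events.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_port {
	bool frozen;		/* true: port interrupts are masked */
	bool disk_detected;	/* set when a hotplug event is delivered */
};

static void toy_freeze(struct toy_port *ap)
{
	assert(ap != NULL);
	ap->frozen = true;
}

static void toy_thaw(struct toy_port *ap)
{
	assert(ap != NULL);
	ap->frozen = false;
}

/* Assumption of this model: a frozen port silently drops the event. */
static void toy_hotplug_irq(struct toy_port *ap)
{
	assert(ap != NULL);
	if (!ap->frozen)
		ap->disk_detected = true;
}

/*
 * Same shape as the reset step quoted above: freeze, attempt the reset,
 * thaw on success. thaw_on_failure selects the proposed work-around.
 */
static int toy_recover(struct toy_port *ap, bool reset_ok,
		       bool thaw_on_failure)
{
	toy_freeze(ap);
	if (!reset_ok) {
		if (thaw_on_failure)
			toy_thaw(ap);
		return -1;	/* "reset failed, giving up" */
	}
	toy_thaw(ap);
	return 0;
}

/* Returns true if a disk inserted after a failed reset is noticed. */
bool toy_new_disk_seen_after_failed_reset(bool thaw_on_failure)
{
	struct toy_port ap = { .frozen = false, .disk_detected = false };

	(void)toy_recover(&ap, false, thaw_on_failure);
	toy_hotplug_irq(&ap);	/* healthy disk is plugged in */
	return ap.disk_detected;
}
```

In this model, toy_new_disk_seen_after_failed_reset(false) corresponds to
the current behavior, where the insertion goes unnoticed, and passing true
corresponds to the work-around. It says nothing about side effects of
thawing a port whose reset has failed, which is what I am asking about.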

I have tested this successfully, but I would like to ask whether it could
"break" some other functionality. I am new to the kernel ata stuff
and want to be sure before I use this.

Thanks,
Chaitanya

