lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140321183443.GA3891@redhat.com>
Date:	Fri, 21 Mar 2014 19:34:43 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Joseph Salisbury <joseph.salisbury@...onical.com>
Cc:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	JBottomley@...allels.com, Nagalakshmi.Nandigama@....com,
	Sreekanth.Reddy@....com, rientjes@...gle.com,
	akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
	tj@...nel.org, tglx@...utronix.de, linux-kernel@...r.kernel.org,
	kernel-team@...ts.ubuntu.com, linux-scsi@...r.kernel.org,
	Tomas Henzl <thenzl@...hat.com>
Subject: Re: please fix FUSION (Was: [v3.13][v3.14][Regression]
	kthread:makekthread_create()killable)

On 03/20, Oleg Nesterov wrote:
>
> On 03/20, Joseph Salisbury wrote:
> >
> > There was some testing done with your test kernel.  The data collected
> > while running your kernel is available in the bug report:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/58
>
> Joseph, thanks a lot.
>
> I'll try to read the logs tomorrow, but at first glance Tetsuo was right,
> I do not see a "long" sleep in that log.

Yes, it seems that it actually needs > 30 secs. It spends most of the time
(30.13286 seconds) in

	msleep+0x20/0x30
	WaitForDoorbellInt+0x103/0x130 [mptbase]
	WaitForDoorbellReply+0x42/0x220 [mptbase]
	mpt_handshake_req_reply_wait+0x1dc/0x2c0 [mptbase]
	SendPortEnable.constprop.23+0x94/0xc0 [mptbase]

WaitForDoorbellInt() does msleep(1) in a loop. This trace starts at the line
6001, and it is repeated 3792 times, till the line 176686 which apparently
shows the trace of the 2nd WaitForDoorbellInt() in WaitForDoorbellReply().

SendPortEnable:

	if (ioc->ir_firmware || ioc->bus_type == SAS) {
		rc = mpt_handshake_req_reply_wait(ioc, req_sz,
		(u32*)&port_enable, reply_sz, (u16*)&reply_buf,
		300 /*seconds*/, sleepFlag);
	} else {
		rc = mpt_handshake_req_reply_wait(ioc, req_sz,
		(u32*)&port_enable, reply_sz, (u16*)&reply_buf,
		30 /*seconds*/, sleepFlag);
	}

I am wondering which branch calls mpt_handshake_req_reply_wait(), the
else's timeout=30 (passed to the 1st WaitForDoorbellInt) suspiciously
matches the time WaitForDoorbellInt() actually runs... But everything
works fine and at first glance the potential timeout error should be
propogated correctly. So "timeout" is probably 300. And probably this
all is fine.

All I can suggest is the dirty hack for now. User-space should be
changed too, I think, but this is another story.

Tetsuo, what do you think?

Oleg.
---


--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -5395,6 +5395,8 @@ static struct pci_driver mptsas_driver = {
 #endif
 };
 
+#include <linux/signal.h>
+
 static int __init
 mptsas_init(void)
 {
@@ -5424,7 +5426,31 @@ mptsas_init(void)
 	mpt_event_register(mptsasDoneCtx, mptsas_event_process);
 	mpt_reset_register(mptsasDoneCtx, mptsas_ioc_reset);
 
-	error = pci_register_driver(&mptsas_driver);
+	{
+		sigset_t full, save;
+		/*
+		 * KILL ME. THIS IS THE DIRTY AND WRONG HACK WAITING FOR THE
+		 * FIX FROM MAINTAINERS.
+		 *
+		 * - This driver needs a lot of time to complete SendPortEnable()
+		 *   but systemd-udevd sends SIGKILL after 30 seconds, see
+		 *   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
+		 *
+		 *   Probably user-space should be changed, but:
+		 *
+		 * - Since commit 786235eeba0e "kthread: make kthread_create()
+		 *   killable" scsi_host_alloc() becomes killable and this SIGKILL
+		 *   crashes the kernel.
+		 *
+		 *   If scsi_host_alloc() fails mptsas_probe() blindly calls
+		 *   mptscsih_remove() and scsi_remove_host() hits host == NULL.
+		 */
+		sigfillset(&full);
+		sigprocmask(SIG_SETMASK, &full, &save);
+		error = pci_register_driver(&mptsas_driver);
+		sigprocmask(SIG_SETMASK, &save, NULL);
+	}
+
 	if (error)
 		sas_release_transport(mptsas_transport_template);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ