lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080924150541.GA119@tv-sign.ru>
Date:	Wed, 24 Sep 2008 19:05:41 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Joe Korty <joe.korty@...r.com>
Cc:	Roland McGrath <roland@...hat.com>, Jiri Kosina <jkosina@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP

On 09/23, Oleg Nesterov wrote:
>
> I will be completely offline tomorrow.

Cancelled...

> Better yet, to avoid a possible confusion, could you please send me
> the (modified) source code to re-produce the stall ?

Thanks Joe for the test-case and for your comments.

Joe says:
>
> So it looks like the test is in error, not the kernel.

and I am happy to agree.

I think sigaction/10-1.c should be fixed, please see the patch below.

In essence the code does:

	for (1 .. 10) {
		kill(pid, SIGSTOP);
		wait_until_we_receive_CLD_STOPPED(); // <---- HANGS
		kill(pid, SIGCONT);
	}

We can look at the code from the different angle, this loop does

		kill(pid, SIGCONT);
		kill(pid, SIGSTOP);
		wait_until_we_receive_CLD_STOPPED(); // <---- HANGS

and it hangs because CLD_CONTINUED (which is ignored by the SIGCHLD
handler) masks CLD_STOPPED.

And yes, the mentioned patch makes the difference. To clarify, it
does not defer the "result" of SIGCONT, it only defers the sending
of SIGCHLD with .si_code == CLD_CONTINUED.

So, what happens is:

	kill(SIGCONT) does its job, CLD_CONTINUED will be sent a
	bit later.

	kill(SIGSTOP) starts another group-stop.

	the child sends CLD_CONTINUED, and before the parent deques
	the signal, it stops and sends CLD_STOPPED.

Now, since SIGCHLD is a non-rt signal, the second SIGCHLD is lost,
and wait_until_we_receive_CLD_STOPPED() hangs.

I did the test patch to be sure:

	--- 26-rc2/kernel/signal.c~	2008-09-20 20:37:52.000000000 +0400
	+++ 26-rc2/kernel/signal.c	2008-09-24 18:43:34.000000000 +0400
	@@ -808,7 +808,7 @@ static int send_signal(int sig, struct s
		 * exactly one non-rt signal, so that we can get more
		 * detailed information about the cause of the signal.
		 */
	-	if (legacy_queue(pending, sig))
	+	if (sig != SIGCHLD && legacy_queue(pending, sig))
			return 0;
		/*
		 * fast-pathed signals for kernel-internal things like SIGSTOP

and now your test-case doesn't hang.

Oleg.

--- tmp/posixtestsuite/conformance/interfaces/sigaction/10-1.c~	2003-04-15 01:18:58.000000000 +0400
+++ tmp/posixtestsuite/conformance/interfaces/sigaction/10-1.c	2008-09-24 18:01:23.000000000 +0400
@@ -19,19 +19,41 @@
 #define NUMSTOPS 10
 
 int child_stopped = 0;
-int waiting = 1;
+int child_continued = 0;
+int notification;
 
-void handler(int signo, siginfo_t *info, void *context) 
+void handler(int signo, siginfo_t *info, void *context)
 {
-	if (info && info->si_code == CLD_STOPPED) {
+	if (!info)
+		return;
+
+	notification = info->si_code;
+
+	switch (notification) {
+	case CLD_STOPPED:
 		printf("Child has been stopped\n");
-		waiting = 0;
 		child_stopped++;
+		break;
+	case CLD_CONTINUED:
+		printf("Child has been continued\n");
+		child_continued++;
+		break;
 	}
 }
 
+void wait_for_notification(int val)
+{
+	struct timeval tv;
+
+	while (notification != val) {
+		tv.tv_sec = 1;
+		tv.tv_usec = 0;
+		if (!select(0, NULL, NULL, NULL, &tv))
+			break;
+	}
+}
 
-int main()
+int main(void)
 {
 	pid_t pid;
 	struct sigaction act;
@@ -40,7 +62,7 @@ int main()
 	act.sa_sigaction = handler;
 	act.sa_flags = SA_SIGINFO;
 	sigemptyset(&act.sa_mask);
-	sigaction(SIGCHLD,  &act, 0);     
+	sigaction(SIGCHLD,  &act, 0);
 
 	if ((pid = fork()) == 0) {
 		/* child */
@@ -54,37 +76,36 @@ int main()
 		return 0;
 	} else {
 		/* parent */
-		int s; 		
+		int s;
 		int i;
 
 		for (i = 0; i < NUMSTOPS; i++) {
-			waiting = 1;
-
 			printf("--> Sending SIGSTOP\n");
+			notification = 0;
 			kill(pid, SIGSTOP);
 
 			/*
 			  Don't let the kernel optimize away queued
 			  SIGSTOP/SIGCONT signals.
 			*/
-			while (waiting) {
-				tv.tv_sec = 1;
-				tv.tv_usec = 0;
-				if (!select(0, NULL, NULL, NULL, &tv))
-				  break;
-			}
-
+			wait_for_notification(CLD_STOPPED);
 
 			printf("--> Sending SIGCONT\n");
+			notification = 0;
 			kill(pid, SIGCONT);
+			/*
+			  SIGCHLD doesn't queue, make sure CLD_CONTINUED
+			  doesn't mask the next CLD_STOPPED
+			*/
+			wait_for_notification(CLD_CONTINUED);
 		}
-		
+
 		/* POSIX specifies default action to be abnormal termination */
 		kill(pid, SIGHUP);
 		waitpid(pid, &s, 0);
 	}
 
-	if (child_stopped == NUMSTOPS) {
+	if (child_stopped == NUMSTOPS && child_continued == NUMSTOPS) {
 		printf("Test PASSED\n");
 		return 0;
 	}
@@ -92,4 +113,3 @@ int main()
 	printf("Test FAILED\n");
 	return -1;
 }
-

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ