linux-kernel - Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ad128a2f-df04-ed1b-f444-24c1754bf80c@oracle.com>
Date:   Tue, 30 May 2023 22:30:37 -0500
From:   Mike Christie <michael.christie@...cle.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Oleg Nesterov <oleg@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux@...mhuis.info, nicolas.dichtel@...nd.com, axboe@...nel.dk,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, mst@...hat.com,
        sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
        brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
 regression

On 5/29/23 9:38 PM, Eric W. Biederman wrote:
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index b7cbd66f889e..f3ce0fa6edd7 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c

...

>  static int vhost_task_fn(void *data)
>  {
>  	struct vhost_task *vtsk = data;
> -	int ret;
> +	bool dead = false;
> +
> +	for (;;) {
> +		bool did_work;
> +
> +		/* mb paired w/ kthread_stop */
> +		set_current_state(TASK_INTERRUPTIBLE);
> +
> +		if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
> +			__set_current_state(TASK_RUNNING);
> +			break;
> +		}
> +
> +		if (!dead && signal_pending(current)) {
> +			struct ksignal ksig;
> +			/*
> +			 * Calling get_signal will block in SIGSTOP,
> +			 * or clear fatal_signal_pending, but remember
> +			 * what was set.
> +			 *
> +			 * This thread won't actually exit until all
> +			 * of the file descriptors are closed, and
> +			 * the release function is called.
> +			 */
> +			dead = get_signal(&ksig);

Hey Eric, the patch works well. Thanks! There was just one small issue.

get_signal() does try_to_freeze() -> ... __might_sleep() which wants the
state to be TASK_RUNNING.

We just need the patch below on top of yours which I think also cleans up
some of the state setting weirdness with the code like where vhost.c calls
__set_current_state(TASK_RUNNING) for each work. It looks like that was
not needed for any reason like a work->fn changing the state and the old
code could have done:

                node = llist_del_all(&worker->work_list);
                if (!node) {
                        schedule();
			continue;
		} else {
			__set_current_state(TASK_RUNNING);
		}

So I think we can do the following on top of your patch:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 221d1b6c1be5..074273020849 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -346,7 +346,6 @@ static bool vhost_worker(void *data)
 		smp_wmb();
 		llist_for_each_entry_safe(work, work_next, node, node) {
 			clear_bit(VHOST_WORK_QUEUED, &work->flags);
-			__set_current_state(TASK_RUNNING);
 			kcov_remote_start_common(worker->kcov_handle);
 			work->fn(work);
 			kcov_remote_stop();
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index f3ce0fa6edd7..fead2ed29561 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -29,12 +29,8 @@ static int vhost_task_fn(void *data)
 		bool did_work;
 
 		/* mb paired w/ kthread_stop */
-		set_current_state(TASK_INTERRUPTIBLE);
-
-		if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
-			__set_current_state(TASK_RUNNING);
+		if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags))
 			break;
-		}
 
 		if (!dead && signal_pending(current)) {
 			struct ksignal ksig;
@@ -53,8 +49,10 @@ static int vhost_task_fn(void *data)
 		}
 
 		did_work = vtsk->fn(vtsk->data);
-		if (!did_work)
+		if (!did_work) {
+			set_current_state(TASK_INTERRUPTIBLE);
 			schedule();
+		}
 	}
 
 	complete(&vtsk->exited);