[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230605123844.GA32275@redhat.com>
Date: Mon, 5 Jun 2023 14:38:45 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Mike Christie <michael.christie@...cle.com>, linux@...mhuis.info,
nicolas.dichtel@...nd.com, axboe@...nel.dk,
torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, jasowang@...hat.com, stefanha@...hat.com,
brauner@...nel.org
Subject: Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
regression
On 06/02, Eric W. Biederman wrote:
>
> static int vhost_task_fn(void *data)
> {
> struct vhost_task *vtsk = data;
> - int ret;
> + bool dead = false;
> +
> + for (;;) {
> + bool did_work;
> +
> + if (!dead && signal_pending(current)) {
> + struct ksignal ksig;
> + /*
> + * Calling get_signal can block in SIGSTOP,
> + * and the freezer. Or it can clear
> + * fatal_signal_pending and return non-zero.
> + */
> + dead = get_signal(&ksig);
> + if (dead)
> + set_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags);
> + }
> +
> + /* mb paired w/ kthread_stop */
> + set_current_state(TASK_INTERRUPTIBLE);
> +
> + did_work = vtsk->fn(vtsk->data);
I don't understand why do you set TASK_INTERRUPTIBLE before vtsk->fn(),
it seems that you could do this before the test_bit(FLAGS_STOP) below.
But probably I missed something and this is minor anyway...
> + if (!did_work) {
> + if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
> + __set_current_state(TASK_RUNNING);
> + break;
What if VHOST_TASK_FLAGS_STOP was set by us after get_signal() above ?
We need to ensure that in this case vhost_work_queue() can't add a new work,
nobody will flush it.
In fact, unless I missed something this can even race with vhost_dev_flush().
vhost_dev_flush: vhost_task_fn:
checks FLAGS_STOP, not set,
vhost_task_flush() returns false
gets SIGKILL, sets FLAGS_STOP
vtsk->fn() returns false
vhost_task_fn() exits.
vhost_work_queue();
wait_for_completion(&flush.wait_event);
and the last wait_for_completion() will hang forever.
Oleg.
Powered by blists - more mailing lists