Message-ID: <53d4b30e-2563-c13b-eadc-8372ae965fcb@redhat.com>
Date:   Tue, 22 Feb 2022 09:35:04 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Vipin Sharma <vipinsh@...gle.com>, seanjc@...gle.com
Cc:     mkoutny@...e.com, tj@...nel.org, lizefan.x@...edance.com,
        hannes@...xchg.org, dmatlack@...gle.com, jiangshanlai@...il.com,
        kvm@...r.kernel.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] KVM: Move VM's worker kthreads back to the original
 cgroup before exiting.

On 2/22/22 06:48, Vipin Sharma wrote:
> VM worker kthreads can linger in the VM process's cgroup for some time
> after KVM terminates the VM process.
> 
> KVM terminates the worker kthreads by calling kthread_stop() which waits
> on the 'exited' completion, triggered by exit_mm(), via mm_release(), in
> do_exit() during the kthread's exit.  However, these kthreads are
> removed from the cgroup by cgroup_exit(), which runs after exit_mm().
> Therefore, a VM process can terminate between the exit_mm() and
> cgroup_exit() calls, leaving only worker kthreads in the cgroup.
> 
> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that the cgroup is empty as soon as the main VM
> process is terminated.
> 
> Signed-off-by: Vipin Sharma <vipinsh@...gle.com>
> Suggested-by: Sean Christopherson <seanjc@...gle.com>
> ---

Queued, thanks.

Paolo

> Thanks Sean, for the example on how to use the real_parent outside of the RCU
> critical region. I wrote your name in Suggested-by, I hope you are fine with
> it and this is the right tag/way to give you the credit.
> 
> v4:
> - Read task's real_parent in the RCU critical section.
> - Don't log error message from the cgroup_attach_task_all() API.
> 
> v3: https://lore.kernel.org/lkml/20220217061616.3303271-1-vipinsh@google.com/
> - Use 'current->real_parent' (kthreadd_task) in the
>    cgroup_attach_task_all() call.
> - Revert the cgroup API changes from v2. The patch no longer touches
>    cgroup APIs.
> - Update commit message and code comment.
> 
> v2: https://lore.kernel.org/lkml/20211222225350.1912249-1-vipinsh@google.com/
> - Use kthreadd_task in the cgroup API to avoid build issue.
> 
> v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@google.com/
> 
>   virt/kvm/kvm_main.c | 22 +++++++++++++++++++++-
>   1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 83c57bcc6eb6..cdf1fa3c60ae 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5810,6 +5810,7 @@ static int kvm_vm_worker_thread(void *context)
>   	 * we have to locally copy anything that is needed beyond initialization
>   	 */
>   	struct kvm_vm_worker_thread_context *init_context = context;
> +	struct task_struct *parent;
>   	struct kvm *kvm = init_context->kvm;
>   	kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
>   	uintptr_t data = init_context->data;
> @@ -5836,7 +5837,7 @@ static int kvm_vm_worker_thread(void *context)
>   	init_context = NULL;
>   
>   	if (err)
> -		return err;
> +		goto out;
>   
>   	/* Wait to be woken up by the spawner before proceeding. */
>   	kthread_parkme();
> @@ -5844,6 +5845,25 @@ static int kvm_vm_worker_thread(void *context)
>   	if (!kthread_should_stop())
>   		err = thread_fn(kvm, data);
>   
> +out:
> +	/*
> +	 * Move kthread back to its original cgroup to prevent it lingering in
> +	 * the cgroup of the VM process, after the latter finishes its
> +	 * execution.
> +	 *
> +	 * kthread_stop() waits on the 'exited' completion condition which is
> +	 * set in exit_mm(), via mm_release(), in do_exit(). However, the
> +	 * kthread is removed from the cgroup in the cgroup_exit() which is
> +	 * called after the exit_mm(). This causes the kthread_stop() to return
> +	 * before the kthread actually quits the cgroup.
> +	 */
> +	rcu_read_lock();
> +	parent = rcu_dereference(current->real_parent);
> +	get_task_struct(parent);
> +	rcu_read_unlock();
> +	cgroup_attach_task_all(parent, current);
> +	put_task_struct(parent);
> +
>   	return err;
>   }
>   
> 
> base-commit: 1bbc60d0c7e5728aced352e528ef936ebe2344c0
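
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch: the struct and function names below are hypothetical and are not kernel APIs, and the model simply counts the tasks left in the VM's cgroup at the moment the VM process exits, assuming the worst case where that happens right after the 'exited' completion fires.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only: these names are hypothetical, not kernel APIs.
 * It counts tasks in the VM's cgroup at the moment the VM process exits.
 */
struct vm_cgroup_model {
	int nr_tasks;           /* tasks currently in the VM's cgroup */
	bool worker_in_cgroup;  /* is the worker kthread still a member? */
};

/* Models the worker leaving the VM's cgroup (by migration or cgroup_exit()). */
void worker_leave_cgroup(struct vm_cgroup_model *m)
{
	if (m->worker_in_cgroup) {
		m->worker_in_cgroup = false;
		m->nr_tasks--;
	}
}

/* Returns how many tasks remain in the VM's cgroup once the VM process is gone. */
int tasks_left_after_vm_exit(bool move_back_first)
{
	/* Start with the VM process plus one worker kthread. */
	struct vm_cgroup_model m = { .nr_tasks = 2, .worker_in_cgroup = true };

	/* Patched behaviour: migrate to the parent's cgroup before returning. */
	if (move_back_first)
		worker_leave_cgroup(&m);

	/* exit_mm() step: 'exited' completes, so kthread_stop() returns here. */

	/* Worst case: the VM process exits before the worker reaches cgroup_exit(). */
	m.nr_tasks--;
	int left = m.nr_tasks;

	/* cgroup_exit() step: only now does an unmigrated worker leave. */
	worker_leave_cgroup(&m);
	assert(m.nr_tasks == 0);

	return left;
}
```

Calling tasks_left_after_vm_exit(false) models the unpatched ordering, where the worker is still counted in the cgroup after the VM process has exited. Calling it with true models the patch, where the worker has already moved out before kthread_stop() can return.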
