lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 4 Apr 2022 07:37:24 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Bui Quang Minh <minhquangbui99@...il.com>
Cc:     cgroups@...r.kernel.org, kernel test robot <lkp@...el.com>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v2] cgroup: Kill the parent controller when its last
 child is killed

Hello,

On Mon, Apr 04, 2022 at 09:25:34PM +0700, Bui Quang Minh wrote:
> When umounting a cgroup controller, in case the controller has no children,
> the initial ref will be dropped in cgroup_kill_sb. In cgroup_rmdir path,
> the controller is deleted from the parent's children list in
> css_release_work_fn, which is run on a kernel worker.
> 
> With this simple script
> 
> 	#!/bin/sh
> 
> 	mount -t cgroup -o none,name=test test ./tmp
> 	mkdir -p ./tmp/abc
> 
> 	rmdir ./tmp/abc
> 	umount ./tmp
> 
> 	sleep 5
> 	cat /proc/self/cgroup
> 
> The rmdir will remove the last child and umount is expected to kill the
> parent controller. However, when running the above script, we may get
> 
> 	1:name=test:/

First of all, remounting after active use isn't a well-supported use case as
documented in the admin doc. The problem is that there's no finite time
horizon around when all the references are gonna go away - some references
may be held in cache which may not be released unless certain conditions are
met. So, while changing hierarchy configuration is useful for system setup,
development and debugging, for production use, it's a boot time
configuration mechanism.

> This shows that the parent controller has not been killed. The reason is
> after rmdir is completed, it is not guaranteed that the parent's children
> list is empty as css_release_work_fn is deferred to run on a worker. In
> case cgroup_kill_sb is run before that work, it does not drop the initial
> ref. Later in the worker, it just removes the child from the list without
> checking the list is empty to kill the parent controller. As a result, the
> parent controller still has the initial ref but without any logical refs
> (children ref, mount ref).
> 
> This commit adds a free parent controller path into the worker function to
> free up the parent controller when the last child is killed.

And the suggested behavior doesn't make much sense to me. It doesn't
actually solve the underlying problem but instead always make css
destructions recursive which can lead to surprises for normal use cases.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ