lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Apr 2022 11:11:58 +0200
From:   Michal Koutný <mkoutny@...e.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Bui Quang Minh <minhquangbui99@...il.com>, cgroups@...r.kernel.org,
        kernel test robot <lkp@...el.com>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v2] cgroup: Kill the parent controller when its last
 child is killed

On Mon, Apr 04, 2022 at 07:37:24AM -1000, Tejun Heo <tj@...nel.org> wrote:
> And the suggested behavior doesn't make much sense to me. It doesn't
> actually solve the underlying problem but instead always make css
> destructions recursive which can lead to surprises for normal use cases.

I also don't like the nested special-case use percpu_ref_kill().

I looked at this and my supposed solution turned out to be a revert of
commit 3c606d35fe97 ("cgroup: prevent mount hang due to memory
controller lifetime"). So at the unmount time it's necessary to distinguish
children that are in the process of removal from children than are online or
pinned indefinitely.

What about:

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2205,11 +2205,14 @@ static void cgroup_kill_sb(struct super_block *sb)
        struct cgroup_root *root = cgroup_root_from_kf(kf_root);

        /*
-        * If @root doesn't have any children, start killing it.
+        * If @root doesn't have any children held by residual state (e.g.
+        * memory controller), start killing it, flush workqueue to filter out
+        * transiently offlined children.
         * This prevents new mounts by disabling percpu_ref_tryget_live().
         *
         * And don't kill the default root.
         */
+       flush_workqueue(cgroup_destroy_wq);
        if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
            !percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
                cgroup_bpf_offline(&root->cgrp);

(I suspect there's technically still possible a race between concurrent unmount
and the last rmdir but the flush on kill_sb path should be affordable and it
prevents unnecessarily conserved cgroup roots.)

Michal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ