linux-kernel - Re: [PATCH 2/2] cgroup: Use separate work structs on css release path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220602114705.GB21320@blackbody.suse.cz>
Date:   Thu, 2 Jun 2022 13:47:05 +0200
From:   Michal Koutný <mkoutny@...e.com>
To:     Tadeusz Struk <tadeusz.struk@...aro.org>
Cc:     Tejun Heo <tj@...nel.org>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Bui Quang Minh <minhquangbui99@...il.com>
Subject: Re: [PATCH 2/2] cgroup: Use separate work structs on css release path

On Wed, Jun 01, 2022 at 05:40:51PM -0700, Tadeusz Struk <tadeusz.struk@...aro.org> wrote:
> css_killed_ref_fn() will be called regardless of the value of refcnt (via percpu_ref_kill_and_confirm())
> and it will only enqueue the css_killed_work_fn() to be called later.
> Then css_put()->css_release() will be called before the css_killed_work_fn() will even
> get a chance to run, and it will also *only* enqueue css_release_work_fn() to be called later.
> The problem happens on the second enqueue. So there need to be something in place that
> will make sure that css_killed_work_fn() is done before css_release() can enqueue
> the second job.

IIUC, here you describe the same scenario I broke down at [1].

> Does it sound right?

I added a parameter A there (that is sum of base and percpu references
before kill_css()).
I thought it fails because A == 1 (i.e. killing the base reference),
however, that seems an unlikely situation (because cgroup code uses a
"fuse" reference to pin css for offline_css()).

So the remaining option (at least I find it more likely now) is that
A == 0 (A < 0 would trigger the warning in
percpu_ref_switch_to_atomic_rcu()), aka the ref imbalance. I hope we can
get to the bottom of this with detailed enough tracing of gets/puts.

Splitting the work struct is condradictive to the existing approach with
the "fuse" reference.

(BTW you also wrote On Wed, Jun 01, 2022 at 05:00:44PM -0700, Tadeusz Struk <tadeusz.struk@...aro.org> wrote:
> The fact the css_release() is called (via cgroup_kn_unlock()) just after
> kill_css() causes the css->destroy_work to be enqueued twice on the same WQ
> (cgroup_destroy_wq), just with different function. This results in the
> BUG: corrupted list in insert_work issue.

Where do you see a critical css_release called from cgroup_kn_unlock()?
I always observed the css_release() being called via
percpu_ref_call_confirm_rcu() (in the original and subsequent syzbot
logs.))

Thanks,
Michal

[1] https://lore.kernel.org/r/Yo7KfEOz92kS2z5Y@blackbook/