[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <416dc60a-f0e5-7d05-1613-3cd0ca415768@linaro.org>
Date:   Wed, 1 Jun 2022 17:26:34 -0700
From:   Tadeusz Struk <tadeusz.struk@...aro.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Michal Koutný <mkoutny@...e.com>,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Bui Quang Minh <minhquangbui99@...il.com>
Subject: Re: [PATCH 2/2] cgroup: Use separate work structs on css release path
On 6/1/22 17:07, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jun 01, 2022 at 05:00:44PM -0700, Tadeusz Struk wrote:
>> What I'm trying to say is that it's not really a ref imbalance problem.
>> I think once the kill_css() has been called on a css, and it is enqueued to be
>> killed, we shouldn't call css_put(css) on it anymore outside of the "killed call
>> flow path". It will all be handled by the css_killed_ref_fn() function.
>>
>> The fact the css_release() is called (via cgroup_kn_unlock()) just after
>> kill_css() causes the css->destroy_work to be enqueued twice on the same WQ
>> (cgroup_destroy_wq), just with different function. This results in the
>> BUG: corrupted list in insert_work issue.
> 
> I have a hard time following here. The kill / release paths relationship
> isn't that complicated. The kill path is invoked when the percpu_ref's base
> ref is killed and holds an extra ref so that it's guaranteed that release
> can't happen before the kill path is done with the css. When the final put
> happens - whether that's from the kill path or someone else, which often is
> the case - the release path triggers. If we have release getting scheduled
> while the kill path isn't finished, it is a reference counting problem,
> right?
> 
> Can you elaborate the exact scenario that you think is happening? Please
> feel free to omit the function calls and all that. Just lay out who's doing
> what.
Ok the problem is that
1. kill_css() triggers css_killed_ref_fn(), which enqueues &css->destroy_work on cgroup_destroy_wq
2. Last put_css() calls css_release(), which enqueues &css->destroy_work on cgroup_destroy_wq
We have two instances of the same work struct enqueued on the same WQ (cgroup_destroy_wq),
which causes "BUG: corrupted list in insert_work"
So I think the easiest way to solve this would be to have two separate work_structs,
one for the killed_ref path and css_release path as in:
https://lore.kernel.org/all/20220523212724.233314-1-tadeusz.struk@linaro.org/
-- 
Thanks,
Tadeusz
Powered by blists - more mailing lists
 
