Message-ID: <20180831152705.4mjm7xo6jq7ptdqn@destiny>
Date:   Fri, 31 Aug 2018 11:27:07 -0400
From:   Josef Bacik <josef@...icpanda.com>
To:     Dennis Zhou <dennisszhou@...il.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Josef Bacik <josef@...icpanda.com>, kernel-team@...com,
        linux-block@...r.kernel.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Jiufei Xue <jiufei.xue@...ux.alibaba.com>,
        Joseph Qi <joseph.qi@...ux.alibaba.com>
Subject: Re: [PATCH 02/15] blkcg: delay blkg destruction until after
 writeback has finished

On Thu, Aug 30, 2018 at 09:53:43PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <dennisszhou@...il.com>
> 
> Currently, blkcg destruction relies on a sequence of events:
>   1. Destruction starts. blkcg_css_offline() is called and blkgs
>      release their reference to the blkcg. This immediately destroys
>      the cgwbs (writeback).
>   2. With blkgs giving up their reference, the blkcg ref count should
>      become zero and eventually call blkcg_css_free() which finally
>      frees the blkcg.
> 
> Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
> and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
> on the completion of all writeback associated with the blkcg. A count of
> the number of cgwbs is maintained and once that goes to zero, blkg
> destruction can follow. This should prevent premature blkg destruction.
> 
> The new process for blkcg cleanup is as follows:
>   1. Destruction starts. blkcg_css_offline() is called which offlines
>      writeback. Blkg destruction is delayed on the nr_cgwbs count to
>      avoid punting potentially large amounts of outstanding writeback
>      to root while maintaining any ongoing policies.
>   2. When the nr_cgwbs becomes zero, blkcg_destroy_blkgs() is called and
>      handles destruction of blkgs. This is where the css reference held
>      by each blkg is released.
>   3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
>      This finally frees the blkcg.
> 
> It seems in the past blk-throttle didn't do the most understandable
> things with taking data from a blkg while associating with current. So,
> the simplification and unification of what blk-throttle is doing caused
> this.
> 

The general approach is correct, but it's confusing because you are using
nr_cgwbs as a reference counter: it's set to 1 at blkg creation time regardless
of whether or not there's an associated wb cg.  So instead why not just have a
refcount_t ref, set it to 1 on creation, make the wb cg take a ref when it's
attached, and then do the get/put like normal and clean up as you have below?
What you are doing is a reference counter masquerading as a count of the wb
cg's.  Just add full ref counting to the blkcg and call it a day, it'll be much
less confusing.  Thanks,

Josef
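
To make the suggestion concrete, here is a rough user-space sketch of the
get/put scheme described above.  It is not kernel code and not the patch
itself: struct blkcg_model and the blkcg_model_*() helpers are hypothetical
names, C11 atomic_int stands in for refcount_t, and a flag stands in for the
call to blkcg_destroy_blkgs().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the blkcg, not the kernel struct. */
struct blkcg_model {
	atomic_int ref;        /* 1 at creation, +1 per attached wb cg */
	bool blkgs_destroyed;  /* stands in for blkcg_destroy_blkgs() having run */
};

/* Creation: the blkcg starts out holding one reference on itself. */
static void blkcg_model_init(struct blkcg_model *b)
{
	atomic_init(&b->ref, 1);
	b->blkgs_destroyed = false;
}

/* Taken by a wb cg when it is attached to the blkcg. */
static void blkcg_model_get(struct blkcg_model *b)
{
	atomic_fetch_add(&b->ref, 1);
}

/*
 * Dropped when a wb cg goes away, and once at offline time for the
 * creation reference.  Returns true if this was the last reference, at
 * which point blkg destruction can proceed.
 */
static bool blkcg_model_put(struct blkcg_model *b)
{
	if (atomic_fetch_sub(&b->ref, 1) == 1) {
		b->blkgs_destroyed = true;
		return true;
	}
	return false;
}

/* Walks one lifetime: create, attach a wb cg, offline, writeback done. */
static bool blkcg_model_demo(void)
{
	struct blkcg_model b;

	blkcg_model_init(&b);
	blkcg_model_get(&b);                 /* wb cg attached */
	if (blkcg_model_put(&b))             /* offline drops creation ref */
		return false;                /* must not destroy yet */
	return blkcg_model_put(&b);          /* wb cg released: destroy now */
}
```

With this shape the initial 1 is simply the creation reference, so nothing has
to pretend to be a count of wb cg's.  A blkcg with no wb cg attached is torn
down by the single put at offline time.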
