lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 26 May 2023 08:34:04 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Waiman Long <longman@...hat.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
        Josef Bacik <josef@...icpanda.com>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        Michal Koutný <mkoutny@...e.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Jay Shin <jaeshin@...hat.com>, stable@...r.kernel.org,
        ming.lei@...hat.com
Subject: Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq

On Thu, May 25, 2023 at 12:06:07PM -0400, Waiman Long wrote:
> On 5/25/23 12:01, Waiman Long wrote:
> > As noted by Michal, the blkg_iostat_set's in the lockless list hold
> > reference to blkg's to protect against their removal. Those blkg's
> > hold reference to blkcg. When a cgroup is being destroyed,
> > cgroup_rstat_flush() is only called at css_release_work_fn() which
> > is called when the blkcg reference count reaches 0. This circular
> > dependency will prevent blkcg and some blkgs from being freed after
> > they are made offline.
> > 
> > It is less a problem if the cgroup to be destroyed also has other
> > controllers like memory that will call cgroup_rstat_flush() which will
> > clean up the reference count. If block is the only controller that uses
> > rstat, these offline blkcg and blkgs may never be freed leaking more
> > and more memory over time.
> > 
> > To prevent this potential memory leak:
> > 
> > - flush blkcg per-cpu stats list in __blkg_release(), when no new stat
> >    can be added to avoid use-after-free of the percpu blkg_iostat_set in
> >    futue cgroup_rstat_flush*() calls.
> > 
> > - add a cgroup_rstat_flush_acquire() helper and call it to acquire
> >    cgroup_rstat_lock to block concurrent execution of other
> >    cgroup_rstat_flush*() calls
> > 
> > - don't grab bio->bi_blkg reference when adding the stats into blkcg's
> >    per-cpu stat list since all stats are guaranteed to be consumed before
> >    releasing blkg instance, and grabbing blkg reference for stats was
> >    the most fragile part of original patch
> > 
> > Based on Waiman's patch:
> > 
> > https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/
> > 
> > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
> > Cc: stable@...r.kernel.org
> > Reported-by: Jay Shin <jaeshin@...hat.com>
> > Cc: Waiman Long <longman@...hat.com>
> > Cc: Tejun Heo <tj@...nel.org>
> > Cc: mkoutny@...e.com
> > Cc: Yosry Ahmed <yosryahmed@...gle.com>
> > Co-developed-by: Ming Lei <ming.lei@...hat.com>
> > Signed-off-by: Ming Lei <ming.lei@...hat.com>
> > Signed-off-by: Waiman Long <longman@...hat.com>
> > ---
> >   block/blk-cgroup.c     | 57 +++++++++++++++++++++++++++++++-----------
> >   include/linux/cgroup.h |  1 +
> >   kernel/cgroup/rstat.c  | 15 ++++++++++-
> >   3 files changed, 57 insertions(+), 16 deletions(-)
> 
> This is my counter-proposal to Ming's v3 patch. The major difference is that
> I used the existing cgroup_rstat_lock instead of adding a new internal lock.
> This minimizes performance impact to existing cgroup_rstat_flush*() call

The added internal lock has ~zero perf impact on rstat flush cause
the lock won't be contended basically.

> while achieving the same objective. I am fine with Ming current v3 patch if
> we decide to go that way.

As I mentioned, the main motivation with internal lock is to make the fix as
simple as possible since cross-subsystem change isn't involved, and I am fine
with any following cleanup or improvement on current blkg rstat flush.

Another benefit with this internal lock is that race in blkcg_reset_stats()
can be avoided.


Thanks, 
Ming

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ