Message-ID: <20200429142230.GE5462@mtj.thefacebook.com>
Date: Wed, 29 Apr 2020 10:22:30 -0400
From: Tejun Heo <tj@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: Dave Chinner <david@...morbit.com>,
Dan Schatzberg <schatzberg.dan@...il.com>,
Jens Axboe <axboe@...nel.dk>,
Alexander Viro <viro@...iv.linux.org.uk>,
Amir Goldstein <amir73il@...il.com>,
Li Zefan <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>, Roman Gushchin <guro@...com>,
Shakeel Butt <shakeelb@...gle.com>,
Chris Down <chris@...isdown.name>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Ingo Molnar <mingo@...nel.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
"open list:BLOCK LAYER" <linux-block@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"open list:FILESYSTEMS (VFS and infrastructure)"
<linux-fsdevel@...r.kernel.org>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
<linux-mm@...ck.org>
Subject: Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup

Hello,

On Wed, Apr 29, 2020 at 12:25:40PM +0200, Jan Kara wrote:
> Yeah, I was thinking about the same when reading the patch series
> description. We already have some cgroup workarounds for btrfs kthreads
> if I remember correctly, we have cgroup handling for flush workers, and
> now we are adding cgroup handling for loopback device workers. Soon I'd
> expect someone to come with a need for DM/MD worker processes, and IMHO
> it's getting out of hand because the complexity spreads through the
> kernel with every subsystem coming up with a slightly different solution
> to the problem, and the number of kthreads also gets multiplied by the
> number of cgroups. So I agree some generic solution for IO throttling of
> kthreads / workers would be desirable.
>
> OTOH I don't have a great idea what the generic infrastructure should
> look like...

I don't really see a way around that. The only generic solution would be
letting all IOs through as root and handling everything through
backcharging, which we can already do, as backcharging is already in use
to handle metadata updates that can't be controlled directly. However,
doing that for all IOs would make the control quality a lot worse, as all
control would be based on first incurring a deficit and then trying to
punish the issuer after the fact.
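
As a rough sketch of what backcharging means here (the helper below is
hypothetical, not the actual iocost code): the IO is issued unthrottled
and the owning cgroup is debited afterwards, so the controller can only
slow that cgroup down on its future submissions.

/*
 * Hypothetical illustration of backcharging; charge_cgroup_for_io()
 * stands in for the controller's real vtime/delay accounting.
 */
static void submit_uncontrollable_bio(struct bio *bio,
				      struct cgroup_subsys_state *owner_css)
{
	/* issue immediately; this IO can't be made to wait */
	submit_bio(bio);

	/*
	 * Debit the owning cgroup after the fact; it pays the
	 * deficit back by being throttled on later IOs.
	 */
	charge_cgroup_for_io(owner_css, bio_sectors(bio));
}
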
The infrastructure work done to make IO control work for btrfs is generic
and the changes needed on the btrfs side were pretty small. Most of the
work was identifying non-regular IO pathways (bouncing through different
kthreads and whatnot) and making sure they annotate IO ownership through
the needed mechanisms correctly. The biggest challenge probably is
ensuring that the filesystem doesn't add ordering dependencies between
separate data IOs, which is a nice property to have with or without
cgroup support.
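
The annotation side is mostly the existing blkg association helpers.
Against the APIs around v5.7, a helper context re-issuing someone else's
IO would do roughly the following (error handling elided; note that
bio_clone_fast() already copies the association internally on recent
kernels, so the explicit call below just marks where the annotation
happens):

/*
 * Sketch: keep the original issuer's cgroup ownership when a helper
 * re-issues a bio, instead of inheriting the helper's (usually root).
 */
static void reissue_on_behalf(struct bio *orig)
{
	struct bio *clone = bio_clone_fast(orig, GFP_NOIO, &fs_bio_set);

	if (!clone)
		return;		/* error handling elided */

	/* carry the issuer's blkcg ownership along with the data */
	bio_clone_blkg_association(clone, orig);
	submit_bio(clone);
}
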
That leaves the nesting drivers, loop and md/dm. Given that they sit in
the middle of the IO stack and proxy a lot of its roles, they'll have to
be updated to be transparent in terms of cgroup ownership if IO control
is gonna work through them. Maybe we can have common infra shared between
loop, dm and md, but there aren't many of them and they may also be
sufficiently different. idk
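
For loop specifically, "transparent" could look something like the
worker adopting the issuing cgroup around each command. A sketch, with
cmd->css assumed to be captured at submission from the original bio and
do_backing_file_io() a hypothetical stand-in for the actual backing file
read/write:

/*
 * Sketch of a cgroup-transparent loop worker: run the backing file
 * IO in the issuer's blkcg, then drop back to the default.
 */
static void loop_process_cmd(struct loop_cmd *cmd)
{
	kthread_associate_blkcg(cmd->css);
	do_backing_file_io(cmd);	/* hypothetical: backing file read/write */
	kthread_associate_blkcg(NULL);
}
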
Thanks.
--
tejun