[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190211213417.uhfiz5iqwkfrvk25@macbook-pro-91.dhcp.thefacebook.com>
Date: Mon, 11 Feb 2019 16:34:19 -0500
From: Josef Bacik <josef@...icpanda.com>
To: Andrea Righi <righi.andrea@...il.com>
Cc: Josef Bacik <josef@...icpanda.com>,
Paolo Valente <paolo.valente@...aro.org>,
Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Jens Axboe <axboe@...nel.dk>, Vivek Goyal <vgoyal@...hat.com>,
Dennis Zhou <dennis@...nel.org>, cgroups@...r.kernel.org,
linux-block@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during
sync()
On Mon, Feb 11, 2019 at 09:40:29PM +0100, Andrea Righi wrote:
> On Mon, Feb 11, 2019 at 10:39:34AM -0500, Josef Bacik wrote:
> > On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote:
> > > This is an attempt to mitigate the priority inversion problem of a
> > > high-priority blkcg issuing a sync() and being forced to wait the
> > > completion of all the writeback I/O generated by any other low-priority
> > > blkcg, causing massive latencies to processes that shouldn't be
> > > I/O-throttled at all.
> > >
> > > The idea is to save a list of blkcg's that are waiting for writeback:
> > > every time a sync() is executed the current blkcg is added to the list.
> > >
> > > Then, when I/O is throttled, if there's a blkcg waiting for writeback
> > > different than the current blkcg, no throttling is applied (we can
> > > probably refine this logic later, i.e., a better policy could be to
> > > adjust the throttling I/O rate using the blkcg with the highest speed
> > > from the list of waiters - priority inheritance, kinda).
> > >
> > > This topic has been discussed here:
> > > https://lwn.net/ml/cgroups/20190118103127.325-1-righi.andrea@gmail.com/
> > >
> > > But we didn't come up with any definitive solution.
> > >
> > > This patch is not a definitive solution either, but it's an attempt to
> > > continue addressing this issue and handling the priority inversion
> > > problem with sync() in a better way.
> > >
> > > Signed-off-by: Andrea Righi <righi.andrea@...il.com>
> >
> > Talked with Tejun about this some and we agreed the following is probably the
> > best way forward
>
> First of all thanks for the update!
>
> >
> > 1) Track the submitter of the wb work to the writeback code.
>
> Are we going to track the cgroup that originated the dirty pages (or
> maybe dirty inodes) or do you have any idea in particular?
>
The guy doing the sync(), so that way we can accomplish #3. But really this is
an implementation detail, however you want to accomplish it is fine by me.
> > 2) Sync() defaults to the root cg, and and it writes all the things as the root
> > cg.
>
> OK.
>
> > 3) Add a flag to the cgroups that would make sync()'ers in that group only be
> > allowed to write out things that belong to its group.
>
> So, IIUC, when this flag is enabled a cgroup that is doing sync() would
> trigger the writeback of the pages that belong to that cgroup only and
> it waits only for these pages to be sync-ed, right? In this case
> writeback can still go at cgroup's speed.
>
> Instead when the flag is disabled, sync() would trigger writeback I/O
> globally, as usual, and it goes at full speed (root cgroup's speed).
>
> Am I understanding correctly?
>
Yup that's exactly it. Thanks,
Josef
Powered by blists - more mailing lists