lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 11 Feb 2019 21:40:29 +0100
From:   Andrea Righi <righi.andrea@...il.com>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     Paolo Valente <paolo.valente@...aro.org>,
        Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jens Axboe <axboe@...nel.dk>, Vivek Goyal <vgoyal@...hat.com>,
        Dennis Zhou <dennis@...nel.org>, cgroups@...r.kernel.org,
        linux-block@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during
 sync()

On Mon, Feb 11, 2019 at 10:39:34AM -0500, Josef Bacik wrote:
> On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote:
> > This is an attempt to mitigate the priority inversion problem of a
> > high-priority blkcg issuing a sync() and being forced to wait the
> > completion of all the writeback I/O generated by any other low-priority
> > blkcg, causing massive latencies to processes that shouldn't be
> > I/O-throttled at all.
> > 
> > The idea is to save a list of blkcg's that are waiting for writeback:
> > every time a sync() is executed the current blkcg is added to the list.
> > 
> > Then, when I/O is throttled, if there's a blkcg waiting for writeback
> > different than the current blkcg, no throttling is applied (we can
> > probably refine this logic later, i.e., a better policy could be to
> > adjust the throttling I/O rate using the blkcg with the highest speed
> > from the list of waiters - priority inheritance, kinda).
> > 
> > This topic has been discussed here:
> > https://lwn.net/ml/cgroups/20190118103127.325-1-righi.andrea@gmail.com/
> > 
> > But we didn't come up with any definitive solution.
> > 
> > This patch is not a definitive solution either, but it's an attempt to
> > continue addressing this issue and handling the priority inversion
> > problem with sync() in a better way.
> > 
> > Signed-off-by: Andrea Righi <righi.andrea@...il.com>
> 
> Talked with Tejun about this some and we agreed the following is probably the
> best way forward

First of all thanks for the update!

> 
> 1) Track the submitter of the wb work to the writeback code.

Are we going to track the cgroup that originated the dirty pages (or
maybe dirty inodes) or do you have any idea in particular?

> 2) Sync() defaults to the root cg, and and it writes all the things as the root
>    cg.

OK.

> 3) Add a flag to the cgroups that would make sync()'ers in that group only be
>    allowed to write out things that belong to its group.

So, IIUC, when this flag is enabled a cgroup that is doing sync() would
trigger the writeback of the pages that belong to that cgroup only and
it waits only for these pages to be sync-ed, right? In this case
writeback can still go at cgroup's speed.

Instead when the flag is disabled, sync() would trigger writeback I/O
globally, as usual, and it goes at full speed (root cgroup's speed).

Am I understanding correctly?

> 
> This way we avoid the priority inversion of having things like systemd or random
> logged in user doing sync() and having to wait, and we keep low prio cgroups
> from causing big IO storms by syncing out stuff and getting upgraded to root
> priority just to avoid the inversion.
> 
> Obviously by default we want this flag to be off since its such a big change,
> but people/setups really worried about this behavior (Facebook for instance
> would likely use this flag) can go ahead and set it and be sure we're getting
> good isolation and still avoiding the priority inversion associated with running
> sync from a high priority context.  Thanks,
> 
> Josef

Thanks,
-Andrea

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ