Message-Id: <1C0A2FC8-620C-4AFE-A921-35EDAC377BD4@linaro.org>
Date: Mon, 20 May 2019 12:45:58 +0200
From: Paolo Valente <paolo.valente@...aro.org>
To: Jan Kara <jack@...e.cz>
Cc: Theodore Ts'o <tytso@....edu>,
"Srivatsa S. Bhat" <srivatsa@...il.mit.edu>,
linux-fsdevel@...r.kernel.org,
linux-block <linux-block@...r.kernel.org>,
linux-ext4@...r.kernel.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, axboe@...nel.dk, jmoyer@...hat.com,
amakhalov@...are.com, anishs@...are.com, srivatsab@...are.com
Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup
controller
> On 20 May 2019, at 11:15, Jan Kara <jack@...e.cz> wrote:
>
> On Sat 18-05-19 15:28:47, Theodore Ts'o wrote:
>> On Sat, May 18, 2019 at 08:39:54PM +0200, Paolo Valente wrote:
>>> I've addressed these issues in my last batch of improvements for
>>> BFQ, which landed in the upcoming 5.2. If you give it a try, and
>>> still see the problem, then I'll be glad to reproduce it, and
>>> hopefully fix it for you.
>>
>> Hi Paolo, I'm curious if you could give a quick summary about what you
>> changed in BFQ?
>>
>> I was considering adding support so that if userspace calls fsync(2)
>> or fdatasync(2), we attach the process's CSS to the transaction, and
>> then charge all of the journal metadata writes to the process's CSS.
>> If there are multiple fsync's batched into the transaction, the first
>> process which forced the early transaction commit would get charged
>> for the entire journal write. OTOH, journal writes are sequential
>> I/O, so the amount of disk time for writing the journal is going to
>> be relatively small, and in particular, the impact on work from other
>> cgroups is going to be minimal, especially if they hadn't issued an
>> fsync().
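
Just to make sure I understand the charging idea, something along these
lines? (Purely a sketch on my side, not an actual patch: the
t_charge_css field and both helpers below are invented names, and the
real jbd2 commit path submits journal buffers rather than raw bios.)

static void jbd2_note_fsync_css(transaction_t *t)
{
	/* The first fsync() that forces this commit gets charged for it. */
	if (!t->t_charge_css)
		t->t_charge_css = task_get_css(current, io_cgrp_id);
}

static void jbd2_submit_commit_bio(transaction_t *t, struct bio *bio)
{
	/* Tag the journal write with the remembered cgroup, if any. */
	if (t->t_charge_css)
		bio_associate_blkg_from_css(bio, t->t_charge_css);
	submit_bio(bio);
}
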
>
> But this makes priority-inversion problems with the ext4 journal worse,
> doesn't it? If we submit the journal commit in the blkio cgroup of some
> random process, it may get throttled, which then effectively blocks the
> whole filesystem. Or do you want to implement a more complex back-pressure
> mechanism where you'd just account to a different blkio cgroup during the
> journal commit and then throttle at a different point, where you are not
> blocking other tasks from making progress?
>
>> In the case where you have three cgroups all issuing fsync(2) and they
>> all land in the same jbd2 transaction thanks to commit batching, in
>> the ideal world we would split up the disk time usage equally across
>> those three cgroups. But it's probably not worth doing that...
>>
>> That being said, we probably do need some BFQ support, since in the
>> case where we have multiple processes doing buffered writes w/o fsync,
>> we do charge the data=ordered writeback to each block cgroup. Worse,
>> the commit can't complete until all of the data integrity writebacks
>> have completed. And if there are N cgroups with dirty inodes, and
>> slice_idle set to 8ms, there is going to be 8*N ms worth of idle time
>> tacked onto the commit time.
>
> Yeah. At least in some cases, we know there won't be any more IO from a
> particular cgroup in the near future (e.g. transaction commit completing,
> or when the layers above IO scheduler already know which IO they are going
> to submit next) and in that case idling is just a waste of time.
Yep. Issues like this are targeted exactly by the improvement I
mentioned in my previous reply.
> But so far
> I haven't decided what a reasonably clean interface for this should look
> like that isn't specific to a particular IO scheduler implementation.
>
That's an interesting point. So far, I've assumed that the upper
layers would not tell BFQ anything. But if you guys think that such
communication may be acceptable to some degree, then I'd be glad to
try to come up with a solution. For instance: some hook that any
I/O scheduler may export if meaningful, roughly as sketched below.
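
Purely as a sketch of the shape such a hook could take (the op name
no_more_io_hint and this wrapper are invented here; nothing like this
exists today, and the actual interface would of course need discussion):

void blk_no_more_io_hint(struct request_queue *q,
			 struct cgroup_subsys_state *css)
{
	struct elevator_queue *e = q->elevator;

	/* Opt-in: schedulers that don't care simply leave the op NULL. */
	if (e && e->type->ops.no_more_io_hint)
		e->type->ops.no_more_io_hint(q, css);
}

A caller such as jbd2, right after a transaction commit completes,
could use this to tell the scheduler that no further I/O is expected
soon from the cgroup being charged, and BFQ's implementation of the op
could simply stop idling on the matching queues.
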
Thanks,
Paolo
> Honza
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR