lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 20 May 2019 12:45:58 +0200
From:   Paolo Valente <>
To:     Jan Kara <>
Cc:     Theodore Ts'o <>,
        "Srivatsa S. Bhat" <>,,
        linux-block <>,,,,,,,,
Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup

> Il giorno 20 mag 2019, alle ore 11:15, Jan Kara <> ha scritto:
> On Sat 18-05-19 15:28:47, Theodore Ts'o wrote:
>> On Sat, May 18, 2019 at 08:39:54PM +0200, Paolo Valente wrote:
>>> I've addressed these issues in my last batch of improvements for
>>> BFQ, which landed in the upcoming 5.2. If you give it a try, and
>>> still see the problem, then I'll be glad to reproduce it, and
>>> hopefully fix it for you.
>> Hi Paolo, I'm curious if you could give a quick summary about what you
>> changed in BFQ?
>> I was considering adding support so that if userspace calls fsync(2)
>> or fdatasync(2), to attach the process's CSS to the transaction, and
>> then charge all of the journal metadata writes the process's CSS.  If
>> there are multiple fsync's batched into the transaction, the first
>> process which forced the early transaction commit would get charged
>> the entire journal write.  OTOH, journal writes are sequential I/O, so
>> the amount of disk time for writing the journal is going to be
>> relatively small, and especially, the fact that work from other
>> cgroups is going to be minimal, especially if hadn't issued an
>> fsync().
> But this makes priority-inversion problems with ext4 journal worse, doesn't
> it? If we submit journal commit in blkio cgroup of some random process, it
> may get throttled which then effectively blocks the whole filesystem. Or do
> you want to implement a more complex back-pressure mechanism where you'd
> just account to different blkio cgroup during journal commit and then
> throttle as different point where you are not blocking other tasks from
> progress?
>> In the case where you have three cgroups all issuing fsync(2) and they
>> all landed in the same jbd2 transaction thanks to commit batching, in
>> the ideal world we would split up the disk time usage equally across
>> those three cgroups.  But it's probably not worth doing that...
>> That being said, we probably do need some BFQ support, since in the
>> case where we have multiple processes doing buffered writes w/o fsync,
>> we do charnge the data=ordered writeback to each block cgroup. Worse,
>> the commit can't complete until the all of the data integrity
>> writebacks have completed.  And if there are N cgroups with dirty
>> inodes, and slice_idle set to 8ms, there is going to be 8*N ms worth
>> of idle time tacked onto the commit time.
> Yeah. At least in some cases, we know there won't be any more IO from a
> particular cgroup in the near future (e.g. transaction commit completing,
> or when the layers above IO scheduler already know which IO they are going
> to submit next) and in that case idling is just a waste of time.

Yep.  Issues like this are targeted exactly by the improvement I
mentioned in my previous reply.

> But so far
> I haven't decided how should look a reasonably clean interface for this
> that isn't specific to a particular IO scheduler implementation.

That's an interesting point.  So far, I've assumed that nobody would
have told anything to BFQ.  But if you guys think that such a
communication may be acceptable at some degree, then I'd be glad to
try to come up with some solution.  For instance: some hook that any
I/O scheduler may export if meaningful.


> 								Honza
> --
> Jan Kara <>
> SUSE Labs, CR

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists