linux-kernel - Re: [PATCH 3/3] writeback, blkio: add documentation for cgroup writeback support


Message-ID: <20150615233519.GB30059@thunk.org>
Date:	Mon, 15 Jun 2015 19:35:19 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Tejun Heo <tj@...nel.org>
Cc:	Vivek Goyal <vgoyal@...hat.com>, axboe@...nel.dk,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	lizefan@...wei.com, cgroups@...r.kernel.org
Subject: Re: [PATCH 3/3] writeback, blkio: add documentation for cgroup
 writeback support

On Mon, Jun 15, 2015 at 02:23:45PM -0400, Tejun Heo wrote:
> 
> On ext2, there's nothing interlocking each other.  My understanding of
> ext4 is pretty limited but as long as the journal head doesn't
> overwrap and gets blocked on the slow one, it should be fine, so for
> most use cases, this shouldn't be a problem.

The writes to the journal in ext3/ext4 are done from the jbd/jbd2
kernel thread.  So writes to the journal shouldn't be a problem.  In
data=ordered mode, inodes that have blocks that were allocated during
the current transaction do have to have their data blocks written out,
and this is done by the jbd/jbd2 thread using filemap_fdatawait().

If this gets throttled because the blocks were originally dirtied by
some cgroup that didn't have much disk time quota, then all file system
activity will stall until the ordered mode writeback completes.  That
means any high priority cgroup trying to execute a system call that
mutates file system state will block until the commit has gotten past
the initial setup stage, and so other system activity could sputter to
a halt --- at which point the commit will be allowed to compete, and
then all of the calls to ext4_journal_start() will unblock, and the
system will come back to life.  :-)
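
To put rough numbers on the stall described above, here is a toy
model in Python.  It is not kernel code, and every name and figure in
it is made up for illustration.  It only assumes that the commit
cannot get past its ordered-mode flush until the data has been written
at whatever rate the dirtying cgroup is throttled to, and that anyone
who wants to start a new journal handle in the meantime waits for the
remainder of that flush.

```python
def ordered_flush_seconds(dirty_bytes, throttle_bytes_per_sec):
    """Time for the commit's ordered-mode data flush under a throttle."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle rate must be positive")
    return dirty_bytes / throttle_bytes_per_sec


def journal_start_wait(arrival_sec, flush_done_sec):
    """How long a caller arriving at arrival_sec blocks behind the flush."""
    return max(0.0, flush_done_sec - arrival_sec)


# Hypothetical numbers: 64 MiB dirtied by a cgroup throttled to 1 MiB/s.
MIB = 1024 * 1024
flush_done = ordered_flush_seconds(64 * MIB, 1 * MIB)

# A high priority task that tries to modify the file system 4 seconds
# into the commit waits for the rest of the low priority flush.
high_prio_wait = journal_start_wait(4.0, flush_done)
print(f"flush takes {flush_done:.0f}s, "
      f"high priority caller waits {high_prio_wait:.0f}s")
```

The point of the sketch is only that the wait is set by the throttle
of the cgroup that dirtied the data, not by the priority of the task
that ends up waiting.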

Because ext3 doesn't have delayed allocation, it will do orders of
magnitude more data=ordered block flushing, so this problem will be
far worse with ext3 compared to ext4.

So if there is some way we can signal to any cgroup that might be
throttling writeback or disk I/O that the jbd/jbd2 process should be
considered privileged, that would be good, since it would allow us to
avoid a potential priority inversion problem.
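
As a sketch of what "privileged" could mean here, the following
Python fragment models a throttle that lets a marked submitter bypass
the per-cgroup limit.  This is purely illustrative: the Submitter
class, the privileged flag, and the rates are hypothetical, and it
does not describe any existing kernel interface.

```python
from dataclasses import dataclass


@dataclass
class Submitter:
    name: str
    cgroup_rate: float        # bytes/sec allowed by the cgroup throttle
    privileged: bool = False  # hypothetical "do not throttle" marker


def effective_rate(submitter, device_rate):
    """Rate at which this submitter's I/O is allowed to proceed."""
    if submitter.privileged:
        return device_rate
    return min(submitter.cgroup_rate, device_rate)


MIB = 1024 * 1024
DEVICE_RATE = 100 * MIB   # assumed raw device throughput
DIRTY = 64 * MIB          # assumed ordered-mode data to flush

# Commit thread charged to the slow cgroup that dirtied the data.
throttled = Submitter("jbd2", cgroup_rate=1 * MIB)
# Same thread, but marked as exempt from throttling.
exempt = Submitter("jbd2", cgroup_rate=1 * MIB, privileged=True)

throttled_secs = DIRTY / effective_rate(throttled, DEVICE_RATE)
exempt_secs = DIRTY / effective_rate(exempt, DEVICE_RATE)
print(f"throttled flush: {throttled_secs:.2f}s, "
      f"exempt flush: {exempt_secs:.2f}s")
```

With these made-up numbers the commit's flush is bounded by the device
rather than by the slowest cgroup, which is the property being asked
for.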

						- Ted