linux-kernel - Re: [RFC] writeback and cgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120417220106.GF19975@google.com>
Date:	Tue, 17 Apr 2012 15:01:06 -0700
From:	Tejun Heo <tj@...nel.org>
To:	Jan Kara <jack@...e.cz>
Cc:	Vivek Goyal <vgoyal@...hat.com>,
	Fengguang Wu <fengguang.wu@...el.com>,
	Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
	sjayaraman@...e.com, andrea@...terlinux.com, jmoyer@...hat.com,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	kamezawa.hiroyu@...fujitsu.com, lizefan@...wei.com,
	containers@...ts.linux-foundation.org, cgroups@...r.kernel.org,
	ctalbott@...gle.com, rni@...gle.com, lsf@...ts.linux-foundation.org
Subject: Re: [RFC] writeback and cgroup

Hello,

On Wed, Apr 11, 2012 at 09:22:31PM +0200, Jan Kara wrote:
> > So all the metadata IO will happen thorough journaling thread and that
> > will be in root group which should remain unthrottled. So any journal
> > IO going to disk should remain unthrottled.
>
>   Yes, that is true at least for ext3/ext4 or btrfs. In principle we don't
> have to have the journal thread (as is the case of reiserfs where random
> writer may end up doing commit) but let's not complicate things
> unnecessarily.

Why can't journal entries keep track of the originator so that bios
can be attributed to the originator while committing?  That shouldn't
be too difficult to implement, no?

> > Now, IIRC, fsync problem with throttling was that we had opened a
> > transaction but could not write it back to disk because we had to
> > wait for all the cached data to go to disk (which is throttled). So
> > my question is, can't we first wait for all the data to be flushed
> > to disk and then open a transaction for metadata. metadata will be
> > unthrottled so filesystem will not have to do any tricks like bdi is
> > congested or not.
>
>   Actually that's what's happening. We first do filemap_write_and_wait()
> which syncs all the data and then we go and force transaction commit to
> make sure all metadata got to stable storage. The problem is that writeout
> of data may need to allocate new blocks and that starts a transaction and
> while the transaction is started we may need to do some reads (e.g. of
> bitmaps etc.) which may be throttled and at that moment the whole
> filesystem is blocked. I don't remember the stack traces you showed me so
> I'm not sure it this is what your observed but it's certainly one possible
> scenario. The reason why fsync triggers problems is simply that it's the
> only place where process normally does significant amount of writing. In
> most cases flusher thread / journal thread do it so this effect is not
> visible. And to precede your question, it would be rather hard to avoid IO
> while the transaction is started due to locking.

Probably we should mark all IOs issued inside transaction as META (or
whatever which tells blkcg to avoid throttling it).  We're gonna need
overcharging for metadata writes anyway, so I don't think this will
make too much of a difference.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/