Message-ID: <20110419144542.GA9556@quack.suse.cz>
Date: Tue, 19 Apr 2011 16:45:42 +0200
From: Jan Kara <jack@...e.cz>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
Greg Thelen <gthelen@...gle.com>,
James Bottomley <James.Bottomley@...senpartnership.com>,
lsf@...ts.linux-foundation.org, linux-fsdevel@...r.kernel.org,
linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: cgroup IO throttling and filesystem ordered mode (Was: Re:
[Lsf] IO less throttling and cgroup aware writeback (Was: Re: Preliminary
Agenda and Activities for LSF))

On Tue 19-04-11 10:30:22, Vivek Goyal wrote:
> On Tue, Apr 19, 2011 at 10:33:39AM +1000, Dave Chinner wrote:
> > If you want to throttle journal operations, then we probably need to
> > throttle metadata operations that commit to the journal, not the
> > journal IO itself. The journal is a shared global resource that all
> > cgroups use, so throttling journal IO inappropriately will affect
> > the performance of all cgroups, not just the one that is "hogging"
> > it.
>
> Agreed.
>
> >
> > In XFS, you could probably do this at the transaction reservation
> > stage where log space is reserved. We know everything about the
> > transaction at this point in time, and we throttle here already when
> > the journal is full. Adding cgroup transaction limits to this point
> > would be the place to do it, but the control parameter for it would
> > be very XFS specific (i.e. number of transactions/s). Concurrency is
> > not an issue - the XFS transaction subsystem is only limited in
> > concurrency by the space available in the journal for reservations
> > (hundreds to thousands of concurrent transactions).
>
> Instead of transactions per second, can we implement some kind of upper
> limit on pending transactions per cgroup? That limit does not have
> to be user tunable to begin with. The effective transactions/sec rate
> will automatically be determined by the IO throttling rate of the cgroup
> at the end nodes.
>
> I think effectively what we need is the notion of parallel
> transactions, so that transactions of one cgroup can make progress
> independently of transactions of other cgroups. So if a process does
> an fsync and it is throttled, then it should block transactions of
> only that cgroup and not of other cgroups.
>
> You mentioned that concurrency is not an issue in XFS and hundreds to
> thousands of concurrent transactions can progress depending on the log
> space available. If that's the case, I think to begin with we might not
> have to do anything at all. Processes can still get blocked, but as long
> as we have enough log space, this might not be a frequent event. I will
> do some testing with XFS and see whether I can livelock the system with
> very low IO limits.
>
> >
> > FWIW, this would even allow per-bdi-flusher thread transaction
> > throttling parameters to be set, so writeback triggered metadata IO
> > could possibly be limited as well.
>
> How does writeback trigger metadata IO?

Because writing data may require block allocation, marking blocks as
written on disk, or similar changes to metadata...
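
To make that concrete, here is a toy Python model of how flushing dirty
data can generate metadata updates. It is purely illustrative and is not
kernel code: ToyInode, writeback() and all field names are made up for
this sketch, and real filesystems track far more state.

```python
class ToyInode:
    """Toy inode: maps logical block numbers to physical blocks."""

    def __init__(self):
        self.block_map = {}         # logical block -> physical block
        self.unwritten = set()      # allocated but not yet marked written
        self.dirty_data = set()     # logical blocks waiting for writeback
        self.metadata_updates = []  # changes that would need journaling

    def buffered_write(self, lblk):
        # A buffered write only dirties cached data; nothing is allocated yet.
        self.dirty_data.add(lblk)

    def preallocate(self, lblk, pblk):
        # Preallocated block: space is reserved but flagged as unwritten.
        self.block_map[lblk] = pblk
        self.unwritten.add(lblk)


def writeback(inode, next_free_block):
    """Flush dirty data and return the next free physical block."""
    for lblk in sorted(inode.dirty_data):
        if lblk not in inode.block_map:
            # Writing into a hole: a block has to be allocated first.
            inode.block_map[lblk] = next_free_block
            inode.metadata_updates.append(("allocate", lblk, next_free_block))
            next_free_block += 1
        elif lblk in inode.unwritten:
            # Data now exists on disk, so the block is marked as written.
            inode.unwritten.discard(lblk)
            inode.metadata_updates.append(
                ("mark_written", lblk, inode.block_map[lblk]))
        # Otherwise this is a plain overwrite: data IO only, no metadata.
    inode.dirty_data.clear()
    return next_free_block
```

In this model only a plain overwrite of an already written block is
pure data IO; the other two cases add entries to metadata_updates.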
> In the first step I was looking to not throttle metadata IO, as that
> will require even more changes in the filesystem layer. I was thinking
> that we provide throttling only for data and change filesystems so that
> concurrent transactions can exist and make progress, and filesystem IO
> does not serialize behind a slow throttled cgroup.

Yes, I think not throttling metadata is a good start.
> This leads to weaker isolation, but at least we don't run into livelocking
> or filesystem scalability issues. Once that's resolved, we can handle the
> case of throttling metadata IO also.
>
> In fact, if metadata is dependent on data (in ordered mode) and we are
> throttling data, then we automatically throttle metadata for select cases.
>
> >
> > I'm not sure whether this is possible with other filesystems, and
> > ext3/4 would still have the issue of ordered writeback causing much
> > more writeback than expected at times (e.g. fsync), but I suspect
> > there is nothing that can really be done about this.
>
> Can't this be modified so that multiple per-cgroup transactions can make
> progress? So if one fsync is blocked, then processes in other cgroups
> should still be able to do IO using a separate transaction and be able
> to commit it.

Not really. Ext3/4 always has a single running transaction and all
metadata updates from all threads are recorded in it. When the transaction
grows large/old enough, we commit it and start a new transaction. The fact
that there is always just one running transaction is heavily used in the
journaling code, so it would need a serious rewrite of JBD2...
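
A toy sketch of that model in Python, to show why one cgroup's fsync
drags everybody else's updates along. This is only an illustration:
ToyJournal and its methods are made-up names, not JBD2 interfaces, and
only the "large enough" commit trigger is modeled (no age-based commit).

```python
class ToyJournal:
    """Toy journal with exactly one running transaction at any time."""

    def __init__(self, max_updates=4):
        self.max_updates = max_updates  # size at which we force a commit
        self.tid = 1                    # id of the running transaction
        self.running = []               # (cgroup, update) pairs
        self.committed = []             # list of (tid, updates)

    def add_update(self, cgroup, update):
        # Every thread, whatever its cgroup, joins the same transaction.
        self.running.append((cgroup, update))
        if len(self.running) >= self.max_updates:
            self.commit()

    def commit(self):
        # Commit the running transaction and start a new, empty one.
        if self.running:
            self.committed.append((self.tid, self.running))
            self.tid += 1
            self.running = []

    def fsync(self, cgroup):
        """Commit on behalf of `cgroup`; return all cgroups affected."""
        involved = {cg for cg, _ in self.running}
        self.commit()
        return involved
```

With updates from a "fast" and a "slow" cgroup in the running
transaction, fsync("fast") returns both names: the commit cannot
complete until the updates of the slow cgroup are handled as well.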
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR