linux-kernel - Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

Date:	Tue, 28 Jun 2011 21:29:55 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Shaohua Li <shaohua.li@...el.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jaxboe@...ionio.com" <jaxboe@...ionio.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	"khlebnikov@...nvz.org" <khlebnikov@...nvz.org>,
	"jmoyer@...hat.com" <jmoyer@...hat.com>
Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Wed, Jun 29, 2011 at 09:04:55AM +0800, Shaohua Li wrote:

[..]
> > We idle on the last queue on the sync-noidle tree. So we idle on the fsync
> > queue, as it is the last queue on the sync-noidle tree. That's how we
> > provide protection to all sync-noidle queues against sync-idle queues.
> > Instead of idling on individual queues, we idle on the group, that is, on
> > the service tree.
> OK, but this looks silly. We are idling on a noidle service tree or a
> group (backed by the last queue of the tree or group) because we assume
> the tree or group can dispatch a request soon. But if the think time of
> the tree or group is large, that assumption isn't true. Idling here is
> blind. I thought we could extend the think time check to both the service
> tree and the group.

We can implement think time for the noidle service tree and for group idle
as well. That's not a problem, though I am yet to be convinced that think
time still makes sense for a group. I guess it would just mean: in the past,
have you done a bunch of IO with gaps between IOs of less than 8ms? If yes,
then we expect you to do more IO in the future. Frankly speaking, I am not
too sure how the past IO pattern of a group predicts its future IO
pattern.
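A minimal sketch of what a group-level (or service-tree-level) think time might look like, assuming it reuses the decaying-average scheme CFQ applies to individual processes: each new gap between IOs gets weight 1/8, and idling is allowed only while the mean stays at or below the idle window. The struct, the function names, the 8 ms window and the 80-sample validity threshold are illustrative assumptions, not the actual patch.

```c
#include <stdbool.h>

/* Hypothetical per-entity (queue, service tree or group) think-time state,
 * loosely modeled on CFQ's decaying average. Times are in microseconds. */
struct ttime_stats {
    unsigned long samples;   /* decayed sample count, fixed point (x256) */
    unsigned long total;     /* decayed sum of think times, fixed point */
    unsigned long mean;      /* current mean think time in us */
};

#define SLICE_IDLE_US 8000UL /* assumed idle window: 8 ms */
#define MIN_SAMPLES   80UL   /* assumed "enough history" threshold */

/* Record the gap between the entity's last completed IO and its next one. */
static void ttime_update(struct ttime_stats *t, unsigned long elapsed_us)
{
    /* Clamp outliers so one long gap cannot dominate the average. */
    unsigned long ttime = elapsed_us < 2 * SLICE_IDLE_US ?
                          elapsed_us : 2 * SLICE_IDLE_US;

    /* History decays by 7/8; the new sample gets weight 1/8. */
    t->samples = (7 * t->samples + 256) / 8;
    t->total = (7 * t->total + 256 * ttime) / 8;
    t->mean = (t->total + 128) / t->samples;
}

/* Decide whether idling on this entity is worthwhile. */
static bool should_idle(const struct ttime_stats *t)
{
    /* Without enough history, idle optimistically. */
    if (t->samples <= MIN_SAMPLES)
        return true;
    return t->mean <= SLICE_IDLE_US;
}
```

This is exactly the "past predicts future" assumption questioned above: the decision depends only on how closely spaced the entity's recent IOs were.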

But anyway, the point is, even if we implement it, it will not solve the
fsync issue at hand, for the reason I explained in my previous mail. We
will oscillate between high and low think time depending on whether we
are idling or not. There is no correlation between the think time of the
fsync thread and idling here.

I think you are banking on the fact that after fsync, the journaling
thread's IO can take more than 8ms, delaying the next IO from the fsync
thread and pushing its think time above 8ms, so we will not idle on the
fsync thread at all. That is just one corner case, and I think the approach
breaks in multiple other cases:
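The feedback loop described in the two paragraphs above can be illustrated with a toy model. It assumes, hypothetically, that while we idle on the fsync thread's group, the journal thread in another cgroup cannot dispatch until the idle window expires, so the fsync thread's next IO is delayed by the idle window plus the journal commit time. The function name and the plain 1/8-weight average are assumptions for illustration only.

```c
#include <stdbool.h>

#define SLICE_IDLE_US 8000UL /* assumed idle window: 8 ms */

/* Hypothetical feedback model: idling on the fsync thread's group delays
 * the journal thread, which in turn delays the fsync thread's next IO.
 * Returns how many times the idle decision flipped over `rounds`. */
static unsigned count_idle_flips(unsigned rounds, unsigned long journal_us)
{
    unsigned long mean = 0;   /* decaying average think time, us */
    bool idling = true;       /* start out idling, as for a fresh queue */
    unsigned flips = 0;

    for (unsigned i = 0; i < rounds; i++) {
        /* Think time seen for the fsync thread in this round. */
        unsigned long ttime = idling ? SLICE_IDLE_US + journal_us
                                     : journal_us;

        mean = (7 * mean + ttime) / 8;

        /* Idle next round only if the mean is within the idle window. */
        bool next = mean <= SLICE_IDLE_US;
        if (next != idling)
            flips++;
        idling = next;
    }
    return flips;
}
```

With a fast journal commit (say 1 ms) the decision keeps flipping, which is the oscillation: idling raises the observed think time, which disables idling, which lowers it again. Only when the commit itself takes longer than the idle window (say 10 ms) does the mean settle above it, so idling turns off once and stays off, which is the corner case.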

- If filesystem barriers are disabled or the backend storage has battery
  backup, then journal IO will most likely go into the cache and barriers
  will be ignored. In that case the write will finish almost instantly
  and we will get the next IO from the fsync thread very soon, pushing
  down the fsync thread's think time. That will enable idling, and we will
  be back to the problem we are trying to solve.

- The fsync thread might submit a string of IOs (say 10-12) before it
  hands off to the journal thread to commit metadata. In that case the
  closely spaced IOs may have lowered the fsync thread's think time and
  hence enabled idling.
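The second case above can be sketched the same way. The helper below is a hypothetical illustration: it feeds a burst of closely spaced IOs, preceded by one long journal-commit gap, into a decaying average (weight 1/8 per sample) and returns the mean at the end of the burst, roughly the point at which an idling decision would be made. The burst sizes and gaps are assumed values, not measurements.

```c
#include <stdbool.h>

#define SLICE_IDLE_US 8000UL /* assumed idle window: 8 ms */

/* Hypothetical workload: each cycle the fsync thread waits `commit_gap_us`
 * for the journal commit, then issues `burst` IOs, the rest of them
 * `burst_gap_us` apart. Returns the average think time after `cycles`. */
static unsigned long burst_mean_ttime(unsigned cycles, unsigned burst,
                                      unsigned long burst_gap_us,
                                      unsigned long commit_gap_us)
{
    unsigned long mean = 0;   /* decaying average think time, us */

    for (unsigned c = 0; c < cycles; c++) {
        for (unsigned i = 0; i < burst; i++) {
            /* The first IO of each burst follows the journal commit. */
            unsigned long ttime = (i == 0) ? commit_gap_us : burst_gap_us;
            mean = (7 * mean + ttime) / 8;
        }
    }
    return mean;
}
```

With bursts of 10 IOs 100 us apart and a 20 ms commit gap, the mean stays well below 8 ms even though every commit stall is longer than the idle window, so idling would remain enabled; with one IO per commit the mean approaches 20 ms.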

So implementing think time for the service tree/group might be a good idea
in general, but it will not solve this IO dependency issue across cgroups.

Thanks
Vivek