Message-ID: <Pine.LNX.4.64.0810052320480.3074@hs20-bc2-1.build.redhat.com>
Date: Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Arjan van de Ven <arjan@...radead.org>
cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, agk@...hat.com, mbroz@...hat.com,
chris@...chsys.com
Subject: Re: [PATCH 2/3] Fix fsync livelock
On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> On Sun, 5 Oct 2008 20:01:46 -0400 (EDT)
> Mikulas Patocka <mpatocka@...hat.com> wrote:
>
> > I assume that if very few people complained about the livelock till
> > now, very few people will see degraded write performance. My patch
> > blocks the writes only if the livelock happens, so if the livelock
> > doesn't happen in unpatched kernel for most people, the patch won't
> > make it worse.
>
> I object to calling this a livelock. It's not.
It unlocks itself when the whole disk has been written, and that can take
several hours (or days, if you have a many-terabyte array). So formally it is
not a livelock, but from the user's point of view it is: he sees an unkillable
process in 'D' state for many hours.
> And yes, fsync is slow and lots of people are seeing that.
> It's not helped by how ext3 is implemented (where fsync is effectively
> equivalent of a sync for many cases).
> But again, moving the latency to "innocent" parties is not acceptable.
>
> >
> > > If the fsync() implementation isn't smart enough, sure, let's improve
> > > it. But not by shifting latency around... let's make it more
> > > efficient at submitting IO.
> > > If we need to invent something like "chained IO" where if you wait
> > > on the last of the chain, you wait on the entire chain, so be it.
> >
> > This looks madly complicated. And ineffective, because if some page
> > was submitted before fsync() was invoked, and is under writeback
> > while fsync() is called, fsync() still has to wait on it.
>
> so?
> just make a chain per inode always...
The point is that many fsync()s may run in parallel, and you have just one
inode and just one chain. And if you add a two-word list_head to a page to
link it on this list, many developers will hate it for increasing its
size.

See the work done by Nick Piggin elsewhere in this thread. He uses just
one bit in the radix tree to mark pages to process. But he needs to serialize
all syncs on a given file, so they no longer run in parallel.
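
To illustrate the idea of marking pages with one bit, here is a minimal
user-space sketch. It is a toy model, not kernel code and not Nick's patch:
struct toy_file, writer_dirties(), sync_untagged() and sync_tagged() are
hypothetical names, and a plain array stands in for the radix tree. It assumes
a writer that redirties one page every time the sync writes one.

```c
#include <stdbool.h>

#define NPAGES 8

/* Toy stand-in for one file's cached pages (assumption, not a kernel type). */
struct toy_file {
	bool dirty[NPAGES];	/* page holds data not yet on disk */
	bool tagged[NPAGES];	/* page was dirty when the sync started */
};

/* Hypothetical concurrent writer: dirties one page per call. */
static void writer_dirties(struct toy_file *f, int step)
{
	f->dirty[step % NPAGES] = true;
}

/*
 * Sync that writes "until nothing is dirty". Returns the number of pages
 * written, or -1 if it is still not finished after max_steps writes.
 */
static int sync_untagged(struct toy_file *f, int max_steps)
{
	int written = 0;

	for (int step = 0; step < max_steps; step++) {
		int i = 0;

		while (i < NPAGES && !f->dirty[i])
			i++;
		if (i == NPAGES)
			return written;		/* nothing left to write */
		f->dirty[i] = false;		/* "write" the page */
		written++;
		writer_dirties(f, step);	/* writer keeps going */
	}
	return -1;
}

/*
 * Sync that first tags the pages that are dirty right now, then writes
 * only tagged pages. Pages dirtied later are left for the next sync.
 */
static int sync_tagged(struct toy_file *f)
{
	int written = 0;
	int step = 0;

	for (int i = 0; i < NPAGES; i++)
		f->tagged[i] = f->dirty[i];	/* one bit per page */

	for (int i = 0; i < NPAGES; i++) {
		if (!f->tagged[i])
			continue;
		f->tagged[i] = false;
		f->dirty[i] = false;		/* "write" the page */
		written++;
		writer_dirties(f, step++);	/* writer keeps going */
	}
	return written;
}
```

In this model sync_untagged() never finishes while the writer is active, and
sync_tagged() finishes after writing exactly the pages that were dirty at the
start. Because there is only one tagged bit per page, a second sync_tagged() on
the same file would overwrite the first one's marks, which is why the syncs on
one file have to be serialized.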
Mikulas