Message-ID: <Pine.LNX.4.64.0810052002520.5798@hs20-bc2-1.build.redhat.com>
Date: Sun, 5 Oct 2008 20:04:43 -0400 (EDT)
From: Mikulas Patocka <mpatocka@...hat.com>
To: david@...g.hm
cc: Nick Piggin <nickpiggin@...oo.com.au>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, agk@...hat.com, mbroz@...hat.com,
chris@...chsys.com
Subject: Re: application syncing options (was Re: [PATCH] Memory management
livelock)
On Fri, 3 Oct 2008, david@...g.hm wrote:
> On Fri, 3 Oct 2008, Nick Piggin wrote:
>
> > > *What* is, forever? Data integrity syncs should have pages operated on
> > > in-order, until we get to the end of the range. Circular writeback could
> > > go through again, possibly, but no more than once.
> >
> > OK, I have been able to reproduce it somewhat. It is not a livelock,
> > but what is happening is that direct IO read basically does an fsync
> > on the file before performing the IO. The fsync gets stuck behind the
> > dd that is dirtying the pages, and ends up following behind it and
> > doing all its IO for it.
> >
> > The following patch avoids the issue for direct IO, by using the range
> > syncs rather than trying to sync the whole file.
> >
> > The underlying problem I guess is unchanged. Is it really a problem,
> > though? The way I'd love to solve it is actually by adding another bit
> > or two to the pagecache radix tree, that can be used to transiently tag
> > the tree for future operations. That way we could record the dirty and
> > writeback pages up front, and then only bother with operating on them.
> >
> > That's *if* it really is a problem. I don't have much pity for someone
> > doing buffered IO and direct IO to the same pages of the same file :)
>
> I've seen lots of discussions here about different options in syncing. in this
> case a fix is to do a fsync of a range.
It fixes the bug for concurrent direct read + buffered write, but won't fix
the bug with concurrent sync + buffered write.
> I've also seen discussions of how the
> kernel filesystem code can do ordered writes without having to wait for them
> with the use of barriers, is this capability exported to userspace? if so,
> could you point me at documentation for it?
It isn't. And it is good that it isn't --- the more complicated the API, the
more maintenance work it creates.
Mikulas
> David Lang
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/