Message-Id: <200810031347.21586.nickpiggin@yahoo.com.au>
Date: Fri, 3 Oct 2008 13:47:21 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Mikulas Patocka <mpatocka@...hat.com>,
linux-kernel@...r.kernel.org, linux-mm@...r.kernel.org,
agk@...hat.com, mbroz@...hat.com, chris@...chsys.com
Subject: Re: [PATCH] Memory management livelock
On Friday 03 October 2008 13:14, Andrew Morton wrote:
> On Fri, 3 Oct 2008 12:59:17 +1000 Nick Piggin <nickpiggin@...oo.com.au> wrote:
> > On Friday 03 October 2008 12:40, Andrew Morton wrote:
> > > That can cause fsync to wait arbitrarily long if some other process is
> > > writing the file.
> >
> > It can be fixed without touching non-fsync paths (see my next email for
> > the way to fix it without touching fastpaths).
>
> yup.
>
> > > This happens.
> >
> > What does such a thing?
>
> I forget. People do all sorts of weird stuff.
Damn people...
I guess they also do non-weird stuff like expecting fsync to really fsync.
> > It would have been nicer to ask them to not do
> > that then, or get them to use range syncs or something. Now that's much
> > harder because we've accepted the crappy workaround for so long.
> >
> > It's far, far worse to just ignore the data integrity of fsync because of
> > the problem. We should at least have returned an error from fsync in that
> > case, or made it interruptible or something.
>
> If a file has one dirty page at offset 1000000000000000 then someone
> does an fsync() and someone else gets in first and starts madly writing
> pages at offset 0, we want to write that page at 1000000000000000.
> Somehow.
>
> I expect there's no solution which avoids blocking the writers at some
> stage.
See my other email. Something roughly like this would do the trick
(hey, it actually boots, runs, and fixes the problem too).

It's ugly because we don't have quite the right radix tree operations
yet (e.g. looking up multiple tags at once, setting tag X if tag Y was
set, proper range lookups). But the theory is to tag, up front, the pages
that we need to get to disk and then write only those, so pages dirtied
after fsync starts can't keep it busy forever.

There is no impact or slowdown at all for writers (it does add 8 bytes
of tags to each radix tree node, but that doesn't increase the memory
footprint as such, because of slab allocation granularity).
Attachment: "mm-fsync-snapshot-fix.patch" (text/x-diff, 10300 bytes)