[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20061117005052.GK11034@melbourne.sgi.com>
Date: Fri, 17 Nov 2006 11:50:52 +1100
From: David Chinner <dgc@....com>
To: "Rafael J. Wysocki" <rjw@...k.pl>
Cc: Andrew Morton <akpm@...l.org>, LKML <linux-kernel@...r.kernel.org>,
Pavel Machek <pavel@....cz>,
Nigel Cunningham <nigel@...pend2.net>,
David Chinner <dgc@....com>
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions
On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> Hi,
>
> The following two patches introduce a mechanism that should allow us to
> avoid suspend-related corruptions of XFS without the freezing of bdevs which
> Pavel considers as too invasive (apart from this, the freezing of bdevs may
> lead to some undesirable interactions with dm and for now it seems to be
> supported for real by XFS only).
Has this been tested and proven to fix the problem with XFS? It's
been asserted that this will fix XFS and suspend, but it's
not yet been proven that this is even the problem.
I think the problem is a race between sys_sync, the kernel thread
freeze and the xfsbufd flushing async, delayed write metadata
buffers resulting in a inconsistent suspend image being created.
If this is the case, then freezing the workqueues does not
fix the problem. i.e:
suspend xfs
------- ---
sys_sync completes
xfsbufd flushes delwri metadata
kernel thread freeze
workqueue freeze
suspend image start
async I/O starts to complete
suspend image finishes
async I/O all complete
The problem here is the memory image has an empty delayed write
metadata buffer queue, but the I/O completion queue will be missing
some (or all) of the I/O that was issued, and so on resume we have
a memory image that still thinks the I/Os are progress but they
are not queued anywhere for completion processing.
Hence after a successful resume after the above occurred on suspend,
we can have a filesystem that is potentially inconsistent, and it
will almost certainly hang soon after activity starts again on it
because we cannot push the tail of the log forwards due to the lost
buffers.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists