Date:	Fri, 17 Nov 2006 11:50:52 +1100
From:	David Chinner <dgc@....com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Andrew Morton <akpm@...l.org>, LKML <linux-kernel@...r.kernel.org>,
	Pavel Machek <pavel@....cz>,
	Nigel Cunningham <nigel@...pend2.net>,
	David Chinner <dgc@....com>
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> The following two patches introduce a mechanism that should allow us to
> avoid suspend-related corruptions of XFS without the freezing of bdevs which
> Pavel considers as too invasive (apart from this, the freezing of bdevs may
> lead to some undesirable interactions with dm and for now it seems to be
> supported for real by XFS only).

Has this been tested and shown to fix the problem with XFS? It has
been asserted that this will fix XFS and suspend, but it has not
yet been shown that the workqueues are even the problem.

I think the problem is a race between sys_sync, the kernel thread
freeze, and xfsbufd flushing async, delayed write metadata
buffers, resulting in an inconsistent suspend image being created.
If this is the case, then freezing the workqueues does not
fix the problem, i.e.:

suspend				xfs
-------				---
sys_sync completes
				xfsbufd flushes delwri metadata
kernel thread freeze
workqueue freeze
suspend image start
				async I/O starts to complete
suspend image finishes
				async I/O all complete

The problem here is that the memory image has an empty delayed write
metadata buffer queue, but the I/O completion queue will be missing
some (or all) of the I/O that was issued, and so on resume we have
a memory image that still thinks the I/Os are in progress but they
are not queued anywhere for completion processing.
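
To make the sequence above concrete, here is a toy model of it in
Python. It is only a sketch: ToyState, flush_delwri, complete_io and
the buffer names are hypothetical stand-ins, not real XFS or suspend
structures, and the image is modelled as being captured at a single
instant rather than over a window of time.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for in-memory state; not real XFS structures."""
    delwri_queue: list = field(default_factory=list)      # buffers awaiting write-out
    in_flight: set = field(default_factory=set)           # I/O that memory believes is pending
    completion_queue: list = field(default_factory=list)  # finished I/O awaiting processing


def flush_delwri(state):
    """Model xfsbufd: issue async I/O for every delayed write buffer."""
    state.in_flight.update(state.delwri_queue)
    state.delwri_queue.clear()


def complete_io(state, buf):
    """Model one I/O finishing: it is queued for completion processing."""
    state.completion_queue.append(buf)


def process_completions(state):
    """Model completion processing running once threads are thawed."""
    for buf in state.completion_queue:
        state.in_flight.discard(buf)
    state.completion_queue.clear()


def simulate():
    live = ToyState(delwri_queue=["buf_a", "buf_b", "buf_c"])
    flush_delwri(live)            # xfsbufd flushes after sys_sync completes
    # Threads and workqueues are now frozen: nothing processes completions.
    complete_io(live, "buf_a")    # one completion lands before the image
    image = copy.deepcopy(live)   # suspend image captured
    complete_io(live, "buf_b")    # these complete after the image,
    complete_io(live, "buf_c")    # so the image never records them
    # Resume: the image becomes the running state and threads are thawed.
    process_completions(image)
    return image
```

In this model the resumed image has an empty delayed write queue and
an empty completion queue, yet buf_b and buf_c are still marked as
in flight, with nothing left that will ever complete them.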

Hence, if the above occurred during suspend, then after a successful
resume we can have a filesystem that is potentially inconsistent, and
it will almost certainly hang soon after activity starts again on it
because we cannot push the tail of the log forwards due to the lost
buffers.
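
The hang can be illustrated with a heavily simplified circular log
model. This is a sketch under stated assumptions, not how the XFS log
is implemented: the function name, the integer LSNs, and the rule
that each transaction consumes one unit of log space are all made up
for illustration.

```python
def transactions_until_stall(log_size, head_lsn, pinned_lsns, limit=1000):
    """Count how many one-unit transactions fit before the log is full.

    pinned_lsns holds the (hypothetical) LSNs of items whose buffers have
    not been written back. The tail cannot move past the oldest of them.
    Returns None if no stall is seen within `limit` transactions.
    """
    done = 0
    while done < limit:
        # With nothing pinned, the tail keeps up with the head.
        tail_lsn = min(pinned_lsns, default=head_lsn)
        free = log_size - (head_lsn - tail_lsn)
        if free <= 0:
            return done          # no log space left: new activity blocks
        head_lsn += 1            # one more transaction written to the log
        done += 1
    return None
```

With a lost buffer pinning LSN 3, a head at LSN 5 and a log of 8
units, transactions_until_stall(8, 5, [3]) stalls after 6
transactions, whereas with no pinned items it never stalls within the
limit.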

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group