lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130424185744.GB2144@suse.de>
Date:	Wed, 24 Apr 2013 19:57:44 +0100
From:	Mel Gorman <mgorman@...e.de>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jerome Marchand <jmarchan@...hat.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Hugh Dickins <hughd@...gle.com>
Subject: [PATCH] mm: swap: Mark swap pages writeback before queueing for
 direct IO

As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting
PageWriteback before it is queued for direct IO. While swap pages do not
participate in BDI or process dirty accounting and the IO is synchronous,
the writeback bit is still required and not setting it in this case was
an oversight.  swapoff depends on the page writeback to synchronoise all
pending writes on a swap page before it is reused. Swapcache freeing and
reuse depend on checking the PageWriteback under lock to ensure the page
is safe to reuse.

Direct IO handlers and the direct IO handler for NFS do not deal with
PageWriteback as they are synchronous writes. In the case of NFS, it
schedules pages (or a page in the case of swap) for IO and then waits
synchronously for IO to complete in nfs_direct_write(). It is recognised
that this is a slowdown from normal swap handling which is asynchronous
and uses a completion handler. Shoving PageWriteback handling down into
direct IO handlers looks like a bad fit to handle the swap case although
it may have to be dealt with some day if swap is converted to use direct
IO in general and bmap is finally done away with. At that point it will
be necessary to refit asynchronous direct IO with completion handlers onto
the swap subsystem.

As swapcache currently depends on PageWriteback to protect against races,
this patch sets PageWriteback under the page lock before queueing it for
direct IO. It is cleared when the direct IO handler returns. IO errors
are treated similarly to the direct-to-bio case except PageError is not
set as in the case of swap-over-NFS, it is likely to be a transient error.

It was asked what prevents such a page being reclaimed in parallel.
With this patch applied, such a page will now be skipped (most of the time)
or blocked until the writeback completes.  Reclaim checks PageWriteback
under the page lock before calling try_to_free_swap and the page lock
should prevent the page being requeued for IO before it is freed.

This and Jerome's related patch should considered for -stable as far
back as 3.6 when swap-over-NFS was introduced.

Signed-off-by: Mel Gorman <mgorman@...e.de>
---
 mm/page_io.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index 04ca00d..ec04247 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -214,6 +214,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		kiocb.ki_left = PAGE_SIZE;
 		kiocb.ki_nbytes = PAGE_SIZE;
 
+		set_page_writeback(page);
 		unlock_page(page);
 		ret = mapping->a_ops->direct_IO(KERNEL_WRITE,
 						&kiocb, &iov,
@@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 			count_vm_event(PSWPOUT);
 			ret = 0;
 		} else {
+			/*
+			 * In the case of swap-over-nfs, this can be a
+			 * temporary failure if the system has limited
+			 * memory for allocating transmit buffers.
+			 * Mark the page dirty and avoid
+			 * rotate_reclaimable_page but rate-limit the
+			 * messages but do not flag PageError like
+			 * the normal direct-to-bio case as it could
+			 * be temporary.
+			 */
 			set_page_dirty(page);
+			ClearPageReclaim(page);
+			if (printk_ratelimit()) {
+				pr_err("Write-error on dio swapfile (%Lu)\n",
+					(unsigned long long)page_file_offset(page));
+			}
 		}
+		end_page_writeback(page);
 		return ret;
 	}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ