linux-kernel - Re: [PATCH 6/6] vmscan: Kick flusher threads to clean pages when reclaim is encountering dirty pages


Message-Id: <20100802133259.4F89.A69D9226@jp.fujitsu.com>
Date:	Mon,  2 Aug 2010 16:57:18 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Trond Myklebust <trond.myklebust@....uio.no>,
	Chris Mason <chris.mason@...cle.com>
Cc:	kosaki.motohiro@...fujitsu.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	Dave Chinner <david@...morbit.com>,
	Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Christoph Hellwig <hch@...radead.org>,
	Wu Fengguang <fengguang.wu@...el.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH 6/6] vmscan: Kick flusher threads to clean pages when reclaim is encountering dirty pages

Hi

> The problem that I am seeing is that the try_to_release_page() needs to
> be told to act as a non-blocking call when the process is kswapd, just
> like the pageout() call.
> 
> Currently, the sc->gfp_mask is set to GFP_KERNEL, which normally means
> that the call may wait on I/O to complete. However, what I'm seeing in
> the bugzilla above is that if kswapd waits on an RPC call, then the
> whole VM may gum up: typically, the traces show that the socket layer
> cannot allocate memory to hold the RPC reply from the server, and so it
> is kicking kswapd to have it reclaim some pages, however kswapd is stuck
> in try_to_release_page() waiting for that same I/O to complete, hence
> the deadlock...

Ah, I see. So as far as I understand, you mean:
 - The socket layer uses GFP_ATOMIC, so it never calls try_to_free_pages().
   IOW, kswapd is the only thread reclaiming memory.
 - Kswapd gets stuck in ->releasepage().
 - In the usual case, another thread calls kmalloc(GFP_KERNEL) and enters
   foreground reclaim, which lets the stuck kswapd recover. But in your
   case there is no such thread.

Hm, interesting.

In the short term, the current NFS fix (checking PF_MEMALLOC in nfs_wb_page())
seems the best way. It has no side effects, if my understanding is correct.
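
To make the idea concrete, here is a rough userspace sketch of what "checking PF_MEMALLOC" amounts to. It is not the actual nfs_wb_page() change: struct task, wb_page_may_wait() and the flag value are stand-ins I made up for illustration, assuming only that a reclaiming thread such as kswapd runs with PF_MEMALLOC set.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the kernel defines PF_MEMALLOC in <linux/sched.h>. */
#define PF_MEMALLOC 0x00000800u

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	unsigned int flags;
};

/*
 * Hypothetical helper: a writeback path may block on I/O only when the
 * caller is not itself reclaiming memory (PF_MEMALLOC clear).
 */
static bool wb_page_may_wait(const struct task *t)
{
	return (t->flags & PF_MEMALLOC) == 0;
}

/* Small self-check: a kswapd-like task must not wait, a normal writer may. */
void demo_pf_memalloc(void)
{
	struct task kswapd_like = { .flags = PF_MEMALLOC };
	struct task writer = { .flags = 0 };

	assert(!wb_page_may_wait(&kswapd_like));
	assert(wb_page_may_wait(&writer));
}
```

The point of keying on the task flag rather than on gfp_mask is that it leaves the ->releasepage() calling convention untouched.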


> IOW: I think kswapd at least should be calling try_to_release_page()
> with a gfp-flag of '0' to avoid deadlocking on I/O.

Hmmm.
0 seems to have a much stronger meaning than NFS requires.
There is no reason to prevent grabbing a mutex, calling cond_resched(), etc.
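
A small userspace sketch of why 0 is stronger than needed. The MODEL_* names and bit values are my own stand-ins for the kernel's gfp bits (assumed layout, not copied from a header), and the two predicates are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Model bits standing in for __GFP_WAIT, __GFP_IO and __GFP_FS (values assumed). */
#define MODEL_GFP_WAIT 0x10u
#define MODEL_GFP_IO   0x40u
#define MODEL_GFP_FS   0x80u

/* Hypothetical predicate: may the callee sleep at all (mutex, cond_resched)? */
static bool may_sleep(unsigned int gfp)
{
	return (gfp & MODEL_GFP_WAIT) != 0;
}

/* Hypothetical predicate: may the callee start or wait on filesystem I/O? */
static bool may_do_fs_io(unsigned int gfp)
{
	return (gfp & (MODEL_GFP_IO | MODEL_GFP_FS)) ==
	       (MODEL_GFP_IO | MODEL_GFP_FS);
}

/* A mask of 0 forbids sleeping too; a WAIT-only mask forbids just the I/O. */
void demo_zero_mask(void)
{
	assert(!may_sleep(0) && !may_do_fs_io(0));
	assert(may_sleep(MODEL_GFP_WAIT) && !may_do_fs_io(MODEL_GFP_WAIT));
}
```

So a mask that only drops the I/O and FS bits would already avoid waiting on the RPC while still allowing ordinary sleeping.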

[digging old git history]

Ho hum...

The old commit log says passing gfp-flag=0 breaks XFS, but current XFS
doesn't use the gfp_mask argument. Hm.


============================================================
commit 68678e2fc6cfdfd013a2513fe416726f3c05b28d
Author: akpm <akpm>
Date:   Tue Sep 10 18:09:08 2002 +0000

    [PATCH] pass the correct flags to aops->releasepage()

    Restore the gfp_mask in the VM's call to a_ops->releasepage().  We can
    block in there again, and XFS (at least) can use that.

    BKrev: 3d7e35445skDsKDFM6rdiwTY-5elsw

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed1ec3..89d801e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -208,7 +208,7 @@ shrink_list(struct list_head *page_list, int nr_pages,
                 * Otherwise, leave the page on the LRU so it is swappable.
                 */
                if (PagePrivate(page)) {
-                       if (!try_to_release_page(page, 0))
+                       if (!try_to_release_page(page, gfp_mask))
                                goto keep_locked;
                        if (!mapping && page_count(page) == 1)
                                goto free_it;
============================================================

Now, the gfp_mask of try_to_release_page() is used in two places:

btrfs: btrfs_releasepage		(checks __GFP_WAIT)
nfs: nfs_release_page			((gfp & GFP_KERNEL) == GFP_KERNEL)

Probably btrfs can remove the __GFP_WAIT check from
try_release_extent_mapping() because it doesn't sleep. I dunno. If so, we
can change it to 0 again, but I'm not sure it is worth doing.
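
For reference, the two checks behave differently for the common masks. This is a userspace sketch, not the filesystem code: the MODEL_* values are assumed stand-ins for the kernel's gfp bits, and btrfs_style_check() / nfs_style_check() are hypothetical names that only mirror the two conditions listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Model bits standing in for the kernel's gfp flags (values assumed). */
#define MODEL_GFP_WAIT   0x10u
#define MODEL_GFP_HIGH   0x20u
#define MODEL_GFP_IO     0x40u
#define MODEL_GFP_FS     0x80u

#define MODEL_GFP_ATOMIC (MODEL_GFP_HIGH)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* Mirrors the btrfs-side condition: is the WAIT bit set? */
static bool btrfs_style_check(unsigned int gfp)
{
	return (gfp & MODEL_GFP_WAIT) != 0;
}

/* Mirrors the nfs-side condition: (gfp & GFP_KERNEL) == GFP_KERNEL. */
static bool nfs_style_check(unsigned int gfp)
{
	return (gfp & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL;
}

/* With mask 0 both conditions are false; they differ only for NOFS. */
void demo_release_checks(void)
{
	assert(!btrfs_style_check(0) && !nfs_style_check(0));
	assert(!btrfs_style_check(MODEL_GFP_ATOMIC));
	assert(btrfs_style_check(MODEL_GFP_NOFS));
	assert(!nfs_style_check(MODEL_GFP_NOFS));
	assert(btrfs_style_check(MODEL_GFP_KERNEL));
	assert(nfs_style_check(MODEL_GFP_KERNEL));
}
```

In this model, passing 0 from kswapd makes both filesystems take their non-waiting path, which is what Trond asked for, at the cost of also switching off the btrfs branch.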

Chris, can you tell us how btrfs handles the gfp_mask argument of releasepage()?



BTW, VM folks need to think more about the kswapd design. Currently kswapd
often sleeps, but Trond's bug report says that waiting itself can potentially
cause a deadlock. Perhaps it's merely my imagination, but it needs some
consideration...



