Message-Id: <20090515130959.330dbd56.nishimura@mxp.nes.nec.co.jp>
Date:	Fri, 15 May 2009 13:09:59 +0900
From:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Balbir Singh <balbir@...ibm.com>,
	Hugh Dickins <hugh@...itas.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	linux-mm <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: [PATCH] memcg: fix stale swapcache at swapout

This patch is split out from Kamezawa-san's "fix stale swap cache account leak"
patch set (v7).

  http://lkml.org/lkml/2009/5/11/688

I moved the definition of memcg_free_unused_swapcache() back to memcontrol.c
(as in my original version) to avoid a conflict with Kosaki-san's patch that
removes the CONFIG_UNEVICTABLE_LRU config option (it's not in mmotm yet, though).
And IMHO, this function might not be appropriate for "vmscan" anyway.

This patch is based on mmotm-2009-05-13-16-34.
===
From: Daisuke Nishimura <nishimura@....nes.nec.co.jp>

Reclaiming anonymous memory in vmscan.c is done in the following 2 steps:
  1. add to swap and unmap.
  2. pageout.
But these 2 steps don't occur at once. There are many chances for the pageout
to be avoided, and _really_ unused pages are swapped out by the
visit-and-check-again logic of LRU rotation.
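
To make the two steps concrete, here is a heavily simplified sketch of the
anonymous-page path in shrink_page_list() (not the exact mmotm code; locking,
error handling and most branches are omitted):

	/* Step 1: allocate a swap slot and add the page to the swap cache. */
	if (PageAnon(page) && !PageSwapCache(page)) {
		if (!add_to_swap(page))
			goto activate_locked;
		may_enter_fs = 1;
	}

	/* Unmap the page from all ptes, replacing them with swap ptes. */
	if (page_mapped(page) && mapping) {
		switch (try_to_unmap(page, 0)) {
		case SWAP_FAIL:
			goto activate_locked;
		case SWAP_AGAIN:
			goto keep_locked;
		case SWAP_SUCCESS:
			;	/* fall through and try to free the page */
		}
	}

	/*
	 * Step 2: only a dirty page that survived all the checks above is
	 * actually paged out; every "goto keep_*"/"goto activate_*" path
	 * above skips this and leaves the page on the swap cache.
	 */
	if (PageDirty(page)) {
		switch (pageout(page, mapping, sync_writeback)) {
		case PAGE_KEEP:
			goto keep_locked;
		/* ... other cases omitted ... */
		}
	}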

But this behavior causes trouble for memcg, because memcg doesn't put
!PageCgroupUsed swapcache back onto its LRU.
Such swapcache pages can never be freed by memcg's LRU scanning, so as a
result their swp_entry cannot be freed properly either.
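
For reference, the reason is the PageCgroupUsed check in memcg's LRU handling;
roughly (a sketch, not quoting mmotm verbatim):

	void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
	{
		struct page_cgroup *pc;

		if (mem_cgroup_disabled())
			return;
		pc = lookup_page_cgroup(page);
		/*
		 * An uncharged (!PageCgroupUsed) page is never linked back
		 * onto the per-memcg LRU, so memcg's reclaim never visits
		 * it again and can never free it or its swp_entry.
		 */
		if (!PageCgroupUsed(pc))
			return;
		/* ... link pc onto the per-zone memcg LRU list ... */
	}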

Such swapcache pages can be created by a race like the one below:

    Assume processA is exiting and a pte points to a page (!PageSwapCache),
    while processB is trying to reclaim that page.

              processA                   |           processB
    -------------------------------------+-------------------------------------
      (page_remove_rmap())               |  (shrink_page_list())
         mem_cgroup_uncharge_page()      |
            ->uncharged because it's not |
              PageSwapCache yet.         |
              So, both mem/memsw.usage   |
              are decremented.           |
                                         |    add_to_swap() -> added to swapcache.

    If this page then goes through shrink_page_list() without being freed for
    some reason, it doesn't go back to memcg's LRU because of !PageCgroupUsed.

This patch adds a hook after add_to_swap() to check whether the page is still
mapped by any process, and to free it if it has already been unmapped.

If a page is already on the swap cache when the owner process calls
page_remove_rmap() -> mem_cgroup_uncharge_page(), the page is not uncharged.
It then goes back to memcg's LRU even if it passes through shrink_page_list()
without being freed, so this patch ignores that case.
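
This works because the uncharge path itself skips pages that are already on the
swap cache; roughly (again a sketch, not quoting mmotm verbatim):

	static struct mem_cgroup *
	__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
	{
		if (mem_cgroup_disabled())
			return NULL;
		/*
		 * A page already on the swap cache keeps its charge here; it
		 * is uncharged later, when it is removed from the swap cache
		 * (mem_cgroup_uncharge_swapcache()).  So it stays
		 * PageCgroupUsed and goes back to memcg's LRU as usual.
		 */
		if (PageSwapCache(page))
			return NULL;
		/* ... normal uncharge ... */
	}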

Signed-off-by: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
---
 include/linux/swap.h |   12 ++++++++++++
 mm/memcontrol.c      |   14 +++++++++++++-
 mm/vmscan.c          |    8 ++++++++
 3 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6ea541d..00d8066 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -336,11 +336,17 @@ static inline void disable_swap_token(void)
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 extern void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent);
+extern int memcg_free_unused_swapcache(struct page *page);
 #else
 static inline void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
 {
 }
+static inline int
+memcg_free_unused_swapcache(struct page *page)
+{
+	return 0;
+}
 #endif
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern void mem_cgroup_uncharge_swap(swp_entry_t ent);
@@ -436,6 +442,12 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
 {
 }
 
+static inline int
+memcg_free_unused_swapcache(struct page *page)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SWAP */
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0c90345..63844cd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1578,7 +1578,19 @@ void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
 	if (memcg)
 		css_put(&memcg->css);
 }
-#endif
+
+int memcg_free_unused_swapcache(struct page *page)
+{
+	VM_BUG_ON(!PageLocked(page));
+	VM_BUG_ON(!PageSwapCache(page));
+
+	if (mem_cgroup_disabled())
+		return 0;
+	if (!PageAnon(page) || page_mapped(page))
+		return 0;
+	return try_to_free_swap(page);	/* checks page_swapcount */
+}
+#endif /* CONFIG_SWAP */
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2f9d555..1fd0e43 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -665,6 +665,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			if (!add_to_swap(page))
 				goto activate_locked;
+			/*
+			 * The owner process might have uncharged the page
+			 * (by page_remove_rmap()) before it has been added
+			 * to swap cache.
+			 * Check it here to avoid making it stale.
+			 */
+			if (memcg_free_unused_swapcache(page))
+				goto keep_locked;
 			may_enter_fs = 1;
 		}
 
--