lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 12 May 2009 17:13:56 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	mingo@...e.hu,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: [PATCH][BUGFIX] memcg: fix for deadlock between lock_page_cgroup
 and mapping tree_lock

On Tue, 12 May 2009 17:00:07 +0900
Daisuke Nishimura <nishimura@....nes.nec.co.jp> wrote:
> hmm, I see.
> cache_charge is outside of tree_lock, so moving uncharge would make sense.
> IMHO, we should make the period of spinlock as small as possible,
> and charge/uncharge of pagecache/swapcache is protected by page lock, not tree_lock.
> 
How about this ?
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>

As Nishimura pointed out, mapping->tree_lock can be aquired from interrupt
context. Then, following dead lock can occur.
Assume "A" as a page.

 CPU0:
       lock_page_cgroup(A)
		interrupted
			-> take mapping->tree_lock.
 CPU1:
       take mapping->tree_lock
		-> lock_page_cgroup(A)

This patch tries to fix above deadlock by moving memcg's hook to out of
mapping->tree_lock.

After this patch, lock_page_cgroup() is not called under mapping->tree_lock.

Making Nishimura's first fix more fundamanetal for avoiding to add special case.

Reported-by: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>

---
 mm/filemap.c    |    6 +++---
 mm/swap_state.c |    2 +-
 mm/truncate.c   |    1 +
 mm/vmscan.c     |    2 ++
 4 files changed, 7 insertions(+), 4 deletions(-)

Index: mmotm-2.6.30-May07/mm/filemap.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/filemap.c
+++ mmotm-2.6.30-May07/mm/filemap.c
@@ -121,7 +121,6 @@ void __remove_from_page_cache(struct pag
 	mapping->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
 	BUG_ON(page_mapped(page));
-	mem_cgroup_uncharge_cache_page(page);
 
 	/*
 	 * Some filesystems seem to re-dirty the page even after
@@ -145,6 +144,7 @@ void remove_from_page_cache(struct page 
 	spin_lock_irq(&mapping->tree_lock);
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	mem_cgroup_uncharge_cache_page(page);
 }
 
 static int sync_page(void *word)
@@ -476,13 +476,13 @@ int add_to_page_cache_locked(struct page
 		if (likely(!error)) {
 			mapping->nrpages++;
 			__inc_zone_page_state(page, NR_FILE_PAGES);
+			spin_unlock_irq(&mapping->tree_lock);
 		} else {
 			page->mapping = NULL;
+			spin_unlock_irq(&mapping->tree_lock);
 			mem_cgroup_uncharge_cache_page(page);
 			page_cache_release(page);
 		}
-
-		spin_unlock_irq(&mapping->tree_lock);
 		radix_tree_preload_end();
 	} else
 		mem_cgroup_uncharge_cache_page(page);
Index: mmotm-2.6.30-May07/mm/swap_state.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/swap_state.c
+++ mmotm-2.6.30-May07/mm/swap_state.c
@@ -121,7 +121,6 @@ void __delete_from_swap_cache(struct pag
 	total_swapcache_pages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
 	INC_CACHE_INFO(del_total);
-	mem_cgroup_uncharge_swapcache(page, ent);
 }
 
 /**
@@ -191,6 +190,7 @@ void delete_from_swap_cache(struct page 
 	__delete_from_swap_cache(page);
 	spin_unlock_irq(&swapper_space.tree_lock);
 
+	mem_cgroup_uncharge_swapcache(page, ent);
 	swap_free(entry);
 	page_cache_release(page);
 }
Index: mmotm-2.6.30-May07/mm/truncate.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/truncate.c
+++ mmotm-2.6.30-May07/mm/truncate.c
@@ -359,6 +359,7 @@ invalidate_complete_page2(struct address
 	BUG_ON(page_has_private(page));
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	mem_cgroup_uncharge_cache_page(page);
 	page_cache_release(page);	/* pagecache ref */
 	return 1;
 failed:
Index: mmotm-2.6.30-May07/mm/vmscan.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/vmscan.c
+++ mmotm-2.6.30-May07/mm/vmscan.c
@@ -477,10 +477,12 @@ static int __remove_mapping(struct addre
 		swp_entry_t swap = { .val = page_private(page) };
 		__delete_from_swap_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
+		mem_cgroup_uncharge_swapcache(page);
 		swap_free(swap);
 	} else {
 		__remove_from_page_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
+		mem_cgroup_uncharge_cache_page(page);
 	}
 
 	return 1;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ