Message-Id: <20080927124745.2e216381.kamezawa.hiroyu@jp.fujitsu.com>
Date: Sat, 27 Sep 2008 12:47:45 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
"xemul@...nvz.org" <xemul@...nvz.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Dave Hansen <haveblue@...ibm.com>, ryov@...inux.co.jp
Subject: Re: [PATCH 9/12] memcg allocate all page_cgroup at boot
On Fri, 26 Sep 2008 14:54:22 +0900
Daisuke Nishimura <nishimura@....nes.nec.co.jp> wrote:
> > Sorry, my brain seems to be sleeping.. above page_mapped() check doesn't
> > help this situation. Maybe this page_mapped() check is not necessary
> > because it's of no use.
> >
> > I think this kind of problem will not be fixed until we handle SwapCache.
> >
> I've not fully understood yet what [12/12] does, but if we handle
> swapcache properly, [12/12] would become unnecessary?
>
Let me try to illustrate the trouble more precisely.
In do_swap_page(), the page is charged when the SwapCache lookup ends.
Here, the page is
- charged when it is not mapped.
- not charged when it is mapped.
set_pte() etc. are done under the appropriate lock.
On the other side, when a task exits, zap_pte_range() is called.
It calls page_remove_rmap().
Case A) The following is a race.
        Thread A                          Thread B
    do_swap_page()                    zap_pte_range()
    (1) try charge (mapcount=1)
                                      (2) page_remove_rmap()
                                      (3) uncharge page.
    (4) map it
Then,
at (1), mapcount=1, so this page is not charged.
at (2), page_remove_rmap() is called and mapcount goes down to zero,
so uncharge (3) is called.
at (4), at the end of do_swap_page(), page->mapcount=1 but the page is not charged.
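
The ordering in Case A can be replayed deterministically with a small toy model. This is only a sketch under assumptions: `struct toy_page` and the `toy_*` helpers are hypothetical stand-ins, not kernel functions, and the charge rule is simplified to "charge only if not mapped, uncharge when mapcount reaches zero". The page starts out mapped once and charged.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int mapcount;   /* number of ptes mapping the page */
    bool charged;   /* accounted to a memory cgroup? */
};

/* Step (1): charge only if the page is not mapped yet. */
static void toy_try_charge(struct toy_page *pg)
{
    if (pg->mapcount == 0)
        pg->charged = true;
}

/* Steps (2)+(3): drop one mapping; uncharge when the last one goes. */
static void toy_remove_rmap(struct toy_page *pg)
{
    if (--pg->mapcount == 0)
        pg->charged = false;
}

/* Step (4): install the pte. */
static void toy_map(struct toy_page *pg)
{
    pg->mapcount++;
}

/* Replays Case A in the order (1)..(4); returns the final page state. */
static struct toy_page toy_replay_case_a(void)
{
    struct toy_page pg = { .mapcount = 1, .charged = true };

    toy_try_charge(&pg);   /* Thread A (1): mapped, so no charge */
    toy_remove_rmap(&pg);  /* Thread B (2),(3): mapcount 0, uncharged */
    toy_map(&pg);          /* Thread A (4): mapcount back to 1 */
    return pg;
}
```

Calling `toy_replay_case_a()` from a `main()` leaves the toy page with one mapping and no charge, which is the inconsistent state described at (4).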
Case B) In another scenario.
        Thread A                          Thread B
    do_swap_page()                    zap_pte_range()
    (1) try charge (mapcount=1)
                                      (2) page_remove_rmap()
    (3) map it
                                      (4) uncharge is called.
In (4), uncharge is called, but mapcount may already have gone back up to 1.
Patch 12/12 is for case (A).
After 12/12, double-checking page_mapped() under lock_page_cgroup() will be
the fix for case (B).
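
To make Case B and the double-check idea concrete, here is a toy replay of the same four steps, with and without re-checking the mapcount at uncharge time. It is a sketch under assumptions: `struct toy_page`, `toy_uncharge()` and `toy_replay_case_b()` are hypothetical names, the model is single-threaded, and the locking that lock_page_cgroup() would provide is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int mapcount;   /* number of ptes mapping the page */
    bool charged;   /* accounted to a memory cgroup? */
};

/*
 * Step (4): uncharge. With recheck set, look at mapcount again first
 * (the mail describes doing this under lock_page_cgroup(); this toy
 * model is single-threaded and omits the lock).
 */
static void toy_uncharge(struct toy_page *pg, bool recheck)
{
    if (!recheck || pg->mapcount == 0)
        pg->charged = false;
}

/* Replays Case B in the order (1)..(4); returns the final page state. */
static struct toy_page toy_replay_case_b(bool recheck)
{
    struct toy_page pg = { .mapcount = 1, .charged = true };

    /* Thread A (1): the page is mapped, so try-charge does nothing. */
    /* Thread B (2): page_remove_rmap() drops the last mapping. */
    pg.mapcount--;
    /* Thread A (3): do_swap_page() maps the page again. */
    pg.mapcount++;
    /* Thread B (4): uncharge runs after the page was remapped. */
    toy_uncharge(&pg, recheck);
    return pg;
}
```

With `recheck` false the page ends up mapped but uncharged; with `recheck` true the late uncharge is skipped because the page is mapped again.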
Huu, I don't like swap-cache ;)
Anyway, we'll have to handle swap cache later.
Thanks,
-Kame