lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120228122250.GB1702@cmpxchg.org>
Date:	Tue, 28 Feb 2012 13:22:50 +0100
From:	Johannes Weiner <hannes@...xchg.org>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.cz>, Hugh Dickins <hughd@...gle.com>,
	Greg Thelen <gthelen@...gle.com>, Ying Han <yinghan@...gle.com>
Subject: Re: [PATCH 4/6] memcg: use new logic for page stat accounting.

On Fri, Feb 17, 2012 at 06:27:43PM +0900, KAMEZAWA Hiroyuki wrote:
> >From 5bf592d432e4552db74e94a575afcd83aa652a9d Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> Date: Thu, 2 Feb 2012 11:49:59 +0900
> Subject: [PATCH 4/6] memcg: use new logic for page stat accounting.
> 
> Now, page-stat-per-memcg is recorded into per page_cgroup flag by
> duplicating page's status into the flag. The reason is that memcg
> has a feature to move a page from a group to another group and we
> have race between "move" and "page stat accounting",
> 
> Under current logic, assume CPU-A and CPU-B. CPU-A does "move"
> and CPU-B does "page stat accounting".
> 
> When CPU-A goes 1st,
> 
>             CPU-A                           CPU-B
>                                     update "struct page" info.
>     move_lock_mem_cgroup(memcg)
>     see pc->flags
>     copy page stat to new group
>     overwrite pc->mem_cgroup.
>     move_unlock_mem_cgroup(memcg)
>                                     move_lock_mem_cgroup(mem)
>                                     set pc->flags
>                                     update page stat accounting
>                                     move_unlock_mem_cgroup(mem)
> 
> stat accounting is guarded by move_lock_mem_cgroup() and "move"
> logic (CPU-A) doesn't see changes in "struct page" information.
> 
> But it's costly to have the same information both in 'struct page' and
> 'struct page_cgroup'. And, there is a potential problem.
> 
> For example, assume we have PG_dirty accounting in memcg.
> PG_..is a flag for struct page.
> PCG_ is a flag for struct page_cgroup.
> (This is just an example. The same problem can be found in any
>  kind of page stat accounting.)
> 
> 	  CPU-A                               CPU-B
>       TestSet PG_dirty
>       (delay)                        TestClear PG_dirty
>                                      if (TestClear(PCG_dirty))
>                                           memcg->nr_dirty--
>       if (TestSet(PCG_dirty))
>           memcg->nr_dirty++
> 
> Here, memcg->nr_dirty = +1, this is wrong.
> This race was reported by  Greg Thelen <gthelen@...gle.com>.
> Now, only FILE_MAPPED is supported but fortunately, it's serialized by
> page table lock and this is not real bug, _now_,
> 
> If this potential problem is caused by having duplicated information in
> struct page and struct page_cgroup, we may be able to fix this by using
> original 'struct page' information.
> But we'll have a problem in "move account"
> 
> Assume we use only PG_dirty.
> 
>          CPU-A                   CPU-B
>     TestSet PG_dirty
>     (delay)                    move_lock_mem_cgroup()
>                                if (PageDirty(page))
>                                       new_memcg->nr_dirty++
>                                pc->mem_cgroup = new_memcg;
>                                move_unlock_mem_cgroup()
>     move_lock_mem_cgroup()
>     memcg = pc->mem_cgroup
>     new_memcg->nr_dirty++
> 
> accounting information may be double-counted. This was original
> reason to have PCG_xxx flags but it seems PCG_xxx has another problem.
> 
> I think we need a bigger lock as
> 
>      move_lock_mem_cgroup(page)
>      TestSetPageDirty(page)
>      update page stats (without any checks)
>      move_unlock_mem_cgroup(page)
> 
> This fixes both of problems and we don't have to duplicate page flag
> into page_cgroup. Please note: move_lock_mem_cgroup() is held
> only when there are possibility of "account move" under the system.
> So, in most path, status update will go without atomic locks.
> 
> This patch introduce mem_cgroup_begin_update_page_stat() and
> mem_cgroup_end_update_page_stat() both should be called at modifying
> 'struct page' information if memcg takes care of it. as
> 
>      mem_cgroup_begin_update_page_stat()
>      modify page information
>      mem_cgroup_update_page_stat()
>      => never check any 'struct page' info, just update counters.
>      mem_cgroup_end_update_page_stat().
> 
> This patch is slow because we need to call begin_update_page_stat()/
> end_update_page_stat() regardless of accounted will be changed or not.
> A following patch adds an easy optimization and reduces the cost.
> 
> Changes since v4
>  - removed unused argument *lock
>  - added more comments.
> Changes since v3
>  - fixed typos
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>

Acked-by: Johannes Weiner <hannes@...xchg.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ