lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20100318094519.cd1eed72.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Thu, 18 Mar 2010 09:45:19 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	balbir@...ux.vnet.ibm.com, Andrea Righi <arighi@...eler.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Vivek Goyal <vgoyal@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Trond Myklebust <trond.myklebust@....uio.no>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Greg Thelen <gthelen@...gle.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Andrew Morton <akpm@...ux-foundation.org>,
	containers@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Thu, 18 Mar 2010 08:54:11 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:

> On Wed, 17 Mar 2010 17:28:55 +0530
> Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
> 
> > * Andrea Righi <arighi@...eler.com> [2010-03-15 00:26:38]:
> > 
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > > 
> > > Now, file-mapped is maintaiend. But more generic update function
> > > will be needed for dirty page accounting.
> > > 
> > > For accountig page status, we have to guarantee lock_page_cgroup()
> > > will be never called under tree_lock held.
> > > To guarantee that, we use trylock at updating status.
> > > By this, we do fuzzy accounting, but in almost all case, it's correct.
> > >
> > 
> > I don't like this at all, but in almost all cases is not acceptable
> > for statistics, since decisions will be made on them and having them
> > incorrect is really bad. Could we do a form of deferred statistics and
> > fix this.
> > 
> 
> plz show your implementation which has no performance regresssion.
> For me, I don't neee file_mapped accounting, at all. If we can remove that,
> we can add simple migration lock.
> file_mapped is a feattue you added. please improve it.
> 

BTW, I should explain how acculate this accounting is in this patch itself.

Now, lock_page_cgroup/unlock_page_cgroup happens when
	- charge/uncharge/migrate/move accounting

Then, the lock contention (trylock failure) seems to occur in conflict
with
	- charge, uncharge, migarate. move accounting

About dirty accounting, charge/uncharge/migarate are operation in synchronous
manner with radix-tree (holding treelock etc). Then no account leak.
move accounting is only source for inacculacy...but I don't think this move-task
is ciritial....moreover, we don't move any file pages at task-move, now.
(But Nishimura-san has a plan to do so.)
So, contention will happen only at confliction with force_empty.

About FILE_MAPPED accounting, it's not synchronous with radix-tree operaton.
Then, accounting-miss seems to happen when charge/uncharge/migrate/account move.
But
	charge .... we don't account a page as FILE_MAPPED before it's charged.
	uncharge .. usual operation in turncation is unmap->remove-from-radix-tree.
		    Then, it's sequential in almost all case. The race exists when...
		    Assume there are 2 threads A and B. A truncate a file, B map/unmap that.
		    This is very unusal confliction.
	migrate.... we do try_to_unmap before migrating pages. Then, FILE_MAPPED
		    is properly handled.
	move account .... we don't have move-account-mapped-file, yet.

Then, this trylock contention happens at contention with force_empty and truncate.


Then, main issue for contention is force_empty. But it's called for removing memcg,
accounting for such memcg is not important.
Then, I say "this accounting is Okay."

To do more accurate, we may need another "migration lock". But to get better
performance for root cgroup, we have to call mem_cgroup_is_root() before
taking lock and there will be another complicated race.

Bye,
-Kame












--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ