linux-kernel - Re: [PATCH RFC] memcg: fix drain_all

Message-Id: <20110810085437.ed023651.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Wed, 10 Aug 2011 08:54:37 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Michal Hocko <mhocko@...e.cz>
Cc:	Johannes Weiner <jweiner@...hat.com>, linux-mm@...ck.org,
	Balbir Singh <bsingharora@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] memcg: fix drain_all_stock crash

On Tue, 9 Aug 2011 13:46:42 +0200
Michal Hocko <mhocko@...e.cz> wrote:

> On Tue 09-08-11 19:07:25, KAMEZAWA Hiroyuki wrote:
> > On Tue, 9 Aug 2011 12:09:44 +0200
> > Michal Hocko <mhocko@...e.cz> wrote:
> > 
> > > On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 9 Aug 2011 11:45:03 +0200
> > > > Michal Hocko <mhocko@...e.cz> wrote:
> > > > 
> > > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote:
> > > > > > On Tue, 9 Aug 2011 11:31:50 +0200
> > > > > > Michal Hocko <mhocko@...e.cz> wrote:
> > > > > > 
> > > > > > > What do you think about the half-baked patch below? I didn't manage to
> > > > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock
> > > > > > > locking (it is acquired somewhere else than it is released which is
> > > > > > > not). I will think about a nicer way how to do it.
> > > > > > > Maybe I should also split the rcu part in a separate patch.
> > > > > > > 
> > > > > > > What do you think?
> > > > > > 
> > > > > > 
> > > > > > I'd like to revert 8521fc50 first and consider total design change
> > > > > > rather than ad-hoc fix.
> > > > > 
> > > > > Agreed. Revert should go into 3.0 stable as well. Although the global
> > > > > mutex is buggy we have that behavior for a long time without any reports.
> > > > > We should address it but it can wait for 3.2.
> > > 
> > > I will send the revert request to Linus.
> > > 
> > > > What does "buggy" mean here? "Problematic" or "causes an Oops"?
> > > 
> > > I have described that in an earlier email. Consider the pathological case
> > > when CPU0 wants to asynchronously drain a memcg which has a lot of cached
> > > charges while CPU1 is already draining, so it holds the mutex. CPU0 backs
> > > off, so it has to reclaim although we could prevent that by getting rid of
> > > the cached charges. This is not critical though.
> > > 
> > 
> > That problem should be fixed by background reclaim.
> 
> How? Do you plan to rework locking or the charge caching completely?
> 

From your description, the problem is not the lock itself but that a task
may go into _unnecessary_ direct reclaim even if there are remaining
charges on per-cpu stocks, which causes latency.

In all of my automatic background reclaim tests, no direct reclaim happens
if background reclaim is enabled. And as I said before, we may be able to
add a flag not to cache more. It would be set by some condition, such as
usage being near the limit.
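
To illustrate the "flag not to cache more" idea, here is a rough user-space
sketch in C. It is not kernel code and not a proposed patch: the struct names,
CHARGE_BATCH, NEAR_LIMIT_MARGIN and the exact condition are all assumptions
made up for illustration. The point is only that a CPU stops pre-charging a
batch into its per-cpu stock once the group's usage is close to its limit, so
cached charges do not push other tasks into direct reclaim.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from mm/memcontrol.c. */
#define CHARGE_BATCH      32  /* pages pre-charged per refill (assumed) */
#define NEAR_LIMIT_MARGIN 64  /* "near the limit" threshold (assumed) */

struct group_model {
	size_t usage;  /* pages charged so far, including cached ones */
	size_t limit;  /* hard limit in pages */
};

struct stock_model {
	size_t cached; /* pre-charged pages held by one CPU */
};

/* The "flag": cache only while there is comfortable headroom. */
static bool should_cache(const struct group_model *g)
{
	return g->usage < g->limit &&
	       g->limit - g->usage > NEAR_LIMIT_MARGIN;
}

/*
 * Charge one page. Returns false when the group is at its limit,
 * which is where a real implementation would have to reclaim.
 */
static bool charge_one(struct group_model *g, struct stock_model *s)
{
	if (s->cached > 0) {
		/* Fast path: consume a page from the per-cpu stock. */
		s->cached--;
		return true;
	}
	if (g->usage >= g->limit)
		return false;
	if (should_cache(g) && g->usage + CHARGE_BATCH <= g->limit) {
		/* Plenty of headroom: charge a batch, keep the rest cached. */
		g->usage += CHARGE_BATCH;
		s->cached = CHARGE_BATCH - 1;
		return true;
	}
	/* Near the limit: charge exactly one page and cache nothing. */
	g->usage += 1;
	return true;
}
```

With this model, a charge far below the limit leaves CHARGE_BATCH - 1 pages in
the stock, while a charge within NEAR_LIMIT_MARGIN pages of the limit leaves
the stock empty.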

Thanks,
-Kame