linux-kernel - Re: cgroup: rmdir() does not complete

Message-Id: <20100910163354.1f719c0d.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Fri, 10 Sep 2010 16:33:54 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Mark Hills <mark@...o.org.uk>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	linux-kernel@...r.kernel.org
Subject: Re: cgroup: rmdir() does not complete

On Fri, 10 Sep 2010 08:28:00 +0100 (BST)
Mark Hills <mark@...o.org.uk> wrote:

> On Fri, 10 Sep 2010, KAMEZAWA Hiroyuki wrote:
> 
> > On Fri, 10 Sep 2010 00:04:31 +0100 (BST)
> > Mark Hills <mark@...o.org.uk> wrote:
> > > The report on the spinning process (23586) is dominated by calls from 
> > > mem_cgroup_force_empty.
> > > 
> > > It seems to show lru_add_drain_all and drain_all_stock_sync are causing 
> > > the load (I assume drain_all_stock_sync has been optimised out). But I 
> > > don't think this is as important as what causes the spin.
> > > 
> > 
> > I noticed you use FUSE and it seems there is a problem in FUSE v.s. memcg.
> > I wrote a patch (onto 2.6.36 but can be applied..)
> > 
> > Could you try this ? I'm sorry I don't use FUSE system and can't test
> > right now.
> 
> What makes you conclude that FUSE is in use? I do not think this is the 
> case. Or do you mean that it is a problem that the kernel is built with 
> FUSE support?
> 
You wrote 
> The test case I was running is similar to the above. With the Lustre 
> filesystem the problem takes 4 hours or more to show itself. Recently I 
> ran 4 threads for over 24 hours without it being seen -- I suspect some 
> external factor is involved.

I think Lustre FS is using FUSE. Am I wrong?


> I _can_ test the patch, but I still cannot reliably reproduce the problem 
> so it will be hard to conclude whether the patch works or not. Is there a 
> way to build a test case for this?
> 

I'm sorry, I'm not sure yet. But from your report, you have 6 pages of charge
which cannot be found by force_empty(). And I found that FUSE's pipe copy code
inserts a page into the page cache radix-tree but does not move it onto the LRU.

So,
  - There are remaining pages which are out of the LRU.
  - FUSE's "move" code does something curious: add_to_page_cache() but no LRU add.
  - You reported that you use Lustre FS.
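
The three points above can be sketched as a toy model. This is not kernel
code: ToyMemcg, charge(), the on_lru flag and the max_passes bound are all
hypothetical names made up for illustration. It only assumes that force_empty()
can uncharge just the pages it finds on the LRU, and that a page which was
charged but never put on the LRU stays invisible to it.

```python
class ToyMemcg:
    """Toy model: a charge counter plus an LRU list of charged pages."""

    def __init__(self):
        self.usage = 0   # pages currently charged to the group
        self.lru = []    # charged pages that force_empty() is able to find

    def charge(self, page, on_lru=True):
        # Every charged page bumps the counter, but only some reach the LRU.
        self.usage += 1
        if on_lru:
            self.lru.append(page)

    def force_empty(self, max_passes=5):
        """Uncharge everything reachable via the LRU.

        Returns the number of passes used, or None if charges remain.
        The max_passes bound exists only so this sketch terminates; the
        assumption is that the real loop keeps retrying while usage > 0.
        """
        for attempt in range(1, max_passes + 1):
            while self.lru:
                self.lru.pop()
                self.usage -= 1
            if self.usage == 0:
                return attempt
        return None


cg = ToyMemcg()
for i in range(10):
    cg.charge(f"file-page-{i}")                # normal page cache, on the LRU
for i in range(6):
    cg.charge(f"pipe-page-{i}", on_lru=False)  # charged, but never added to the LRU

result = cg.force_empty()
print(result, cg.usage)
```

In this model the 6 off-LRU pages keep usage above zero no matter how many
passes are made, which would match a rmdir() that never completes.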

That is why I asked you. To test this myself, I would have to study FUSE and
write a test module... Maybe adding a printk() where I added the gfp_mask
modification in fuse/dev.c can show something, but...

We may have some other problem too, but this seems to be one of them.

Thanks,
-Kame
