Message-ID: <alpine.NEB.2.01.1009100845090.28193@jrf.vwaro.pbz>
Date:	Fri, 10 Sep 2010 08:51:51 +0100 (BST)
From:	Mark Hills <mark@...o.org.uk>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
cc:	Peter Zijlstra <peterz@...radead.org>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	linux-kernel@...r.kernel.org
Subject: Re: cgroup: rmdir() does not complete

On Fri, 10 Sep 2010, KAMEZAWA Hiroyuki wrote:

> On Fri, 10 Sep 2010 08:28:00 +0100 (BST)
> Mark Hills <mark@...o.org.uk> wrote:
> 
> > On Fri, 10 Sep 2010, KAMEZAWA Hiroyuki wrote:
> > 
> > > On Fri, 10 Sep 2010 00:04:31 +0100 (BST)
> > > Mark Hills <mark@...o.org.uk> wrote:
> > > > The report on the spinning process (23586) is dominated by calls from 
> > > > mem_cgroup_force_empty.
> > > > 
> > > > It seems to show lru_add_drain_all and drain_all_stock_sync are causing 
> > > > the load (I assume drain_all_stock_sync has been optimised out). But I 
> > > > don't think this is as important as what causes the spin.
> > > > 
> > > 
> > > I noticed you use FUSE and it seems there is a problem in FUSE v.s. memcg.
> > > I wrote a patch (onto 2.6.36 but can be applied..)
> > > 
> > > Could you try this ? I'm sorry I don't use FUSE system and can't test
> > > right now.
> > 
> > What makes you conclude that FUSE is in use? I do not think this is the 
> > case. Or do you mean that it is a problem that the kernel is built with 
> > FUSE support?
> > 
> You wrote 
> > The test case I was running is similar to the above. With the Lustre 
> > filesystem the problem takes 4 hours or more to show itself. Recently I 
> > ran 4 threads for over 24 hours without it being seen -- I suspect some 
> > external factor is involved.
> 
> I think Lustre FS is using FUSE. I'm wrong ?

Lustre does not use FUSE. But the client is a set of kernel modules, so 
these could do anything.

> > I _can_ test the patch, but I still cannot reliably reproduce the problem 
> > so it will be hard to conclude whether the patch works or not. Is there a 
> > way to build a test case for this?
> > 
> 
> I'm sorry I'm not sure yet. But from your report, you have 6 pages of charge
> which cannot be found by force_empty(). And I found FUSE's pipe copy code
> inserts a page cache into radix-tree but not move them onto LRU.
> 
> So,
>   - There are remaining pages which is out-of-LRU
>   - FUSE's "move" code does something curious, add_to_page_cache() but not LRU.
>   - You reported you use Lustre FS.
> 
> Then, I ask you. To test this, I have to study FUSE to write test module...
> Maybe adding printk() to where I added gfp_mask modification of fuse/dev.c
> can show something but...
> 
> We may have something other problem, but it seems this is one of them.

Okay, it sounds like perhaps I need to investigate Lustre; I will do this 
next week. But I think FUSE can be ruled out.

Thanks again

-- 
Mark