lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 8 Mar 2012 15:33:31 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, axboe@...nel.dk,
	hughd@...gle.com, avi@...hat.com, nate@...nel.net,
	cl@...ux-foundation.org, linux-kernel@...r.kernel.org,
	dpshah@...gle.com, ctalbott@...gle.com, rni@...gle.com
Subject: Re: [PATCHSET] mempool, percpu, blkcg: fix percpu stat allocation
 and remove stats_lock

On Thu, Mar 08, 2012 at 03:16:16PM -0500, Vivek Goyal wrote:
> On Thu, Mar 08, 2012 at 10:08:33AM -0800, Tejun Heo wrote:
> 
> [..]
> > > Tejun, I noticed that in UP case, once in a while cgroup removal is
> > > hanging. Looks like it is hung in cgroup_rmdir() somewhere. I will debug
> > > more to find out what is happening. May be blkcg->refcount issue.
> > 
> > It's probably from something forgetting to put cgroup and pre_destroy
> > waiting for it.  Such bugs would have been masked before but now show
> > up as stalls during rmdir.
> 
> I am not sure what is happening here yet. What I have noticed that
> somebody is holding a reference on blkg->refcnt and that's why css->refcnt
> is not zero hence rmdir is hanging.
> 
> I susect it is cfqq refcount on blkg which is not released till cfqq is
> reclaimed.
> 
> Looking at the code, in general it seems to be a problem. If a task 
> issues bunch of IO, changes the cgroup and does not issue IO any more
> for some time, that means old cfqq will still be linked to task's
> cic and still be holding reference to blkg and one can't remove the
> cgroup.
> 
> We had this disucssion in the past. So looks like to get rid of this
> problem, you will have to drop old cic->cfqq association during
> cgroup change to avoid hanging rmdir.

Ok, I can confirm that it is cfqq reference on blkg which is an issue. If
I move my shell to a child cgroup and try to do some operations (in the
context of shell, like autocompletion/reading an uncached dir), then IO
is issued in the context of shell, I move out the shell out of cgroup and
then try to delete it, it hangs. Once I exit out of shell, blkg reference
is dropped and cgroup is deleted.

So we do need to cleanup the cic->cfqq upon cgroup change synchronously.

That will still not solve the issue of a process dumping tons of
IO on device (large nr_requests) and then moving out of cgroup. Now
cgroup deletion will still hang till all the IO in the cgroup
completes.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ