Date:	Thu, 3 Feb 2011 00:32:51 +0530
From:	Balbir Singh <balbir@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	eranian@...gle.com, linux-kernel@...r.kernel.org, mingo@...e.hu,
	paulus@...ba.org, davem@...emloft.net, fweisbec@...il.com,
	perfmon2-devel@...ts.sf.net, eranian@...il.com,
	robert.richter@....com, acme@...hat.com, lizf@...fujitsu.com,
	Paul Menage <menage@...gle.com>
Subject: Re: [PATCH 1/2] perf_events: add cgroup support (v8)

* Peter Zijlstra <peterz@...radead.org> [2011-02-02 13:46:32]:

> On Wed, 2011-02-02 at 17:20 +0530, Balbir Singh wrote:
> > * Peter Zijlstra <peterz@...radead.org> [2011-02-02 12:29:20]:
> > 
> > > On Thu, 2011-01-20 at 15:39 +0100, Peter Zijlstra wrote:
> > > > On Thu, 2011-01-20 at 15:30 +0200, Stephane Eranian wrote:
> > > > > @@ -4259,8 +4261,20 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
> > > > >  
> > > > >         /* Reassign the task to the init_css_set. */
> > > > >         task_lock(tsk);
> > > > > +       /*
> > > > > +        * we mask interrupts to prevent:
> > > > > +        * - timer tick to cause event rotation which
> > > > > +        *   could schedule back in cgroup events after
> > > > > +        *   they were switched out by perf_cgroup_sched_out()
> > > > > +        *
> > > > > +        * - preemption which could schedule back in cgroup events
> > > > > +        */
> > > > > +       local_irq_save(flags);
> > > > > +       perf_cgroup_sched_out(tsk);
> > > > >         cg = tsk->cgroups;
> > > > >         tsk->cgroups = &init_css_set;
> > > > > +       perf_cgroup_sched_in(tsk);
> > > > > +       local_irq_restore(flags);
> > > > >         task_unlock(tsk);
> > > > >         if (cg)
> > > > >                 put_css_set_taskexit(cg); 
> > > > 
> > > > So you too need a callback on cgroup change there. Li, Paul, any chance
> > > > we can fix this cgroup_subsys::exit callback? The scheduler code needs
> > > > to do funny things because it's in the wrong place as well.
> > > 
> > > cgroup guys? Shall I just fix this exit thing, since the only users seem
> > > to be the scheduler and now perf, for both of which it's unfortunate at
> > > best?
> > 
> > Are you suggesting that the cgroup_exit on task_exit notification should be
> > pulled out?
> 
> 
> No, just fixed. The callback as it exists isn't useful and leads to
> hacks like the above.
>

OK
 
> 
> > > Balbir, memcontrol.c uses pre_destroy(), I pose that using this method
> > > is broken per definition since it makes the cgroup empty notification
> > > void.
> > >
> > 
> > We use pre_destroy() to reclaim, so that delete/rmdir() will be able
> > to clean up the node/group. I am not sure what you mean by it makes
> > the empty notification void and why pre_destroy() is broken?
> 
> A quick look at the code looked like it could return -EBUSY (and other
> errors), in that case the rmdir of the empty cgroup will fail.
> 
> Therefore it can happen that after the last task is removed, and we get
> the notification that the cgroup is empty, and we attempt the rmdir we
> will fail.
> 
> This again means that all such notification handlers must poll state,
> which is ridiculous.

The rmdir fails because someone still holds an active reference to the
cgroup structure. In the case of memory, every page_cgroup used to hold
one. A notification handler would have to poll state only if both of
the following happen:

1. A notification is sent that there are no references and the group
can be cleaned up.
2. A new reference is acquired before the cleanup.

The sequence of 1 followed by 2 is unlikely.
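
To make the two steps concrete, here is a minimal user-space sketch of
the sequence. It is not kernel code: struct toy_group and the toy_*
helpers are hypothetical names made up for illustration, and a plain
counter stands in for whatever references are actually held on the
cgroup. It only assumes, as discussed above, that the rmdir fails with
-EBUSY while a reference is held.

```c
#include <errno.h>

/* Hypothetical stand-in for a cgroup; NOT the kernel's struct cgroup. */
struct toy_group {
	int refs;	/* active references held on the group */
	int removed;	/* set once toy_rmdir() has succeeded */
};

static void toy_get(struct toy_group *g) { g->refs++; }
static void toy_put(struct toy_group *g) { g->refs--; }

/* True at the point where an "empty" notification would be sent. */
static int toy_is_idle(const struct toy_group *g)
{
	return g->refs == 0;
}

/* Models rmdir: refuses with -EBUSY while any reference is held. */
static int toy_rmdir(struct toy_group *g)
{
	if (g->refs > 0)
		return -EBUSY;
	g->removed = 1;
	return 0;
}

/*
 * Replays the sequence. take_late_ref selects whether step 2
 * (a new reference taken before the cleanup) happens.
 * Returns what the rmdir attempt returns.
 */
static int toy_replay(int take_late_ref)
{
	struct toy_group g = { .refs = 1, .removed = 0 };

	toy_put(&g);			/* last reference is dropped */
	if (!toy_is_idle(&g))
		return -EINVAL;		/* not reached in this toy model */

	/* Step 1: the "no references" notification would go out here. */

	if (take_late_ref)
		toy_get(&g);		/* Step 2: new reference before cleanup */

	return toy_rmdir(&g);
}
```

In this model toy_replay(0) returns 0 and toy_replay(1) returns -EBUSY,
so the handler sees a failed rmdir only when a reference sneaks in
between the notification and the cleanup.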


-- 
	Three Cheers,
	Balbir
