linux-kernel - Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

Message-ID: <20181113215919.GC15590@tower.DHCP.thefacebook.com>
Date:   Tue, 13 Nov 2018 21:59:23 +0000
From:   Roman Gushchin <guro@...com>
To:     Oleg Nesterov <oleg@...hat.com>
CC:     Roman Gushchin <guroan@...il.com>, Tejun Heo <tj@...nel.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

Hi Oleg!

On Tue, Nov 13, 2018 at 04:48:25PM +0100, Oleg Nesterov wrote:
> On 11/12, Roman Gushchin wrote:
> >
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -83,7 +83,8 @@ struct task_group;
> >  #define TASK_WAKING			0x0200
> >  #define TASK_NOLOAD			0x0400
> >  #define TASK_NEW			0x0800
> > -#define TASK_STATE_MAX			0x1000
> > +#define TASK_FROZEN			0x1000
> > +#define TASK_STATE_MAX			0x2000
> 
> Just noticed the new task state... Why? Can't we avoid it?

We can, but it's nice to show userspace that tasks are frozen,
rather than just stuck somewhere in the kernel...

> 
> ...
> 
> > +void cgroup_freezer_enter(void)
> > +{
> > +	long state = current->state;
> 
> Why? it must be TASK_RUNNING?
> 
> If not set_current_state() at the end is simply wrong... Yes, __refrigerator()
> does this, but at least it has a reason although it is wrong too.
> 
> > +	struct cgroup *cgrp;
> > +
> > +	if (!current->frozen) {
> > +		spin_lock_irq(&css_set_lock);
> > +		current->frozen = true;
> > +		cgrp = task_dfl_cgroup(current);
> > +		cgrp->freezer.nr_frozen_tasks++;
> > +
> > +		WARN_ON_ONCE(cgrp->freezer.nr_tasks_to_freeze <
> > +			     cgrp->freezer.nr_frozen_tasks);
> > +
> > +		if (cgrp->freezer.nr_tasks_to_freeze ==
> > +		    cgrp->freezer.nr_frozen_tasks)
> > +			cgroup_queue_notify_frozen(cgrp);
> > +		spin_unlock_irq(&css_set_lock);
> > +	}
> > +
> > +	/* refrigerator */
> > +	set_current_state(TASK_WAKEKILL | TASK_INTERRUPTIBLE | TASK_FROZEN);
> 
> Why not __set_current_state() ?

Hm, it's not a hot path at all, so set_current_state() is good enough.
Not a strong preference, of course.

> 
> If ->state include TASK_INTERRUPTIBLE, why do we need TASK_WAKEKILL?
> 
> And again, why TASK_FROZEN?

So, should it be just TASK_INTERRUPTIBLE | TASK_FROZEN ?
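
To make the question about TASK_WAKEKILL concrete, here is a rough userspace
model of the wake-up check, not kernel code. It assumes the usual bit values
from include/linux/sched.h of this era (TASK_INTERRUPTIBLE 0x0001,
TASK_UNINTERRUPTIBLE 0x0002, TASK_WAKEKILL 0x0100) plus TASK_FROZEN from the
quoted patch, and it assumes that a signal wakes a task with the mask
TASK_INTERRUPTIBLE, extended by TASK_WAKEKILL for fatal signals. The helper
names would_wake(), signal_wake_mask() and check_wakekill_redundant() are
made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values; TASK_FROZEN comes from the quoted patch. */
#define TASK_INTERRUPTIBLE	0x0001u
#define TASK_UNINTERRUPTIBLE	0x0002u
#define TASK_WAKEKILL		0x0100u
#define TASK_FROZEN		0x1000u

/* Simplified model: a wake-up succeeds only if state and mask overlap. */
bool would_wake(unsigned int task_state, unsigned int wake_mask)
{
	return (task_state & wake_mask) != 0;
}

/* Simplified model: fatal signals add TASK_WAKEKILL to the wake mask. */
unsigned int signal_wake_mask(bool fatal)
{
	return TASK_INTERRUPTIBLE | (fatal ? TASK_WAKEKILL : 0u);
}

/* With TASK_INTERRUPTIBLE set, TASK_WAKEKILL never changes the outcome. */
void check_wakekill_redundant(void)
{
	const unsigned int with_wakekill =
		TASK_WAKEKILL | TASK_INTERRUPTIBLE | TASK_FROZEN;
	const unsigned int without_wakekill =
		TASK_INTERRUPTIBLE | TASK_FROZEN;

	for (int fatal = 0; fatal <= 1; fatal++) {
		unsigned int mask = signal_wake_mask(fatal != 0);

		assert(would_wake(with_wakekill, mask) ==
		       would_wake(without_wakekill, mask));
	}

	/* TASK_WAKEKILL only matters together with TASK_UNINTERRUPTIBLE. */
	assert(!would_wake(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE,
			   signal_wake_mask(false)));
	assert(would_wake(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE,
			  signal_wake_mask(true)));
}
```

In this model TASK_FROZEN never affects the wake-up decision either, which
fits its stated purpose of only being visible to userspace.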

> 
> > +	clear_thread_flag(TIF_SIGPENDING);
> > +	schedule();
> > +	recalc_sigpending();
> 
> I simply can't understand these 3 lines above but I bet this is not correct ;)

So, yeah, the problem is that if the TIF_SIGPENDING bit is set, schedule()
will return immediately, so we end up with pretty much a busy loop here.
The three lines above are a nasty workaround for that.

I believe we can clear the flag and not call recalc_sigpending() at all. Does
that seem correct?
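
The busy loop can be illustrated with a small userspace model, again not
kernel code. It assumes that schedule() puts a task back to TASK_RUNNING
instead of sleeping when the task is in a signal-sensitive state and has a
signal pending. struct model_task and both model_* functions are hypothetical
names for this sketch, and the bit values are the same assumed ones as in
sched.h of this era.

```c
#include <stdbool.h>

#define TASK_RUNNING		0x0000u
#define TASK_INTERRUPTIBLE	0x0001u
#define TASK_WAKEKILL		0x0100u
#define TASK_FROZEN		0x1000u	/* from the quoted patch */

/* Hypothetical stand-in for the few task fields that matter here. */
struct model_task {
	unsigned int state;
	bool sigpending;	/* models TIF_SIGPENDING */
	bool fatal_pending;	/* models a pending fatal signal */
};

/* Simplified model of the "does a pending signal prevent sleeping" test. */
bool model_signal_pending_state(const struct model_task *t)
{
	if (!(t->state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!t->sigpending)
		return false;
	return (t->state & TASK_INTERRUPTIBLE) || t->fatal_pending;
}

/*
 * Simplified model of schedule(): returns true if the task would sleep,
 * false if it would return immediately with the state reset to TASK_RUNNING.
 */
bool model_schedule_sleeps(struct model_task *t)
{
	if (t->state != TASK_RUNNING && model_signal_pending_state(t)) {
		t->state = TASK_RUNNING;
		return false;
	}
	return t->state != TASK_RUNNING;
}
```

With sigpending set and the state at TASK_INTERRUPTIBLE | TASK_FROZEN,
model_schedule_sleeps() returns false on every call, which is the busy loop;
with sigpending cleared first, the task sleeps.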

> 
> if nothing else recalc_sigpending() without ->siglock is wrong, it can race
> with signal_wakeup/etc.
> 
> > +	set_current_state(state);
> 
> see above...

Thank you for the review!
Looking forward to more comments from you!