linux-kernel - Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190419161118.GA23357@tower.DHCP.thefacebook.com>
Date:   Fri, 19 Apr 2019 16:11:23 +0000
From:   Roman Gushchin <guro@...com>
To:     Oleg Nesterov <oleg@...hat.com>
CC:     Roman Gushchin <guroan@...il.com>, Tejun Heo <tj@...nel.org>,
        Kernel Team <Kernel-team@...com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer

Hi Oleg!

On Fri, Apr 19, 2019 at 05:19:12PM +0200, Oleg Nesterov wrote:
> On 04/05, Roman Gushchin wrote:
> >
> > +void cgroup_leave_frozen(bool always_leave)
> > +{
> > +	struct cgroup *cgrp;
> > +
> > +	spin_lock_irq(&css_set_lock);
> > +	cgrp = task_dfl_cgroup(current);
> > +	if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
> > +		cgroup_dec_frozen_cnt(cgrp);
> > +		cgroup_update_frozen(cgrp);
> > +		WARN_ON_ONCE(!current->frozen);
> > +		current->frozen = false;
> > +	}
> > +	spin_unlock_irq(&css_set_lock);
> > +
> > +	if (unlikely(current->frozen)) {
> > +		/*
> > +		 * If the task remained in the frozen state,
> > +		 * make sure it won't reach userspace without
> > +		 * entering the signal handling loop.
> > +		 */
> > +		spin_lock_irq(&current->sighand->siglock);
> > +		recalc_sigpending();
> > +		spin_unlock_irq(&current->sighand->siglock);
> 
> I still can't understand this logic.
> 
> Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
> set then signal_pending() must be already T and we do not need recalc_sigpending?
> If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?

This is paired with cgroup_task_frozen() check in recalc_sigpending_tsk().
If the task is waking from waiting in vfork(), and it races with
unfreezing of the cgroup, we should guarantee that the task won't
return to userspace with task->frozen flag set, otherwise it would break
accounting of frozen tasks.
We can't rely solely on JOBCTL_TRAP_FREEZE bit, as it can be cleared
in parallel at any moment. So we backup it with the task->frozen check.

> 
> > +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> > +{
> > +	unsigned long flags;
> > +
> > +	/* If the task is about to die, don't bother with freezing it. */
> > +	if (!lock_task_sighand(task, &flags))
> > +		return;
> > +
> > +	if (freeze) {
> > +		task->jobctl |= JOBCTL_TRAP_FREEZE;
> > +		signal_wake_up(task, false);
> > +	} else {
> > +		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > +		wake_up_process(task);
> 
> wake_up_interruptible() ?

Wait_up_interruptible() is supposed to work with a workqueue,
but here there is nothing like this. Probably, I didn't understand your idea.
Can you, please, elaborate a bit more?

> 
> >  static int ptrace_signal(int signr, kernel_siginfo_t *info)
> >  {
> >  	/*
> > @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
> >  		ksig->info.si_signo = signr = SIGKILL;
> >  		sigdelset(&current->pending.signal, SIGKILL);
> >  		recalc_sigpending();
> > +		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > +		spin_unlock_irq(&sighand->siglock);
> > +		if (unlikely(cgroup_task_frozen(current)))
> > +			cgroup_leave_frozen(true);
> 
> Oh, and another leave_frozen below...

Yeah, because of this new "goto fatal" shortcut.

> 
> I feel this must be simplified somehow, but nothing comes to my mind right now.
> 
> > +		/*
> > +		 * If the task is leaving the frozen state, let's update
> > +		 * cgroup counters and reset the frozen bit.
> > +		 */
> > +		if (unlikely(cgroup_task_frozen(current))) {
> >  			spin_unlock_irq(&sighand->siglock);
> > +			cgroup_leave_frozen(true);
> >  			goto relock;
> >  		}
> 
> afaics cgroup_leave_frozen(false) makes more sense here.

Why? I don't see any reasons why the task should remain in the frozen
state after this point. Can you, please, provide an example?

Thank you for looking into it!

Roman