linux-kernel - Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190422221116.GA10341@tower.DHCP.thefacebook.com>
Date:   Mon, 22 Apr 2019 22:11:22 +0000
From:   Roman Gushchin <guro@...com>
To:     Oleg Nesterov <oleg@...hat.com>
CC:     Roman Gushchin <guroan@...il.com>, Tejun Heo <tj@...nel.org>,
        Kernel Team <Kernel-team@...com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer

On Sat, Apr 20, 2019 at 12:58:38PM +0200, Oleg Nesterov wrote:
> On 04/19, Roman Gushchin wrote:
> >
> > > > >
> > > > > wake_up_interruptible() ?
> > > >
> > > > Wait_up_interruptible() is supposed to work with a workqueue,
> > > > but here there is nothing like this. Probably, I didn't understand your idea.
> > > > Can you, please, elaborate a bit more?
> > >
> > > Not sure I understand... We need to wake up the task if it sleeps in
> > > do_freezer_trap(), right? do_freezer_trap() uses TASK_INTERRUPTIBLE, so
> > > why can't wake_up_interruptible() == __wake_up(TASK_INTERRUPTIBLE) work?
> >
> > Right, but __wake_up is supposed to wake threads blocked on a waitqueue:
> 
> Ugh sorry ;) of course I meant wake_up_state(task, TASK_INTERRUPTIBLE).

Agh, then it makes total sense to me. I'll master a follow-up patch.

> 
> > > > > > +		if (unlikely(cgroup_task_frozen(current))) {
> > > > > >  			spin_unlock_irq(&sighand->siglock);
> > > > > > +			cgroup_leave_frozen(true);
> > > > > >  			goto relock;
> > > > > >  		}
> > > > >
> > > > > afaics cgroup_leave_frozen(false) makes more sense here.
> > > >
> > > > Why? I don't see any reasons why the task should remain in the frozen
> > > > state after this point.
> > >
> > > But cgroup_leave_frozen(false) will equally clear ->frozen if !CGRP_FREEZE ?
> > > OTOH, if CGRP_FREEZE is set again, why do we need to clear ->frozen?
> >
> > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> > If a user changed the desired state of cgroup twice, there is no need to avoid
> > state transitions. Or maybe I don't see it yet.
> 
> Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> does it differ from get_signal() ?

We need it because sleeping in vfork is a special state which we want to
account as frozen. And if the parent process wakes up while the cgroup is frozen
(because of the child death, for example), we want to push it into the "proper"
frozen state without changing the state of the cgroup.

> 
> If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
> races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> 
> This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> calls get_signal().
> 
> get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> cgroup_leave_frozen(true) which clears ->frozen.
> 
> Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
> mode?

Got it, a good catch! So if the freezer races with vfork() completion, we might
have a spurious frozen->unfrozen->frozen transition of the cgroup state.

Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
concerned that we're basically putting the task in a busy loop between
the setting CGRP_FREEZE and setting TRAP_FREEZE. Do you think it's ok?
I wonder if there are better solutions.

Thank you!