[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190419161118.GA23357@tower.DHCP.thefacebook.com>
Date: Fri, 19 Apr 2019 16:11:23 +0000
From: Roman Gushchin <guro@...com>
To: Oleg Nesterov <oleg@...hat.com>
CC: Roman Gushchin <guroan@...il.com>, Tejun Heo <tj@...nel.org>,
Kernel Team <Kernel-team@...com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
Hi Oleg!
On Fri, Apr 19, 2019 at 05:19:12PM +0200, Oleg Nesterov wrote:
> On 04/05, Roman Gushchin wrote:
> >
> > +void cgroup_leave_frozen(bool always_leave)
> > +{
> > + struct cgroup *cgrp;
> > +
> > + spin_lock_irq(&css_set_lock);
> > + cgrp = task_dfl_cgroup(current);
> > + if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
> > + cgroup_dec_frozen_cnt(cgrp);
> > + cgroup_update_frozen(cgrp);
> > + WARN_ON_ONCE(!current->frozen);
> > + current->frozen = false;
> > + }
> > + spin_unlock_irq(&css_set_lock);
> > +
> > + if (unlikely(current->frozen)) {
> > + /*
> > + * If the task remained in the frozen state,
> > + * make sure it won't reach userspace without
> > + * entering the signal handling loop.
> > + */
> > + spin_lock_irq(¤t->sighand->siglock);
> > + recalc_sigpending();
> > + spin_unlock_irq(¤t->sighand->siglock);
>
> I still can't understand this logic.
>
> Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
> set then signal_pending() must be already T and we do not need recalc_sigpending?
> If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?
This is paired with cgroup_task_frozen() check in recalc_sigpending_tsk().
If the task is waking from waiting in vfork(), and it races with
unfreezing of the cgroup, we should guarantee that the task won't
return to userspace with task->frozen flag set, otherwise it would break
accounting of frozen tasks.
We can't rely solely on JOBCTL_TRAP_FREEZE bit, as it can be cleared
in parallel at any moment. So we backup it with the task->frozen check.
>
> > +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> > +{
> > + unsigned long flags;
> > +
> > + /* If the task is about to die, don't bother with freezing it. */
> > + if (!lock_task_sighand(task, &flags))
> > + return;
> > +
> > + if (freeze) {
> > + task->jobctl |= JOBCTL_TRAP_FREEZE;
> > + signal_wake_up(task, false);
> > + } else {
> > + task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > + wake_up_process(task);
>
> wake_up_interruptible() ?
Wait_up_interruptible() is supposed to work with a workqueue,
but here there is nothing like this. Probably, I didn't understand your idea.
Can you, please, elaborate a bit more?
>
> > static int ptrace_signal(int signr, kernel_siginfo_t *info)
> > {
> > /*
> > @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
> > ksig->info.si_signo = signr = SIGKILL;
> > sigdelset(¤t->pending.signal, SIGKILL);
> > recalc_sigpending();
> > + current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > + spin_unlock_irq(&sighand->siglock);
> > + if (unlikely(cgroup_task_frozen(current)))
> > + cgroup_leave_frozen(true);
>
> Oh, and another leave_frozen below...
Yeah, because of this new "goto fatal" shortcut.
>
> I feel this must be simplified somehow, but nothing comes to my mind right now.
>
> > + /*
> > + * If the task is leaving the frozen state, let's update
> > + * cgroup counters and reset the frozen bit.
> > + */
> > + if (unlikely(cgroup_task_frozen(current))) {
> > spin_unlock_irq(&sighand->siglock);
> > + cgroup_leave_frozen(true);
> > goto relock;
> > }
>
> afaics cgroup_leave_frozen(false) makes more sense here.
Why? I don't see any reasons why the task should remain in the frozen
state after this point. Can you, please, provide an example?
Thank you for looking into it!
Roman
Powered by blists - more mailing lists