[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1292486036.10931.314.camel@maggy.simson.net>
Date: Thu, 16 Dec 2010 08:53:56 +0100
From: Mike Galbraith <efault@....de>
To: Oleg Nesterov <oleg@...hat.com>
Cc: linux-tip-commits@...r.kernel.org, linux-kernel@...r.kernel.org,
hpa@...or.com, mingo@...hat.com, mathieu.desnoyers@...icios.com,
a.p.zijlstra@...llo.nl, torvalds@...ux-foundation.org,
pjt@...gle.com, markus@...ppelsdorf.de, tglx@...utronix.de,
mingo@...e.hu
Subject: Re: [tip:sched/core] sched: Add 'autogroup' scheduling feature:
automated per session task groups
On Wed, 2010-12-15 at 18:50 +0100, Oleg Nesterov wrote:
> I assume this is the latest version. In this case I think it needs
> minor fixes.
>
> > +#ifdef CONFIG_PROC_FS
> > +
> > +/* Called with siglock held. */
>
> This is not true, and that is why we can't blindly use kref_get().
I was going to lock it all up, but convinced myself it wasn't necessary.
The comment should have also gone away.
> > +int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
> > +{
> > + static unsigned long next = INITIAL_JIFFIES;
> > + struct autogroup *ag;
> > + int err;
> > +
> > + if (*nice < -20 || *nice > 19)
> > + return -EINVAL;
> > +
> > + err = security_task_setnice(current, *nice);
> > + if (err)
> > + return err;
> > +
> > + if (*nice < 0 && !can_nice(current, *nice))
> > + return -EPERM;
> > +
> > + /* this is a heavy operation taking global locks.. */
> > + if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
> > + return -EAGAIN;
> > +
> > + next = HZ / 10 + jiffies;
> > + ag = autogroup_kref_get(p->signal->autogroup);
>
> We can race with autogroup_move_group() and use the already freed
> ->autogroup. We need ->siglock or task_rq_lock() to read it.
I don't see how/why. I took a reference to the new group before
assignment of p->signal->autogroup, and put the previous group after
it's assigned.
Ponders that.. uhoh.
Mover does atomic write, but signal->autogroup write comes after that,
so can still be in flight when reader dereferences. Game over unless
the reader beats ->autogroup writer to the punch.
Thanks again for your excellent eyeballs. The below should plug that
hole, no? (hope so, seems pointless to lock movement)
> IOW, I think we need something like the patch below, but - sorry -
> if was completely untested.
>
> And the question,
>
> > + down_write(&ag->lock);
> > + err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
>
> Do we really want this if ag == autogroup_default ? Say, autogroup_create()
> fails, now the owner of this process can affect init_task_group. Or admin
> can change init_task_group "by accident" (although currently this is hardly
> possible, sched_autogroup_detach() has no callers). Just curious.
sched_group_set_shares() does the right thing, says no to changing the
root task group's shares.
sched: fix potential access to freed memory
Oleg pointed out that the /proc interface kref_get() useage may race with
the final put during autogroup_move_group(). A signal->autogroup assignment
may be in flight when the /proc interface dereference, leaving them taking
a reference to an already dead group.
Reported-by: Oleg Nesterov <oleg@...hat.com>
Signed-off-by: Mike Galbraith <efault@....de>
diff --git a/kernel/sched_autogroup.c b/kernel/sched_autogroup.c
index 57a7ac2..713b6c0 100644
--- a/kernel/sched_autogroup.c
+++ b/kernel/sched_autogroup.c
@@ -41,6 +41,12 @@ static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
return ag;
}
+static inline struct autogroup *autogroup_task_get(struct task_struct *p)
+{
+ smp_rmb();
+ return autogroup_kref_get(p->signal->autogroup);
+}
+
static inline struct autogroup *autogroup_create(void)
{
struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
@@ -119,6 +125,7 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
}
p->signal->autogroup = autogroup_kref_get(ag);
+ smp_mb();
t = p;
do {
@@ -172,7 +179,6 @@ __setup("noautogroup", setup_autogroup);
#ifdef CONFIG_PROC_FS
-/* Called with siglock held. */
int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
{
static unsigned long next = INITIAL_JIFFIES;
@@ -194,7 +200,7 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
return -EAGAIN;
next = HZ / 10 + jiffies;
- ag = autogroup_kref_get(p->signal->autogroup);
+ ag = autogroup_task_get(p);
down_write(&ag->lock);
err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
@@ -209,7 +215,7 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
{
- struct autogroup *ag = autogroup_kref_get(p->signal->autogroup);
+ struct autogroup *ag = autogroup_task_get(p);
down_read(&ag->lock);
seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists