Message-ID: <20081216122350.GB25019@elte.hu>
Date: Tue, 16 Dec 2008 13:23:51 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Li Zefan <lizf@...fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Paul Menage <menage@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: fix another race when reading /proc/sched_debug
* Li Zefan <lizf@...fujitsu.com> wrote:
> Li Zefan wrote:
> >> i merged it up in tip/master, could you please check whether it's ok?
> >>
> >
> > Sorry, though this patch avoids accessing a half-created cgroup, I found
> > that the current code may access a cgroup which has already been destroyed.
> >
> > The simplest fix is to take cgroup_lock() before for_each_leaf_cfs_rq.
> >
> > Could you revert this patch and apply the following new one? My box has
> > survived for 16 hours with it applied.
> >
>
> Hi, Ingo
>
> Can we have this bug fixed for 2.6.28 using this patch? It is the simplest
> fix and has been fully tested.
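For reference, the fix proposed above (taking cgroup_lock() around the
for_each_leaf_cfs_rq() walk) would look roughly like this; this is a sketch
reconstructed from the description, not the actual patch, with the
surrounding function assumed to be the cfs_rq dump path in
kernel/sched_fair.c:

static void print_cfs_stats(struct seq_file *m, int cpu)
{
	struct cfs_rq *cfs_rq;

	/* hold cgroup_mutex so no task group is torn down during the walk */
	cgroup_lock();
	rcu_read_lock();
	for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
		print_cfs_rq(m, cpu, cfs_rq);
	rcu_read_unlock();
	cgroup_unlock();
}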
the mutex used by cgroup_lock() is pretty crappy to nest inside the
runqueue lock:
BUG: sleeping function called from invalid context at kernel/mutex7
in_atomic(): 0, irqs_disabled(): 1, pid: 1790, name: cat
2 locks held by cat/1790:
#0: (&p->lock){--..}, at: [<c02a6328>] seq_read+0x25/0x2b8
#1: (tasklist_lock){..--}, at: [<c021f5bc>] sched_debug_show+0xaa7/0xe90
Call Trace:
[<c0222612>] __might_sleep+0xd6/0xdd
[<c05ff1eb>] mutex_lock_nested+0x1d/0x245
[<c021f88b>] ? sched_debug_show+0xd76/0xe90
[<c0256223>] cgroup_lock+0xf/0x11
[<c021f893>] sched_debug_show+0xd7e/0xe90
[<c0248696>] ? __lock_acquire+0x637/0x69d
[<c028e7cd>] ? check_object+0x111/0x18c
[<c028f435>] ? kmem_cache_alloc+0x70/0xa5
[<c02a6355>] ? seq_read+0x52/0x2b8
[<c0246ee2>] ? trace_hardirqs_on_caller+0x105/0x13d
[<c0246f25>] ? trace_hardirqs_on+0xb/0xd
[<c02a6355>] ? seq_read+0x52/0x2b8
[<c02a63f7>] seq_read+0xf4/0x2b8
[<c02a6303>] ? seq_read+0x0/0x2b8
[<c02c411a>] proc_reg_read+0x60/0x74
so i had to remove your patch.
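To spell out what the trace shows: a sleeping lock is being acquired in a
context where sleeping is forbidden (interrupts are already disabled under
tasklist_lock by the time the sched_debug path gets there), which is exactly
what __might_sleep() complains about. A minimal, hypothetical illustration of
the invalid pattern (the function and the exact outer lock are assumptions,
not code from the tree):

static void bad_nesting(struct rq *rq)
{
	struct cfs_rq *cfs_rq;

	spin_lock_irq(&rq->lock);	/* atomic: irqs off, spinning lock held */
	cgroup_lock();			/* mutex_lock() -> __might_sleep() fires */
	for_each_leaf_cfs_rq(rq, cfs_rq)
		;			/* walk the leaf cfs_rq list */
	cgroup_unlock();
	spin_unlock_irq(&rq->lock);
}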
also, while looking at what cgroup_lock does - it's a trivial wrapper:
void cgroup_lock(void)
{
	mutex_lock(&cgroup_mutex);
}
why isn't that done explicitly at all usage sites? If cgroup-unaware code
ever has to take the cgroup lock outside of CONFIG_CGROUPS, that's a code
structure problem.
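(Concretely, "explicitly" here presumably means open-coding the lock at each
call site, along the lines of this sketch; that would of course require
cgroup_mutex to be visible outside kernel/cgroup.c:

static void some_cgroup_aware_path(void)	/* hypothetical call site */
{
	mutex_lock(&cgroup_mutex);
	/* ... touch cgroup hierarchy state ... */
	mutex_unlock(&cgroup_mutex);
}

so that any attempt to take the lock from cgroup-unaware code stands out at
the call site.)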
Ingo