linux-kernel - Re: 3.10.16 cgroup

Message-ID: <CAJ75kXZYjKwV_XiEB493jNyGRqS395JZyY-S9xQBQJLyaCSOEQ@mail.gmail.com>
Date:	Fri, 22 Nov 2013 21:59:37 +0100
From:	William Dauchy <wdauchy@...il.com>
To:	Hugh Dickins <hughd@...gle.com>, Tejun Heo <tj@...nel.org>
Cc:	Shawn Bohrer <shawn.bohrer@...il.com>,
	Michal Hocko <mhocko@...e.cz>, Li Zefan <lizefan@...wei.com>,
	cgroups@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Markus Blank-Burian <burian@...nster.de>
Subject: Re: 3.10.16 cgroup_mutex deadlock

On Mon, Nov 18, 2013 at 3:17 AM, Hugh Dickins <hughd@...gle.com> wrote:
> Sorry for the delay: I was on the point of reporting success last
> night, when I tried a debug kernel: and that didn't work so well
> (got spinlock bad magic report in pwq_adjust_max_active(), and
> tests wouldn't run at all).
>
> Even the non-early cgroup_init() is called well before the
> early_initcall init_workqueues(): though only the debug (lockdep
> and spinlock debug) kernel appeared to have a problem with that.
>
> Here's the patch I ended up with successfully on a 3.11.7-based
> kernel (though below I've rediffed it against 3.11.8): the
> schedule_work->queue_work hunks are slightly different on 3.11
> than in your patch against current, and I did alloc_workqueue()
> from a separate core_initcall.
>
> The interval between cgroup_init and that is a bit of a worry;
> but we don't seem to have suffered from the interval between
> cgroup_init and init_workqueues before (when system_wq is NULL)
> - though you may have more courage than I to reorder them!
>
> Initially I backed out my system_highpri_wq workaround, and
> verified that it was still easy to reproduce the problem with
> one of our cgroup stresstests.  Yes it was, then your modified
> patch below convincingly fixed it.
>
> I ran with Johannes's patch adding extra mem_cgroup_reparent_charges:
> as I'd expected, that didn't solve this issue (though it's worth
> our keeping it in to rule out another source of problems).  And I
> checked back on dumps of failures: they indeed show the tell-tale
> 256 kworkers doing cgroup_offline_fn, just as you predicted.
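
For readers new to the thread, the failure mode quoted above can be
sketched in userspace. This is a toy model, not kernel code: a thread
pool stands in for a workqueue with a concurrency limit (4 here, 256 in
the report), a plain lock stands in for cgroup_mutex, and drain_fn is a
hypothetical stand-in for whatever work item the mutex holder waits on
(this message does not say which one it is). All names are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

MAX_ACTIVE = 4  # stand-in for the workqueue concurrency limit (256 in the report)


def drain_completes(dedicated_offline_queue: bool) -> bool:
    """Return True if the mutex holder's work item runs while offline work is blocked."""
    cgroup_mutex = threading.Lock()      # stand-in for cgroup_mutex
    started = threading.Semaphore(0)     # counts offline items that began running
    shared_wq = ThreadPoolExecutor(max_workers=MAX_ACTIVE)
    # With the fix modelled here, offline work gets its own queue.
    offline_wq = (
        ThreadPoolExecutor(max_workers=MAX_ACTIVE)
        if dedicated_offline_queue
        else shared_wq
    )

    def offline_fn():
        # Each offline item needs the mutex, so it blocks while the holder has it.
        started.release()
        with cgroup_mutex:
            pass

    def drain_fn():
        # Hypothetical work item the mutex holder depends on.
        return True

    cgroup_mutex.acquire()
    try:
        # Queue enough offline items to occupy every worker of their queue.
        for _ in range(MAX_ACTIVE):
            offline_wq.submit(offline_fn)
        for _ in range(MAX_ACTIVE):
            started.acquire()
        # The holder now queues work on the shared queue and waits for it.
        drain = shared_wq.submit(drain_fn)
        try:
            return drain.result(timeout=0.5)
        except TimeoutError:
            # Every shared worker is stuck behind the mutex: the modelled deadlock.
            return False
    finally:
        # Release the mutex so the toy model always terminates cleanly.
        cgroup_mutex.release()
        shared_wq.shutdown(wait=True)
        if offline_wq is not shared_wq:
            offline_wq.shutdown(wait=True)


if __name__ == "__main__":
    print("shared queue, drain completes:   ", drain_completes(False))
    print("dedicated queue, drain completes:", drain_completes(True))
```

With a single shared pool the holder's work item never gets a worker
until the mutex is dropped; with a separate pool for the offline work it
runs at once, which is the idea behind the alloc_workqueue() and
schedule_work->queue_work changes described above.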

Hugh, Tejun,

Is there any news on this patch? I'm also hitting this bug on a 3.10.x kernel.

Thanks,
-- 
William