[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABAubTg91qrUd4DO7T2SiJQBK9ypuhP0+F-091ZxtmonjaaYWg@mail.gmail.com>
Date: Tue, 12 Jul 2016 08:35:06 -0700
From: Shayan Pooya <shayan@...eve.org>
To: Michal Hocko <mhocko@...nel.org>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
koct9i@...il.com
Cc: cgroups mailinglist <cgroups@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: bug in memcg oom-killer results in a hung syscall in another
process in the same cgroup
>> With strace, when running 500 concurrent mem-hog tasks on the same
>> kernel, 33 of them failed with:
>>
>> strace: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion
>> `THREAD_GETMEM (self, tid) != ppid' failed.
>>
>> Which is: https://sourceware.org/bugzilla/show_bug.cgi?id=15392
>> And discussed before at: https://lkml.org/lkml/2015/2/6/470 but that
>> patch was not accepted.
>
> OK, so the problem is that the oom killed task doesn't report the futex
> release properly? If yes then I fail to see how that is memcg specific.
> Could you try to clarify what you consider a bug again, please? I am not
> really sure I understand this report.
It looks like it is just a very easy way to reproduce the problem that
Konstantin described in that lkml thread. That patch was not accepted
and I see no other fixes for that issue upstream. Here is a copy of
his root-cause analysis from said thread:
Whole sequence looks like: task calls fork, glibc calls syscall clone with
CLONE_CHILD_SETTID and passes pointer to TLS THREAD_SELF->tid as argument.
Child task gets read-only copy of VM including TLS. Child calls put_user()
to handle CLONE_CHILD_SETTID from schedule_tail(). put_user() trigger page
fault and it fails because do_wp_page() hits memcg limit without invoking
OOM-killer because this is page-fault from kernel-space. Put_user returns
-EFAULT, which is ignored. Child returns into user-space and catches here
assert (THREAD_GETMEM (self, tid) != ppid), glibc tries to print something
but hangs on deadlock on internal locks. Halt and catch fire.
Regards
Powered by blists - more mailing lists