linux-kernel - Re: process hangs on do

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKTCnzkMQQXRdx=ikydsD9Pm3LuRgf45_=m7ozuFmSZyxazXyA@mail.gmail.com>
Date:	Tue, 23 Oct 2012 10:10:42 +0530
From:	Balbir Singh <bsingharora@...il.com>
To:	Qiang Gao <gaoqiangscut@...il.com>
Cc:	Michal Hocko <mhocko@...e.cz>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
	"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
	linux-mm@...ck.org
Subject: Re: process hangs on do_exit when oom happens

On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao <gaoqiangscut@...il.com> wrote:
> information about the system is in the attach file "information.txt"
>
> I can not reproduce it in the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <mhocko@...e.cz> wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I looked up nothing useful with google,so I'm here for help..
>>>
>>> when this happens:  I use memcg to limit the memory use of a
>>> process,and when the memcg cgroup was out of memory,
>>> the process was oom-killed   however,it cannot really complete the
>>> exiting. here is the some information
>>
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>
>>> OS version:  centos6.2    2.6.32.220.7.1
>>
>> Your kernel is quite old and you should be probably asking your
>> distribution to help you out. There were many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanila kernel?
>>
>>> /proc/pid/stack
>>> ---------------------------------------------------------------
>>>
>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>>> [<ffffffff8105b078>] mmput+0x58/0x110
>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>>> [<ffffffff8100b281>] int_signal+0x12/0x17
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> This looks strange because this is just an exit part which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to take check
>> it more times?

Looking at information.txt, I found something interesting

rt_rq[0]:/1314
  .rt_nr_running                 : 1
  .rt_throttled                  : 1
  .rt_time                       : 0.856656
  .rt_runtime                    : 0.000000


cfs_rq[0]:/1314
  .exec_clock                    : 8738.133429
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 8739.371271
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -9792.255554
  .nr_spread_over                : 1
  .nr_running                    : 0
  .load                          : 0
  .load_avg                      : 7376.722880
  .load_period                   : 7.203830
  .load_contrib                  : 1023
  .load_tg                       : 1023
  .se->exec_start                : 282004.715064
  .se->vruntime                  : 18435.664560
  .se->sum_exec_runtime          : 8738.133429
  .se->wait_start                : 0.000000
  .se->sleep_start               : 0.000000
  .se->block_start               : 0.000000
  .se->sleep_max                 : 0.000000
  .se->block_max                 : 0.000000
  .se->exec_max                  : 77.977054
  .se->slice_max                 : 0.000000
  .se->wait_max                  : 2.664779
  .se->wait_sum                  : 29.970575
  .se->wait_count                : 102
  .se->load.weight               : 2

So 1314 is a real time process and

cpu.rt_period_us:
1000000
----------------------
cpu.rt_runtime_us:
0

When did tt move to being a Real Time process (hint: see nr_running
and nr_throttled)?

Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/