linux-kernel - Re: [patch 0/7] improve memcg oom killer robustness v2

Message-Id: <20130917131535.94E0A843@pobox.sk>
Date:	Tue, 17 Sep 2013 13:15:35 +0200
From:	"azurIt" <azurit@...ox.sk>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Michal Hocko <mhocko@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	<linux-mm@...ck.org>, <cgroups@...r.kernel.org>, <x86@...nel.org>,
	<linux-arch@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2

______________________________________________________________
> From: Johannes Weiner <hannes@...xchg.org>
> To: azurIt <azurit@...ox.sk>
> Date: 17.09.2013 02:02
> Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
>
>On Mon, Sep 16, 2013 at 10:52:46PM +0200, azurIt wrote:
>> >On Mon 16-09-13 17:05:43, azurIt wrote:
>> >> >On Mon 16-09-13 16:13:16, azurIt wrote:
>> >> >[...]
>> >> >> >You can use sysrq+l via serial console to see tasks hogging the CPU or
>> >> >> >sysrq+t to see all the existing tasks.
>> >> >> 
>> >> >> 
>> >> >> Doesn't work here; it just prints 'l' or 't', respectively.
>> >> >
>> >> >I am using telnet to access my serial consoles exported by
>> >> >the multiplicator or KVM, and it can send sysrq via ctrl+t (Send
>> >> >Break). Check your serial console setup.
>> >> 
>> >> 
>> >> 
>> >> I'm using a Raritan KVM and I created the keyboard macros 'sysrq + l'
>> >> and 'sysrq + t'. I'm also unable to use them on my local PC. Maybe
>> >> sysrq needs to be enabled somehow?
>> >
>> >Probably yes. echo 1 > /proc/sys/kernel/sysrq should enable all sysrq
>> >commands. You can also enable only some of them (have a look at
>> >Documentation/sysrq.txt for more information).
>> 
>> 
>> It happened again while I was watching htop on the server. I'm
>> sure that this time it was only one process (apache) running under
>> a user account (not root). It was using about 100% CPU (about 100%
>> of one core). I was able to kill it by hand from htop, but
>> everything was very slow and the server load immediately jumped to
>> 500. I'm sure it must be related to Johannes' kernel patches,
>> because I'm also using I/O throttling in cgroups via the Block IO
>> controller, so users are unable to generate that much I/O. I will
>> try to capture the stacks of the processes, but I'm not able to
>> identify the problematic process, so I will have to take them from
>> *all* apache processes while killing them.
>
>It would be fantastic if you could capture those stacks.  sysrq+t
>captures ALL of them in one go and drops them into your syslog.
>
>/proc/<pid>/stack for individual tasks works too.
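Since the KVM keyboard macros only printed the letters, the same sysrq functions can also be triggered from a shell on the server by writing the command character to /proc/sysrq-trigger. According to Documentation/sysrq.txt, the value in /proc/sys/kernel/sysrq only restricts keyboard invocation, while writes to /proc/sysrq-trigger are always allowed for root. A minimal Python sketch, assuming root privileges and a kernel built with CONFIG_MAGIC_SYSRQ; the helper names are hypothetical:

```python
from pathlib import Path

# Standard procfs locations (root privileges are needed to write them).
SYSRQ_ENABLE = Path("/proc/sys/kernel/sysrq")
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")

# The two commands discussed in this thread:
#   'l' - backtrace of all active CPUs
#   't' - dump of all current tasks and their stacks to the kernel log
SUPPORTED = {"l", "t"}


def enable_all_sysrq(path: Path = SYSRQ_ENABLE) -> None:
    """Write 1 to enable every sysrq function for keyboard invocation."""
    path.write_text("1\n")


def trigger_sysrq(cmd: str, path: Path = SYSRQ_TRIGGER) -> None:
    """Write a single sysrq command character to the trigger file."""
    if cmd not in SUPPORTED:
        raise ValueError(f"unsupported sysrq command: {cmd!r}")
    path.write_text(cmd)
```

Run as root, trigger_sysrq("t") has the same effect as echo t > /proc/sysrq-trigger; the dump then shows up in dmesg or syslog. With hundreds of apache tasks, the dump may be large enough to overflow the kernel log buffer.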


Is there anything unusual in this stack?


[<ffffffff810d1a5e>] dump_header+0x7e/0x1e0
[<ffffffff810d195f>] ? find_lock_task_mm+0x2f/0x70
[<ffffffff810d1f25>] oom_kill_process+0x85/0x2a0
[<ffffffff810d24a8>] mem_cgroup_out_of_memory+0xa8/0xf0
[<ffffffff8110fb76>] mem_cgroup_oom_synchronize+0x2e6/0x310
[<ffffffff8110efc0>] ? mem_cgroup_uncharge_page+0x40/0x40
[<ffffffff810d2703>] pagefault_out_of_memory+0x13/0x130
[<ffffffff81026f6e>] mm_fault_error+0x9e/0x150
[<ffffffff81027424>] do_page_fault+0x404/0x490
[<ffffffff810f952c>] ? do_mmap_pgoff+0x3dc/0x430
[<ffffffff815cb87f>] page_fault+0x1f/0x30
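For reading a trace like the one above: each line has the form [<address>] function+offset/size, where offset is the position inside the function and size is the function's length. A "?" marks an address the x86 unwinder found on the stack but could not confirm as part of the frame chain, so those entries (find_lock_task_mm, mem_cgroup_uncharge_page and do_mmap_pgoff here) are probably stale leftovers. A minimal Python sketch (hypothetical helpers, not a kernel tool) that splits such lines and keeps only the confirmed frames:

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    address: int
    reliable: bool  # False when the line carries the '?' marker
    function: str
    offset: int
    size: int


# Matches e.g. "[<ffffffff810d1a5e>] dump_header+0x7e/0x1e0"
# and "[<ffffffff810d195f>] ? find_lock_task_mm+0x2f/0x70".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_trace(text: str) -> List[Frame]:
    """Parse every recognizable frame line, innermost frame first."""
    frames: List[Frame] = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                address=int(m.group(1), 16),
                reliable=m.group(2) is None,
                function=m.group(3),
                offset=int(m.group(4), 16),
                size=int(m.group(5), 16),
            ))
    return frames


def reliable_chain(frames: List[Frame]) -> List[str]:
    """Function names of the confirmed frames, skipping '?' entries."""
    return [f.function for f in frames if f.reliable]
```

Applied to the trace above, the confirmed chain runs from page_fault through do_page_fault, mm_fault_error, pagefault_out_of_memory, mem_cgroup_oom_synchronize, mem_cgroup_out_of_memory and oom_kill_process to dump_header.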


The problem happened again, but my script was unable to capture the stacks. I was able to see which processes were causing the problem (two this time) and I have their PIDs. The stack above is from a different process, but from the same cgroup (the memcg OOM killer killed it and printed its stack to syslog).
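One way to make such a capture script more robust is to read the stack of every matching task in a single pass and simply skip tasks that exit mid-read. A minimal Python sketch, assuming it runs as root (reading /proc/<pid>/stack typically requires it), a kernel with CONFIG_STACKTRACE, and a hypothetical process name "apache2" (the Apache binary may be called httpd on other distributions); capture_stacks is a hypothetical helper:

```python
from pathlib import Path
from typing import Dict


def capture_stacks(comm: str, proc_root: str = "/proc") -> Dict[int, str]:
    """Return {pid: kernel stack text} for every task whose comm equals comm.

    Note: the kernel truncates comm to 15 characters.
    """
    stacks: Dict[int, str] = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as "meminfo" or "sys"
        try:
            if (entry / "comm").read_text().strip() != comm:
                continue
            stacks[int(entry.name)] = (entry / "stack").read_text()
        except OSError:
            # The task exited between listing and reading, or access was denied.
            continue
    return stacks
```

For example, calling capture_stacks("apache2") as root returns a dict keyed by PID; taking two snapshots a few seconds apart can help tell a task stuck at one place in the kernel from one that is making progress.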

azur