Message-ID: <7d01fea5-66d6-b6ac-918d-19ec8a15dbaf@huawei.com>
Date: Fri, 10 Feb 2017 16:48:58 +0800
From: Yisheng Xie <xieyisheng1@...wei.com>
To: Michal Hocko <mhocko@...nel.org>
CC: Vlastimil Babka <vbabka@...e.cz>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Hanjun Guo <guohanjun@...wei.com>
Subject: Re: [RFC] 3.10 kernel- oom with about 24G free memory
Hi Michal,
Thanks for the comment!
On 2017/2/10 15:09, Michal Hocko wrote:
> On Fri 10-02-17 09:13:58, Yisheng Xie wrote:
>> Hi Michal,
>> Thanks for your comment.
>>
>> On 2017/2/9 21:41, Michal Hocko wrote:
>>> On Thu 09-02-17 14:26:28, Michal Hocko wrote:
>>>> On Thu 09-02-17 20:54:49, Yisheng Xie wrote:
>>>>> Hi all,
>>>>> I got an OOM on a Linux 3.10 KVM guest OS. When the OOM triggers, the guest
>>>>> has about 24G of free memory (and the host OS has about 10G free), and the
>>>>> watermarks are definitely fine.
>>>>>
>>>>> I also checked the memcg limit values, but could not find the root cause
>>>>> there either.
>>>>>
>>>>> Has anybody ever met a similar problem, or have any idea about it?
>>>>>
>>>>> Any comment is more than welcome!
>>>>>
>>>>> Thanks
>>>>> Yisheng Xie
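(For reference, the memcg check above amounts to comparing the group's charge
against its limit. A minimal inspection sketch with the cgroup v1 memory
controller, as used on 3.10 kernels; "mygroup" is a placeholder name, not the
actual group from this report:)

```shell
# cgroup v1 memory controller; "mygroup" is a hypothetical group name.
cd /sys/fs/cgroup/memory/mygroup
cat memory.limit_in_bytes   # configured hard limit
cat memory.usage_in_bytes   # current charge
cat memory.failcnt          # how many times the limit was hit
cat memory.oom_control      # oom_kill_disable and under_oom state
```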
>>>>>
>>>>> -------------
>>>>> [ 81.234289] DefSch0200 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
>>>>> [ 81.234295] DefSch0200 cpuset=/ mems_allowed=0
>>>>> [ 81.234299] CPU: 3 PID: 8284 Comm: DefSch0200 Tainted: G O E ----V------- 3.10.0-229.42.1.105.x86_64 #1
>>>>> [ 81.234301] Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.8.1-0-g4adadbd-20161111_105425-HGH1000008200 04/01/2014
>>>>> [ 81.234303] ffff880ae2900000 000000002b3489d7 ffff880b6cec7c58 ffffffff81608d3d
>>>>> [ 81.234307] ffff880b6cec7ce8 ffffffff81603d1c 0000000000000000 ffff880b6cd09000
>>>>> [ 81.234311] ffff880b6cec7cd8 000000002b3489d7 ffff880b6cec7ce0 ffffffff811bdd77
>>>>> [ 81.234314] Call Trace:
>>>>> [ 81.234323] [<ffffffff81608d3d>] dump_stack+0x19/0x1b
>>>>> [ 81.234327] [<ffffffff81603d1c>] dump_header+0x8e/0x214
>>>>> [ 81.234333] [<ffffffff811bdd77>] ? mem_cgroup_iter+0x177/0x2b0
>>>>> [ 81.234339] [<ffffffff8115d83e>] check_panic_on_oom+0x2e/0x60
>>>>> [ 81.234342] [<ffffffff811c17bf>] mem_cgroup_oom_synchronize+0x34f/0x580
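(Side note: the gfp_mask=0xd0 in the report decodes, with the flag values from
a 3.10-era include/linux/gfp.h, to __GFP_WAIT|__GFP_IO|__GFP_FS, i.e. a plain
GFP_KERNEL allocation. A small decoding sketch; the table is version-specific,
since newer kernels renumbered these bits:)

```python
# Low-level GFP flag bits as defined in a 3.10-era include/linux/gfp.h.
GFP_FLAGS_310 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mask):
    """Return the names of the flag bits set in an oom-killer gfp_mask."""
    return [name for bit, name in sorted(GFP_FLAGS_310.items()) if mask & bit]

print("|".join(decode_gfp(0xd0)))  # __GFP_WAIT|__GFP_IO|__GFP_FS
```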
>>>>
>>>> OK, so this is a memcg OOM killer which panics because the configuration
>>>> says so. The OOM report doesn't say so and that is the bug. dump_header
>>>> is memcg aware and mem_cgroup_out_of_memory initializes oom_control
>>>> properly. Is this a vanilla kernel?
>>
>> That means we should raise the limit of that memcg to avoid the memcg OOM killer, right?
>
> Why do you configure the system to panic on memcg OOM in the first
> place? This is the wrong thing to do in 99% of cases.
For our production systems we thought it better to reboot and recover the
machine on OOM rather than let the OOM killer take out the user's key
processes. Maybe that is not the right thing to do.
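For what it's worth, the panic behaviour here is governed by vm.panic_on_oom
(see Documentation/sysctl/vm.txt): 0 kills the offending task, 1 panics on a
system-wide OOM but not a memcg OOM, and 2 panics unconditionally, including
for memcg OOMs, which is what the check_panic_on_oom() frame in the trace
above suggests is set. A hedged sketch of checking and relaxing it; <pid> is a
placeholder:

```shell
# Inspect the current setting (2 = panic even on a memcg OOM).
cat /proc/sys/vm/panic_on_oom
# Relax it so a memcg OOM kills within the group instead of panicking:
sysctl -w vm.panic_on_oom=0
# To shield a key process from the OOM killer instead of rebooting,
# set its badness adjustment to the minimum (-1000 exempts it entirely):
echo -1000 > /proc/<pid>/oom_score_adj
```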
Thanks
Yisheng Xie