Message-ID: <4DE3D5A8.8060707@simplicitymedialtd.co.uk>
Date: Mon, 30 May 2011 18:36:40 +0100
From: "Cal Leeming [Simplicity Media Ltd]"
<cal.leeming@...plicitymedialtd.co.uk>
To: linux-kernel@...r.kernel.org,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"kosaki.motohiro@...fujitsu.com" <kosaki.motohiro@...fujitsu.com>,
"minchan.kim@...il.com" <minchan.kim@...il.com>,
"rientjes@...gle.com" <rientjes@...gle.com>,
kamezawa.hiroyu@...fujitsu.com
Subject: Re: cgroup OOM killer loop causes system to lockup (possible fix
included)
Here is an strace of the SSH process (which is somehow passing the
-17 oom_adj on to all forked user instances):
(broken server - with bnx2 module loaded)
[pid 2200] [ 7f13a09c9cb0] open("/proc/self/oom_adj",
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid 2120] [ 7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort
22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
[pid 2200] [ 7f13a09c9cb0] <... open resumed> ) = 9
[pid 2120] [ 7f13a09c9f00] <... write resumed> ) = 680
[pid 2120] [ 7f13a09c9e40] close(7 <unfinished ...>
[pid 2200] [ 7f13a09c9844] fstat(9, <unfinished ...>
[pid 2120] [ 7f13a09c9e40] <... close resumed> ) = 0
[pid 2200] [ 7f13a09c9844] <... fstat resumed>
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 2120] [ 7f13a09c9e40] close(8 <unfinished ...>
[pid 2200] [ 7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 2120] [ 7f13a09c9e40] <... close resumed> ) = 0
[pid 2200] [ 7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
[pid 2120] [ 7f13a09c9e40] close(4 <unfinished ...>
[pid 2200] [ 7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
(working server - with bnx2 module unloaded)
[pid 1323] [ 7fae577fbe40] close(7) = 0
[pid 1631] [ 7fae577fbcb0] open("/proc/self/oom_adj",
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid 1323] [ 7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished ...>
[pid 1631] [ 7fae577fbcb0] <... open resumed> ) = 10
[pid 1323] [ 7fae577fbf00] <... write resumed> ) = 5
[pid 1323] [ 7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort
22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
[pid 1631] [ 7fae577fb844] fstat(10, <unfinished ...>
[pid 1323] [ 7fae577fbf00] <... write resumed> ) = 654
[pid 1631] [ 7fae577fb844] <... fstat resumed>
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 1323] [ 7fae577fbe40] close(8) = 0
[pid 1631] [ 7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 1323] [ 7fae577fbe40] close(9 <unfinished ...>
[pid 1631] [ 7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
[pid 1323] [ 7fae577fbe40] <... close resumed> ) = 0
[pid 1323] [ 7fae577fbe40] close(5 <unfinished ...>
[pid 1631] [ 7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
The two servers are *EXACT* duplicates of each other, completely fresh
Debian installs, with exactly the same packages installed.
As you can see, the working server writes "0" into oom_adj, while the
broken one writes "-17".
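For anyone wanting to reproduce the inheritance side of this: the oom_adj
value is per-process and is copied to children across fork()/exec(), which
is why every forked SSH session ends up with whatever sshd itself was given.
A minimal sketch, assuming a Linux /proc; it uses oom_score_adj (the modern
replacement for oom_adj) and a positive value, since lowering the score
needs CAP_SYS_RESOURCE:

```shell
#!/bin/sh
# Demonstrate that the OOM-score adjustment is inherited by children.
# Writing a *positive* value needs no privilege, unlike -17 / -1000.
echo 5 > /proc/self/oom_score_adj
# A freshly forked+exec'd child sees the parent's value:
sh -c 'cat /proc/self/oom_score_adj'
```

On an unaffected system the child prints back the 5 written by the parent,
which is exactly the mechanism by which a -17 set on sshd propagates to
every user session.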
On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
> I FOUND THE PROBLEM!!!
>
> Explicit details can be found on the Debian kernel mailing list, but
> to cut short, it's caused by the firmware-bnx2 kernel module:
>
> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 package,
> removed the bnx*.ko files from /lib/modules, ran update-initramfs, and
> then rebooted (i then confirmed it was removed by checking ifconfig
> and lsmod).
>
> And guess what.. IT WORKED.
>
> So, this problem seems to be caused by the firmware-bnx2 module being
> loaded.. somehow, that module is causing a -17 oom_adj to be set for
> everything..
>
> WTF?!?! Surely a bug?? Could someone please forward this to the
> appropriate person for the bnx2 kernel module, as I wouldn't even know
> where to begin :S
>
> Cal
>
> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
>> -resent due to incorrect formatting, sorry if this dups!
>>
>> @Kame
>> Thanks for the reply!
>> Both kernels used the same env/dist, but with slightly different
>> packages.
>> After many frustrating hours, I have narrowed this down to a dodgy
>> Debian package which appears to continue affecting the system, even
>> after purging. I have yet to pinpoint the exact package (I'm doing
>> several reinstall tests, along with tripwire analysis after each
>> reboot).
>>
>> @Hiroyuki
>> Thank you for sending this to the right people!
>>
>> @linux-mm
>> On a side note, would someone mind taking a few minutes to give a
>> brief explanation of how the default oom_adj is set, and under what
>> conditions it is given -17 by default? Is this defined by the
>> application? I looked through the kernel source, and noticed some of
>> the code defaults oom_adj to OOM_DISABLE (which is defined in the
>> headers as -17).
>>
>> Assuming the Debian problem is resolved, this might be another call
>> for the oom-killer to be modified so that if it encounters this
>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
>> such as kernel processes and other critical things). If this is going
>> to be a relatively simple task, I wouldn't mind spending a few hours
>> patching it.
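Until the root cause is fixed, a userspace sweep is one possible stopgap.
This is a sketch only, not something from this thread: on the affected
2.6.32 kernels the file is /proc/<pid>/oom_adj with -17 as the disable
value (current kernels keep oom_adj as a deprecated alias that maps onto
oom_score_adj):

```shell
#!/bin/sh
# List every process that is currently exempt from the OOM killer
# (oom_adj == -17); pass "reset" as $1 to also write 0 back (needs root).
for f in /proc/[0-9]*/oom_adj; do
    [ -r "$f" ] || continue
    [ "$(cat "$f" 2>/dev/null)" = "-17" ] || continue
    pid=${f#/proc/}; pid=${pid%/oom_adj}
    # second field of /proc/<pid>/stat is the (comm) name
    printf '%s\t%s\n' "$pid" "$(awk '{print $2}' "/proc/$pid/stat" 2>/dev/null)"
    if [ "$1" = "reset" ]; then echo 0 > "$f"; fi
done
```

Run without arguments it only reports; a cron job in "reset" mode would at
least stop the -17 values from accumulating while the real bug is chased.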
>>
>> Cal
>>
>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@...fujitsu.com> wrote:
>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>>
>>> On Sun, 29 May 2011 23:24:07 +0100
>>> "Cal Leeming [Simplicity Media
>>> Ltd]"<cal.leeming@...plicitymedialtd.co.uk> wrote:
>>>
>>>> Some further logs:
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>
>>>>
>>> hmm, in short, applications have a -17 oom_adj by default with 2.6.32.41 ?
>>> AFAIK, no kernel has such crazy settings as default..
>>>
>>> Do your 2 kernels use the same environment/distribution ?
>>>
>>> Thanks,
>>> -Kame
>>>
>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>> First of all, my apologies if I have submitted this problem to the
>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>>> be sent, and was still none the wiser.
>>>>>
>>>>> The problem is related to applying memory limitations within a
>>>>> cgroup.
>>>>> If the OOM killer kicks in, it repeatedly tries to kill a process
>>>>> which has an oom_adj of -17. Since such processes are exempt from
>>>>> being killed, this becomes an infinite loop, which in turn locks up
>>>>> the system.
>>>>>
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>
>>>>>
>>>>> root@...ky [/home/foxx]> uname -a
>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011
>>>>> x86_64
>>>>> GNU/Linux
>>>>> (this happens on both the grsec-patched and non-patched 2.6.32.41
>>>>> kernels)
>>>>>
>>>>> When this is encountered, the memory usage across the whole server is
>>>>> still within limits (not even hitting swap).
>>>>>
>>>>> The memory configuration for the cgroup/lxc is:
>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>>
>>>>> Now, what is even more strange is that when running under the
>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>>> happen. However, there is a slight difference between the two
>>>>> kernels: 2.6.32.28 gives a default of 0 in /proc/X/oom_adj,
>>>>> whereas 2.6.32.41 gives a default of -17. I suspect this is the
>>>>> root cause of why it shows up in the later kernel, but not the
>>>>> earlier one.
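The quickest check of what default a given kernel hands out is to spawn a
throwaway process and read its value back (a sketch, assuming a Linux
/proc; on an unaffected system this prints 0):

```shell
#!/bin/sh
# Spawn a background process and read the oom_adj it was started with.
sleep 2 &
cat /proc/$!/oom_adj
wait
```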
>>>>>
>>>>> To test this theory, I started up the lxc on both servers, and then
>>>>> ran a one liner which showed me all the processes with an oom_adj
>>>>> of -17:
>>>>>
>>>>> (the below is the older/working kernel)
>>>>> root@...rtney.internal [/mnt/encstore/lxc]> uname -a
>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07
>>>>> GMT
>>>>> 2011 x86_64 GNU/Linux
>>>>> root@...rtney.internal [/mnt/encstore/lxc]> for x in `find /proc
>>>>> -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print $3}'` ; do
>>>>> ps -p $x --no-headers ; done
>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>> 804 ? 00:00:00 udevd
>>>>> 804 ? 00:00:00 udevd
>>>>> 25536 ? 00:00:00 sshd
>>>>> 25536 ? 00:00:00 sshd
>>>>> 31861 ? 00:00:00 sshd
>>>>> 31861 ? 00:00:00 sshd
>>>>> 32173 ? 00:00:00 udevd
>>>>> 32173 ? 00:00:00 udevd
>>>>> 32174 ? 00:00:00 udevd
>>>>> 32174 ? 00:00:00 udevd
>>>>>
>>>>> (the below is the newer/broken kernel)
>>>>> root@...ky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> uname -a
>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011
>>>>> x86_64
>>>>> GNU/Linux
>>>>> root@...ky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> for x in
>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/'
>>>>> '{print
>>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>> 895 ? 00:00:00 udevd
>>>>> 895 ? 00:00:00 udevd
>>>>> 1091 ? 00:00:00 udevd
>>>>> 1091 ? 00:00:00 udevd
>>>>> 1092 ? 00:00:00 udevd
>>>>> 1092 ? 00:00:00 udevd
>>>>> 2596 ? 00:00:00 sshd
>>>>> 2596 ? 00:00:00 sshd
>>>>> 2608 ? 00:00:00 sshd
>>>>> 2608 ? 00:00:00 sshd
>>>>> 2613 ? 00:00:00 sshd
>>>>> 2613 ? 00:00:00 sshd
>>>>> 2614 pts/0 00:00:00 bash
>>>>> 2614 pts/0 00:00:00 bash
>>>>> 2620 pts/0 00:00:00 sudo
>>>>> 2620 pts/0 00:00:00 sudo
>>>>> 2621 pts/0 00:00:00 su
>>>>> 2621 pts/0 00:00:00 su
>>>>> 2622 pts/0 00:00:00 bash
>>>>> 2622 pts/0 00:00:00 bash
>>>>> 2685 ? 00:00:00 lxc-start
>>>>> 2685 ? 00:00:00 lxc-start
>>>>> 2699 ? 00:00:00 init
>>>>> 2699 ? 00:00:00 init
>>>>> 2939 ? 00:00:00 rc
>>>>> 2939 ? 00:00:00 rc
>>>>> 2942 ? 00:00:00 startpar
>>>>> 2942 ? 00:00:00 startpar
>>>>> 2964 ? 00:00:00 rsyslogd
>>>>> 2964 ? 00:00:00 rsyslogd
>>>>> 2964 ? 00:00:00 rsyslogd
>>>>> 2964 ? 00:00:00 rsyslogd
>>>>> 2980 ? 00:00:00 startpar
>>>>> 2980 ? 00:00:00 startpar
>>>>> 2981 ? 00:00:00 ctlscript.sh
>>>>> 2981 ? 00:00:00 ctlscript.sh
>>>>> 3016 ? 00:00:00 cron
>>>>> 3016 ? 00:00:00 cron
>>>>> 3025 ? 00:00:00 mysqld_safe
>>>>> 3025 ? 00:00:00 mysqld_safe
>>>>> 3032 ? 00:00:00 sshd
>>>>> 3032 ? 00:00:00 sshd
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3097 ? 00:00:00 mysqld.bin
>>>>> 3113 ? 00:00:00 ctl.sh
>>>>> 3113 ? 00:00:00 ctl.sh
>>>>> 3115 ? 00:00:00 sleep
>>>>> 3115 ? 00:00:00 sleep
>>>>> 3116 ? 00:00:00 .memcached.bin
>>>>> 3116 ? 00:00:00 .memcached.bin
>>>>>
>>>>>
>>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>>> default, which in turn is causing the OOM killer loop.
>>>>>
>>>>> So I began to try and find what may have caused this problem by
>>>>> comparing the two sources...
>>>>>
>>>>> I checked the code for all references to 'oom_adj' and
>>>>> 'oom_adjust' in
>>>>> both code sets, but found no obvious differences:
>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>>> oom_adj
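A sketch of a more direct comparison: diffing the sorted grep output of the
two trees removes the need for a second grep pass entirely (the tree paths
are placeholders for the two source checkouts):

```shell
#!/bin/bash
# Diff every oom_adj/oom_adjust reference between two kernel trees.
# Usage: ./oomdiff.sh <tree-a> <tree-b>  -- tree paths are placeholders.
a=$1 b=$2
diff <(grep -R -e oom_adjust -e oom_adj "$a" | sed "s|^$a||" | sort) \
     <(grep -R -e oom_adjust -e oom_adj "$b" | sed "s|^$b||" | sort)
```

An empty diff confirms the references are identical; any difference points
straight at the file that changed.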
>>>>>
>>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>>> found a couple of matches, but only one obvious one:
>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>>
>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>>
>>>>> In a last-ditch attempt, I did a search for all references to 'oom'
>>>>> (case-insensitive) in both code bases, then compared the two:
>>>>> root@...abelle [~/lol/linux-2.6.32.28]> grep -i -R "oom" . |
>>>>> sort -n > /tmp/annabelle.oom_adj
>>>>> root@...ky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> grep -i -R
>>>>> "oom" . | sort -n > /tmp/vicky.oom_adj
>>>>>
>>>>> and this brought back (yet again) nothing obvious..
>>>>>
>>>>>
>>>>> root@...ky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> md5sum
>>>>> ./include/linux/oom.h
>>>>> 2a32622f6cd38299fc2801d10a9a3ea8 ./include/linux/oom.h
>>>>>
>>>>> root@...abelle [~/lol/linux-2.6.32.28]> md5sum
>>>>> ./include/linux/oom.h
>>>>> 2a32622f6cd38299fc2801d10a9a3ea8 ./include/linux/oom.h
>>>>>
>>>>> root@...ky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> md5sum
>>>>> ./mm/oom_kill.c
>>>>> 1ef2c2bec19868d13ec66ec22033f10a ./mm/oom_kill.c
>>>>>
>>>>> root@...abelle [~/lol/linux-2.6.32.28]> md5sum ./mm/oom_kill.c
>>>>> 1ef2c2bec19868d13ec66ec22033f10a ./mm/oom_kill.c
>>>>>
>>>>>
>>>>>
>>>>> Could anyone please shed some light on why the default oom_adj is
>>>>> now set to -17 (and where it is actually set)? From what I can tell,
>>>>> the fix for this issue will either be:
>>>>>
>>>>> 1. Allow OOM killer to override the decision of ignoring
>>>>> oom_adj ==
>>>>> -17 if an unrecoverable loop is encountered.
>>>>> 2. Change the default back to 0.
>>>>>
>>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>>> absolute best to give all the necessary information though.
>>>>>
>>>>> Cal
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-kernel" in
>>>> the body of a message to majordomo@...r.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at http://www.tux.org/lkml/
>>>>
>