linux-kernel - Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache

Date:	Fri, 25 Jan 2013 17:31:30 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	azurIt <azurit@...ox.sk>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	cgroups mailinglist <cgroups@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from
 add_to_page_cache_locked

On Fri 25-01-13 16:07:23, azurIt wrote:
> Any news? Thnx!

Sorry, but I didn't get to this one yet.

> 
> azur
> 
> 
> 
> ______________________________________________________________
> > From: "Michal Hocko" <mhocko@...e.cz>
> > To: azurIt <azurit@...ox.sk>
> > Date: 30.12.2012 12:08
> > Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked
> >
> > CC: linux-kernel@...r.kernel.org, linux-mm@...ck.org, "cgroups mailinglist" <cgroups@...r.kernel.org>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>, "Johannes Weiner" <hannes@...xchg.org>
> >On Sun 30-12-12 02:09:47, azurIt wrote:
> >> >which suggests that the patch is incomplete and that I am blind :/
> >> >mem_cgroup_cache_charge calls __mem_cgroup_try_charge for the page cache
> >> >and that one doesn't check GFP_MEMCG_NO_OOM. So you need the following
> >> >follow-up patch on top of the one you already have (which should catch
> >> >all the remaining cases).
> >> >Sorry about that...
> >> 
> >> 
> >> This was, again, killing my MySQL server (search for "(mysqld)"):
> >> http://www.watchdog.sk/lkml/oom_mysqld5
> >
> >grep "Kill process" oom_mysqld5 
> >Dec 30 01:53:34 server01 kernel: [  367.061801] Memory cgroup out of memory: Kill process 5512 (apache2) score 716 or sacrifice child
> >Dec 30 01:53:35 server01 kernel: [  367.338024] Memory cgroup out of memory: Kill process 5517 (apache2) score 718 or sacrifice child
> >Dec 30 01:53:35 server01 kernel: [  367.747888] Memory cgroup out of memory: Kill process 5513 (apache2) score 721 or sacrifice child
> >Dec 30 01:53:36 server01 kernel: [  368.159860] Memory cgroup out of memory: Kill process 5516 (apache2) score 726 or sacrifice child
> >Dec 30 01:53:36 server01 kernel: [  368.665606] Memory cgroup out of memory: Kill process 5520 (apache2) score 733 or sacrifice child
> >Dec 30 01:53:36 server01 kernel: [  368.765652] Out of memory: Kill process 1778 (mysqld) score 39 or sacrifice child
> >Dec 30 01:53:36 server01 kernel: [  369.101753] Memory cgroup out of memory: Kill process 5519 (apache2) score 754 or sacrifice child
> >Dec 30 01:53:37 server01 kernel: [  369.464262] Memory cgroup out of memory: Kill process 5583 (apache2) score 762 or sacrifice child
> >Dec 30 01:53:37 server01 kernel: [  369.465017] Out of memory: Kill process 5506 (apache2) score 18 or sacrifice child
> >Dec 30 01:53:37 server01 kernel: [  369.574932] Memory cgroup out of memory: Kill process 5523 (apache2) score 759 or sacrifice child
> >
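The grep output above mixes two kinds of kills, told apart only by the message prefix: "Memory cgroup out of memory" comes from the memcg OOM path, while a bare "Out of memory" comes from the global OOM killer. A minimal sketch of separating them mechanically follows; the function name `classify_kills` and the regular expression are illustrative assumptions written against the line format seen in this log, not part of any kernel tooling.

```python
import re

# Assumed line format, taken from the "Kill process" lines in this log.
KILL_RE = re.compile(
    r"(?P<scope>Memory cgroup out of memory|Out of memory): "
    r"Kill process (?P<pid>\d+) \((?P<comm>[^)]+)\) score (?P<score>\d+)"
)


def classify_kills(lines):
    """Return (scope, pid, comm, score) tuples; scope is 'memcg' or 'global'."""
    kills = []
    for line in lines:
        m = KILL_RE.search(line)
        if m is None:
            continue  # not an OOM kill line
        scope = "memcg" if m.group("scope").startswith("Memory cgroup") else "global"
        kills.append((scope, int(m.group("pid")), m.group("comm"), int(m.group("score"))))
    return kills


# Three lines copied from the log excerpt above.
sample = [
    "Dec 30 01:53:36 server01 kernel: [  368.665606] Memory cgroup out of memory: "
    "Kill process 5520 (apache2) score 733 or sacrifice child",
    "Dec 30 01:53:36 server01 kernel: [  368.765652] Out of memory: "
    "Kill process 1778 (mysqld) score 39 or sacrifice child",
    "Dec 30 01:53:37 server01 kernel: [  369.465017] Out of memory: "
    "Kill process 5506 (apache2) score 18 or sacrifice child",
]

for kill in classify_kills(sample):
    print(kill)
```

Filtering the result for `scope == "global"` leaves only the kills that were not confined to a cgroup, which is where the mysqld entry shows up.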
> >So your mysqld has been killed by the global OOM killer, not by memcg.
> >But why, when you seem to be perfectly fine regarding memory? I guess
> >the following backtrace is relevant:
> >Dec 30 01:53:36 server01 kernel: [  368.569720] DMA: 0*4kB 1*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15912kB
> >Dec 30 01:53:36 server01 kernel: [  368.570447] DMA32: 9*4kB 10*8kB 8*16kB 6*32kB 5*64kB 6*128kB 4*256kB 2*512kB 3*1024kB 3*2048kB 613*4096kB = 2523636kB
> >Dec 30 01:53:36 server01 kernel: [  368.571175] Normal: 5*4kB 2060*8kB 4122*16kB 2550*32kB 2667*64kB 722*128kB 197*256kB 68*512kB 15*1024kB 4*2048kB 1855*4096kB = 8134036kB
> >Dec 30 01:53:36 server01 kernel: [  368.571906] 308964 total pagecache pages
> >Dec 30 01:53:36 server01 kernel: [  368.572023] 0 pages in swap cache
> >Dec 30 01:53:36 server01 kernel: [  368.572140] Swap cache stats: add 0, delete 0, find 0/0
> >Dec 30 01:53:36 server01 kernel: [  368.572260] Free swap  = 0kB
> >Dec 30 01:53:36 server01 kernel: [  368.572375] Total swap = 0kB
> >Dec 30 01:53:36 server01 kernel: [  368.597836] apache2 invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0, oom_score_adj=0
> >Dec 30 01:53:36 server01 kernel: [  368.598034] apache2 cpuset=uid mems_allowed=0
> >Dec 30 01:53:36 server01 kernel: [  368.598152] Pid: 5385, comm: apache2 Not tainted 3.2.35-grsec #1
> >Dec 30 01:53:36 server01 kernel: [  368.598273] Call Trace:
> >Dec 30 01:53:36 server01 kernel: [  368.598396]  [<ffffffff810cc89e>] dump_header+0x7e/0x1e0
> >Dec 30 01:53:36 server01 kernel: [  368.598516]  [<ffffffff810cc79f>] ? find_lock_task_mm+0x2f/0x70
> >Dec 30 01:53:36 server01 kernel: [  368.598638]  [<ffffffff810ccd65>] oom_kill_process+0x85/0x2a0
> >Dec 30 01:53:36 server01 kernel: [  368.598759]  [<ffffffff810cd415>] out_of_memory+0xe5/0x200
> >Dec 30 01:53:36 server01 kernel: [  368.598880]  [<ffffffff810cd5ed>] pagefault_out_of_memory+0xbd/0x110
> >Dec 30 01:53:36 server01 kernel: [  368.599006]  [<ffffffff81026e96>] mm_fault_error+0xb6/0x1a0
> >Dec 30 01:53:36 server01 kernel: [  368.599127]  [<ffffffff8102736e>] do_page_fault+0x3ee/0x460
> >Dec 30 01:53:36 server01 kernel: [  368.599250]  [<ffffffff81131ccf>] ? mntput+0x1f/0x30
> >Dec 30 01:53:36 server01 kernel: [  368.599371]  [<ffffffff811134e6>] ? fput+0x156/0x200
> >Dec 30 01:53:36 server01 kernel: [  368.599496]  [<ffffffff815b567f>] page_fault+0x1f/0x30
> >
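The per-zone lines in the dump above (DMA, DMA32, Normal) list free blocks as count*size terms followed by a total, and they are what the remark about being fine regarding memory rests on. A minimal sketch that re-adds the terms and compares them with the reported totals follows; the helper name `free_kb` is an assumption for illustration, and the input strings are the zone portions of the three log lines.

```python
import re

# One "count*sizekB" term, e.g. "613*4096kB".
TERM_RE = re.compile(r"(\d+)\*(\d+)kB")


def free_kb(zone_line):
    """Return (computed_kb, reported_kb) for one per-zone free-list line."""
    body, _, total = zone_line.rpartition("=")
    computed = sum(int(count) * int(size) for count, size in TERM_RE.findall(body))
    reported = int(re.search(r"(\d+)kB", total).group(1))
    return computed, reported


# Zone portions copied from the log excerpt above.
zones = [
    "DMA: 0*4kB 1*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB "
    "1*1024kB 1*2048kB 3*4096kB = 15912kB",
    "DMA32: 9*4kB 10*8kB 8*16kB 6*32kB 5*64kB 6*128kB 4*256kB 2*512kB "
    "3*1024kB 3*2048kB 613*4096kB = 2523636kB",
    "Normal: 5*4kB 2060*8kB 4122*16kB 2550*32kB 2667*64kB 722*128kB "
    "197*256kB 68*512kB 15*1024kB 4*2048kB 1855*4096kB = 8134036kB",
]

total_free = 0
for line in zones:
    computed, reported = free_kb(line)
    total_free += computed
    print(line.split(":")[0], computed, "kB, matches reported:", computed == reported)

print("total free:", total_free, "kB")
```

The three totals add up to roughly 10 GiB of free memory, mostly in large 4096kB blocks, so the global OOM kill cannot be explained by the machine actually running out of pages.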
> >This would suggest that an unexpected ENOMEM leaked during the page fault
> >path. I do not see which one that could be, because you said THP
> >(CONFIG_TRANSPARENT_HUGEPAGE) is disabled (and the other patch I
> >mentioned in the thread should fix that issue; btw. the patch is
> >already scheduled for the stable tree).
> > __do_fault, do_anonymous_page and do_wp_page call
> >mem_cgroup_newpage_charge with GFP_KERNEL which means that
> >we do memcg OOM and never return ENOMEM. do_swap_page calls
> >mem_cgroup_try_charge_swapin with GFP_KERNEL as well.
> >
> >I might have missed something but I will not get to look closer before
> >2nd January.
> >-- 
> >Michal Hocko
> >SUSE Labs
> >
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs