Date:	Mon, 05 Oct 2015 11:42:49 +0530
From:	PINTU KUMAR <pintu.k@...sung.com>
To:	'Michal Hocko' <mhocko@...nel.org>
Cc:	akpm@...ux-foundation.org, minchan@...nel.org, dave@...olabs.net,
	koct9i@...il.com, rientjes@...gle.com, hannes@...xchg.org,
	penguin-kernel@...ove.sakura.ne.jp, bywxiaobai@....com,
	mgorman@...e.de, vbabka@...e.cz, js1304@...il.com,
	kirill.shutemov@...ux.intel.com, alexander.h.duyck@...hat.com,
	sasha.levin@...cle.com, cl@...ux.com, fengguang.wu@...el.com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, cpgs@...sung.com,
	pintu_agarwal@...oo.com, pintu.ping@...il.com,
	vishnu.ps@...sung.com, rohit.kr@...sung.com,
	c.rajkumar@...sung.com, sreenathd@...sung.com
Subject: RE: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter

Hi,

> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@...nel.org]
> Sent: Thursday, October 01, 2015 7:09 PM
> To: Pintu Kumar
> Cc: akpm@...ux-foundation.org; minchan@...nel.org; dave@...olabs.net;
> koct9i@...il.com; rientjes@...gle.com; hannes@...xchg.org; penguin-
> kernel@...ove.sakura.ne.jp; bywxiaobai@....com; mgorman@...e.de;
> vbabka@...e.cz; js1304@...il.com; kirill.shutemov@...ux.intel.com;
> alexander.h.duyck@...hat.com; sasha.levin@...cle.com; cl@...ux.com;
> fengguang.wu@...el.com; linux-kernel@...r.kernel.org; linux-mm@...ck.org;
> cpgs@...sung.com; pintu_agarwal@...oo.com; pintu.ping@...il.com;
> vishnu.ps@...sung.com; rohit.kr@...sung.com; c.rajkumar@...sung.com;
> sreenathd@...sung.com
> Subject: Re: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter
> 
> On Thu 01-10-15 16:18:43, Pintu Kumar wrote:
> > This patch maintains number of oom calls and number of oom kill count
> > in /proc/vmstat.
> > It is helpful during sluggish, aging or long duration tests.
> > Currently if the OOM happens, it can be only seen in kernel ring buffer.
> > But during long duration tests, all the dmesg and /var/log/messages*
> > could be overwritten.
> > So, just like other counters, the oom can also be maintained in
> > /proc/vmstat.
> > It can be also seen if all logs are disabled in kernel.
> >
> > A snapshot of the result of over night test is shown below:
> > $ cat /proc/vmstat
> > oom_stall 610
> > oom_kill_count 1763
> >
> > Here, oom_stall indicates that there are 610 times, kernel entered
> > into OOM cases. However, there were around 1763 oom killing happens.
> 
> This alone looks quite suspicious. Unless you have tasks which share the
> address space without being in the same thread group this shouldn't happen
> in such a large scale.

Yes, this also accounts for out_of_memory() invocations from memory cgroups.
Please check a few snapshots of the dmesg output captured during overnight tests.
........
[49479.078033]  [2:      xxxxxxxx:20874] Memory cgroup out of memory: Kill process 20880 (xxxxxxx) score 112 or sacrifice child
[49480.910430]  [2:      xxxxxxxx:20882] Memory cgroup out of memory: Kill process 20888 (xxxxxxxx) score 112 or sacrifice child
[49567.046203]  [0:        yyyyyyy:  548] Out of memory: Kill process 20458 (zzzzzzzzzz) score 102 or sacrifice child
[49567.346588]  [0:        yyyyyyy:  548] Out of memory: Kill process 21102 (zzzzzzzzzz) score 104 or sacrifice child
.........
The "out of memory" kill count in the dmesg dump exactly matches the
oom_kill_count value in /proc/vmstat.
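For reference, here is a minimal userspace sketch (not part of the patch, just
how I cross-check the numbers; names and error handling kept trivial) that reads
the two proposed counters back from /proc/vmstat, so they can be compared against
the count of "Kill process" lines in the saved dmesg:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[64];
	unsigned long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 1;
	/* print only the two counters introduced by this patch */
	while (fscanf(f, "%63s %lu", name, &val) == 2) {
		if (!strcmp(name, "oom_stall") ||
		    !strcmp(name, "oom_kill_count"))
			printf("%s %lu\n", name, val);
	}
	fclose(f);
	return 0;
}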

> </me looks into the patch>
> And indeed the patch is incorrect. You are only counting OOMs from the page
> allocator slow path. You are missing all the OOM invocations from the page
> fault path.

Sorry, I am not sure what exactly you mean. Please point out the places I am
missing.
Actually, I tried to add it in a generic place, that is, oom_kill_process(),
which is called by out_of_memory().
Are you talking about pagefault_out_of_memory()?
But that already calls out_of_memory(), no?
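To make sure we are talking about the same thing, the call chain I have in mind
is roughly this (simplified sketch, not the exact mm code):

/*
 * Page-fault path (simplified):
 *   handle_mm_fault() returns VM_FAULT_OOM
 *     -> pagefault_out_of_memory()
 *          -> out_of_memory()
 *               -> oom_kill_process()      // count_vm_event(OOM_KILL_COUNT)
 *
 * Allocator slow path:
 *   __alloc_pages_slowpath()
 *     -> __alloc_pages_may_oom()           // count_vm_event(OOM_STALL)
 *          -> out_of_memory()
 *               -> oom_kill_process()      // count_vm_event(OOM_KILL_COUNT)
 */

So, since the victim accounting sits inside oom_kill_process(), both paths should
already be covered for OOM_KILL_COUNT; only OOM_STALL is limited to the allocator
slow path.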

> The placement inside __alloc_pages_may_oom looks quite arbitrary as well. You
> are not counting events where we are OOM but somebody is holding the
> oom_mutex but you do count last attempt before going really OOM. Then we
> have cases which do not invoke OOM killer which are counted into oom_stall as
> well. I am not sure whether they should because I am not quite sure about the
> semantic of the counter in the first place.

OK. Yes, it can be added right at the entry of __alloc_pages_may_oom(); I will
make the change.
Actually, I knowingly skipped the oom_lock case because our 3.10 kernel had
note_oom_kill() added right after this check, so I placed the counter at exactly
the same spot.
I can make the necessary changes if the oom_lock case also matters.
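Something like the following is what I would change it to (a sketch against the
hunk below; the surrounding context lines are paraphrased from my tree, so please
take them with a grain of salt):

	*did_some_progress = 0;

	/*
	 * Count every entry into the OOM path, including the case where
	 * somebody else already holds oom_lock and we only back off.
	 */
	count_vm_event(OOM_STALL);

	/*
	 * Acquire the oom lock.  If that fails, somebody else is
	 * making progress for us.
	 */
	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}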

> What is it supposed to tell us? How many times the system had to go into
> emergency OOM steps? How many times the direct reclaim didn't make any
> progress so we can consider the system OOM?
> 
Yes, exactly: oom_stall tells how many times the system entered the OOM path.
It also tells how many times direct reclaim failed completely; currently we have
no counter for direct-reclaim success/failure.
Also, oom_kill_process() will not be invoked for higher-order allocations (above
PAGE_ALLOC_COSTLY_ORDER); the OOM path is still entered, but it results in a
straight page allocation failure.
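To illustrate, this is roughly the check I am referring to inside
__alloc_pages_may_oom() (paraphrased from my tree, the exact context may differ):

	if (!(gfp_mask & __GFP_NOFAIL)) {
		/* The OOM killer will not help higher-order allocations */
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			goto out;	/* counted as oom_stall, but no victim killed */
	}

So for such allocations oom_stall increments while oom_kill_count does not, and
the allocation simply fails.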

> oom_kill_count has a slightly misleading name because it suggests how many
> times oom_kill was called but in fact it counts the oom victims.
> Not sure whether this information is so much useful but the semantic is clear
> at least.
> 
OK, I agree about the semantics of the name oom_kill_count.
If possible, please suggest a better name. How about one of the following?
oom_victim_count ?
oom_nr_killed ?
oom_nr_victim ?

> > The OOM is bad for the any system. So, this counter can help the
> > developer in tuning the memory requirement at least during initial bringup.
> >
> > Signed-off-by: Pintu Kumar <pintu.k@...sung.com>
> > ---
> >  include/linux/vm_event_item.h |    2 ++
> >  mm/oom_kill.c                 |    2 ++
> >  mm/page_alloc.c               |    2 +-
> >  mm/vmstat.c                   |    2 ++
> >  4 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index 2b1cef8..ade0851 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -57,6 +57,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  #ifdef CONFIG_HUGETLB_PAGE
> >  		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
> >  #endif
> > +		OOM_STALL,
> > +		OOM_KILL_COUNT,
> >  		UNEVICTABLE_PGCULLED,	/* culled to noreclaim list */
> >  		UNEVICTABLE_PGSCANNED,	/* scanned for reclaimability */
> >  		UNEVICTABLE_PGRESCUED,	/* rescued from noreclaim list */
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 03b612b..e79caed 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -570,6 +570,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  	 * space under its control.
> >  	 */
> >  	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> > +	count_vm_event(OOM_KILL_COUNT);
> >  	mark_oom_victim(victim);
> >  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> >  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
> > @@ -600,6 +601,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  				task_pid_nr(p), p->comm);
> >  			task_unlock(p);
> >  			do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
> > +			count_vm_event(OOM_KILL_COUNT);
> >  		}
> >  	rcu_read_unlock();
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9bcfd70..1d82210 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2761,7 +2761,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  		schedule_timeout_uninterruptible(1);
> >  		return NULL;
> >  	}
> > -
> > +	count_vm_event(OOM_STALL);
> >  	/*
> >  	 * Go through the zonelist yet one more time, keep very high watermark
> >  	 * here, this is only to catch a parallel oom killing, we must fail if
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 1fd0886..f054265 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -808,6 +808,8 @@ const char * const vmstat_text[] = {
> >  	"htlb_buddy_alloc_success",
> >  	"htlb_buddy_alloc_fail",
> >  #endif
> > +	"oom_stall",
> > +	"oom_kill_count",
> >  	"unevictable_pgs_culled",
> >  	"unevictable_pgs_scanned",
> >  	"unevictable_pgs_rescued",
> > --
> > 1.7.9.5
> 
> --
> Michal Hocko
> SUSE Labs

