Message-ID: <20171006093702.3ca2p6ymyycwfgbk@dhcp22.suse.cz>
Date: Fri, 6 Oct 2017 11:37:02 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Yang Shi <yang.s@...baba-inc.com>
Cc: cl@...ux.com, penberg@...nel.org, rientjes@...gle.com,
iamjoonsoo.kim@....com, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] mm: oom: show unreclaimable slab info when
unreclaimable slabs > user memory
On Thu 05-10-17 05:29:10, Yang Shi wrote:
> The kernel may panic when an OOM happens and no killable process is
> left. Sometimes this is caused by a huge amount of unreclaimable slab
> memory used by the kernel.
>
> Although kdump could help debug such a problem, kdump is not available
> on all architectures and it may malfunction at times. And since the
> kernel has already panicked, it is worth capturing such information in
> dmesg to aid troubleshooting.
>
> Print out unreclaimable slab info (used size and total size) for
> caches whose actual memory usage is not zero (num_objs * size != 0)
> when the amount of unreclaimable slab memory is greater than total
> user memory (LRU pages).
>
> The output looks like:
>
> Unreclaimable slab info:
> Name                      Used        Total
> rpc_buffers               31KB         31KB
> rpc_tasks                  7KB          7KB
> ebitmap_node            1964KB       1964KB
> avtab_node              5024KB       5024KB
> xfs_buf                 1402KB       1402KB
> xfs_ili                  134KB        134KB
> xfs_efi_item             115KB        115KB
> xfs_efd_item             115KB        115KB
> xfs_buf_item             134KB        134KB
> xfs_log_item_desc        342KB        342KB
> xfs_trans               1412KB       1412KB
> xfs_ifork                212KB        212KB
OK, this looks better. The naming is not the greatest, but I will not
nitpick on this. I have one question, though.
>
> Signed-off-by: Yang Shi <yang.s@...baba-inc.com>
[...]
> +void dump_unreclaimable_slab(void)
> +{
> +	struct kmem_cache *s, *s2;
> +	struct slabinfo sinfo;
> +
> +	/*
> +	 * Here acquiring slab_mutex is risky since we don't prefer to get
> +	 * sleep in oom path. But, without mutex hold, it may introduce a
> +	 * risk of crash.
> +	 * Use mutex_trylock to protect the list traverse, dump nothing
> +	 * without acquiring the mutex.
> +	 */
> +	if (!mutex_trylock(&slab_mutex)) {
> +		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
> +		return;
> +	}
> +
> +	pr_info("Unreclaimable slab info:\n");
> +	pr_info("Name Used Total\n");
> +
> +	list_for_each_entry_safe(s, s2, &slab_caches, list) {
> +		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
> +			continue;
> +
> +		memset(&sinfo, 0, sizeof(sinfo));
Why do you zero out the structure? All the fields you are printing are
filled in by get_slabinfo().
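
To illustrate the question, here is a minimal user-space sketch, not kernel code. It assumes the fill helper assigns every field the caller later reads. struct slabinfo_sketch, get_slabinfo_sketch() and format_row() are hypothetical stand-ins invented for this example; only the format string and the KB arithmetic are taken from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for struct slabinfo. Only the two fields the
 * dump prints are modelled, plus one field the dump never reads.
 */
struct slabinfo_sketch {
	unsigned long active_objs;
	unsigned long num_objs;
	unsigned long not_printed;	/* left unassigned on purpose */
};

/*
 * Hypothetical stand-in for get_slabinfo(): it assigns both fields
 * that the caller reads afterwards.
 */
static void get_slabinfo_sketch(unsigned long active, unsigned long total,
				struct slabinfo_sketch *sinfo)
{
	sinfo->active_objs = active;
	sinfo->num_objs = total;
}

/*
 * Format one row the way the patch does. Returns the number of
 * characters written, or 0 when the cache has no objects.
 */
static int format_row(char *buf, size_t len, const char *name,
		      unsigned long obj_size, unsigned long active,
		      unsigned long total)
{
	struct slabinfo_sketch sinfo;	/* no memset: see comment below */

	/* Every field read below is assigned here, so zeroing is redundant. */
	get_slabinfo_sketch(active, total, &sinfo);

	if (sinfo.num_objs == 0)
		return 0;

	return snprintf(buf, len, "%-17s %10luKB %10luKB", name,
			(sinfo.active_objs * obj_size) / 1024,
			(sinfo.num_objs * obj_size) / 1024);
}
```

With a 1024-byte object size, format_row(buf, sizeof(buf), "xfs_buf", 1024, 5, 7) produces a row reporting 5KB used and 7KB total, while a cache with zero objects is skipped. The zeroing would only matter if the caller read a field that the fill helper leaves untouched.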
> +		get_slabinfo(s, &sinfo);
> +
> +		if (sinfo.num_objs > 0)
> +			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
> +				(sinfo.active_objs * s->size) / 1024,
> +				(sinfo.num_objs * s->size) / 1024);
> +	}
> +	mutex_unlock(&slab_mutex);
> +}
> +
> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
> void *memcg_slab_start(struct seq_file *m, loff_t *pos)
> {
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs