lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 14 Oct 2015 13:58:05 +0800
From:	Pan Xinhui <xinhuix.pan@...el.com>
To:	linux-kernel@...r.kernel.org, linux-mm@...ck.org
CC:	Andrew Morton <akpm@...ux-foundation.org>, vbabka@...e.cz,
	rientjes@...gle.com, hannes@...xchg.org, mhocko@...e.cz,
	nasa4836@...il.com, mgorman@...e.de, alexander.h.duyck@...hat.com,
	aneesh.kumar@...ux.vnet.ibm.com,
	"yanmin_zhang@...ux.intel.com" <yanmin_zhang@...ux.intel.com>
Subject: Re: [PATCH] gfp: GFP_RECLAIM_MASK should include __GFP_NO_KSWAPD

Hi, all
	I am working on some debug features' development.
I use kmalloc in some places of *scheduler*. And the gfp_flag is GFP_ATOMIC, code looks like 
p = kmalloc(sizeof(*p), GFP_ATOMIC);

however I notice GFP_ATOMIC is still not enough. because when system is at low memory state, slub might try to wakeup kswapd. then some weird issues hit.

for example, this is one bug log.

[   83.713009, 0]BUG: spinlock recursion on CPU#0, rcu_preempt/7
[   83.719463, 0] lock: 0xffff88007ac12140, .magic: dead4ead, .owner: rcu_preempt/7, .owner_cpu: 0
[   83.729211, 0]CPU: 0 PID: 7 Comm: rcu_preempt Tainted: G        W    3.14.37-x86_64-gfa0f236-dirty #364
[   83.739733, 0]Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Cherry Trail FFD, BIOS BYO-P2.X64.0023.R03.1509161045 09/16/2015
[   83.753266, 0] ffff88007ac12140 ffff880074863880 ffffffff8198bca9 ffff8800749118b0
[   83.761871, 0] ffff8800748638a0 ffffffff819886d2 ffff88007ac12140 ffffffff81d89797
[   83.770451, 0] ffff8800748638c0 ffffffff819886fd ffff88007ac12140 ffff880070dc38a8
[   83.779066, 0]Call Trace:
[   83.782014, 0] [<ffffffff8198bca9>] dump_stack+0x4e/0x7a
[   83.787975, 0] [<ffffffff819886d2>] spin_dump+0x91/0x96
[   83.793839, 0] [<ffffffff819886fd>] spin_bug+0x26/0x2b
[   83.799606, 0] [<ffffffff810d3366>] do_raw_spin_lock+0x116/0x140
[   83.806344, 0] [<ffffffff81998d8f>] _raw_spin_lock+0x1f/0x30
[   83.812693, 0] [<ffffffff810bb884>] try_to_wake_up+0x154/0x2c0
[   83.819237, 0] [<ffffffff810bba62>] default_wake_function+0x12/0x20
[   83.826265, 0] [<ffffffff810cb1e8>] autoremove_wake_function+0x18/0x40
[   83.833585, 0] [<ffffffff810caaf8>] __wake_up_common+0x58/0x90
[   83.840128, 0] [<ffffffff810cad29>] __wake_up+0x39/0x50
[   83.845994, 0] [<ffffffff811659cd>] wakeup_kswapd+0xcd/0x140
[   83.852343, 0] [<ffffffff8115cbdd>] __alloc_pages_nodemask+0x95d/0xa30
[   83.859666, 0] [<ffffffff81195d2f>] new_slab+0x6f/0x2b0
[   83.865529, 0] [<ffffffff81989e5d>] __slab_alloc.constprop.64+0x26c/0x49f
[   83.873141, 0] [<ffffffff810af6c5>] ? insert_kill_task+0x25/0xa0
[   83.879880, 0] [<ffffffff81387f54>] ? __list_del_entry+0x14/0xf0
[   83.886618, 0] [<ffffffff811975f4>] kmem_cache_alloc_trace+0x174/0x1b0
[   83.893938, 0] [<ffffffff810af6c5>] ? insert_kill_task+0x25/0xa0
[   83.900677, 0] [<ffffffff810af6c5>] insert_kill_task+0x25/0xa0 //this function is simple, just treat it as kmalloc :)
[   83.907220, 0] [<ffffffff81994846>] __schedule+0x6a6/0x870
[   83.913375, 0] [<ffffffff81994a39>] schedule+0x29/0x70
[   83.919141, 0] [<ffffffff81993b02>] schedule_timeout+0x172/0x310
[   83.925879, 0] [<ffffffff81998e8e>] ? _raw_spin_unlock_irqrestore+0x1e/0x40
[   83.933685, 0] [<ffffffff810937c0>] ? __internal_add_timer+0x130/0x130
[   83.941005, 0] [<ffffffff810cb047>] ? prepare_to_wait_event+0x87/0xf0
[   83.948230, 0] [<ffffffff810e371a>] rcu_gp_kthread+0x40a/0x6e0
[   83.954774, 0] [<ffffffff810cb1d0>] ? abort_exclusive_wait+0xb0/0xb0
[   83.961899, 0] [<ffffffff810e3310>] ? rcu_try_advance_all_cbs+0xf0/0xf0
[   83.969315, 0] [<ffffffff810aa8a4>] kthread+0xe4/0x100
[   83.975081, 0] [<ffffffff810aa7c0>] ? kthread_create_on_node+0x190/0x190
[   83.982596, 0] [<ffffffff819a0f48>] ret_from_fork+0x58/0x90
[   83.988847, 0] [<ffffffff810aa7c0>] ? kthread_create_on_node+0x190/0x190

After some simple check, I change my codes. this time code looks like:
p = kmalloc(sizeof(*p), GFP_ATOMIC | __GFP_NO_KSWAPD);
I think this flag will forbid slub to call any scheduler codes. But issue still hit. :(

my test result shows that __GFP_NO_KSWAPD is cleared when slub pass gfp_flag to page allocator!!!

at last I found it is clear by codes below.
1441 static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
1442 {
1443         if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
1444                 pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
1445                 BUG();
1446         }
1447 
1448         return allocate_slab(s,
1449                 flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);//all other flags will be cleared. my god!!!
1450 }

I think GFP_RECLAIM_MASK should include as many available flags as possible. :)

thanks
xinhui

On 2015年10月14日 13:36, Pan Xinhui wrote:
> From: Pan Xinhui <xinhuix.pan@...el.com>
> 
> GFP_RECLAIM_MASK was introduced in commit 6cb062296f73 ("Categorize GFP
> flags"). In slub subsystem, this macro controls slub's allocation
> behavior. In particular, some flags which are not in GFP_RECLAIM_MASK
> will be cleared. So when slub pass this new gfp_flag into page
> allocator, we might lost some very important flags.
> 
> There are some mistakes when we introduce __GFP_NO_KSWAPD. This flag is
> used to avoid any scheduler-related codes recursive.  But it seems like
> patch author forgot to add it into GFP_RECLAIM_MASK. So lets add it now.
> 
> Signed-off-by: Pan Xinhui <xinhuix.pan@...el.com>
> ---
>  include/linux/gfp.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index f92cbd2..9ebad4d 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -130,7 +130,8 @@ struct vm_area_struct;
>  /* Control page allocator reclaim behavior */
>  #define GFP_RECLAIM_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS|\
>  			__GFP_NOWARN|__GFP_REPEAT|__GFP_NOFAIL|\
> -			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC)
> +			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
> +			__GFP_NO_KSWAPD)
>  
>  /* Control slab gfp mask during early boot */
>  #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_WAIT|__GFP_IO|__GFP_FS))
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ