lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20151201044445.GA3817@bbox>
Date:	Tue, 1 Dec 2015 13:44:45 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	"kyeongdon.kim" <kyeongdon.kim@....com>
Cc:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On Tue, Dec 01, 2015 at 11:31:41AM +0900, kyeongdon.kim wrote:
> On 2015-12-01 오전 8:18, Minchan Kim wrote:
> > Hi Kyeongdon,
> > 
> > On Mon, Nov 30, 2015 at 07:42:02PM +0900, kyeongdon.kim wrote:
> > 
> >> > On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
> >> >> From: Kyeongdon Kim <kyeongdon.kim@....com>
> >> >>
> >> >> When we're using LZ4 multi compression streams for zram swap,
> >> >> we found out page allocation failure message in system running test.
> >> >> That was not only once, but a few(2 - 5 times per test).
> >> >> Also, some failure cases were continually occurring to try allocation
> >> >> order 3.
> >> >>
> >> >> In order to make parallel compression private data, we should call
> >> >> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
> >> >> 2/3 size memory to allocate in that time, page allocation fails.
> >> >> This patch makes to use vmalloc() as fallback of kmalloc(), this
> >> >> prevents page alloc failure warning.
> >> >>
> >> >> After using this, we never found warning message in running test, also
> >> >> It could reduce process startup latency about 60-120ms in each case.
> >> >>
> >> >> For reference a call trace :
> >> >>
> >> >> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> >> >> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty
> >> > #20
> >> >> Call trace:
> >> >> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
> >> >> [<ffffffc000206c48>] show_stack+0x10/0x1c
> >> >> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
> >> >> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
> >> >> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
> >> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> >> >> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
> >> >> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
> >> >> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
> >> >> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
> >> >> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
> >> >> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
> >> >> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
> >> >> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
> >> >> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
> >> >> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
> >> >> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
> >> >> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
> >> >> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
> >> >> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
> >> >> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
> >> >> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
> >> >> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
> >> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> >> >> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
> >> >> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
> >> >> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
> >> >> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> >> >> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
> >> >>
> >> >> [minchan: change vmalloc gfp and adding comment about gfp]
> >> >> [sergey: tweak comments and styles]
> >> >> Signed-off-by: Kyeongdon Kim <kyeongdon.kim@....com>
> >> >> Signed-off-by: Minchan Kim <minchan@...nel.org>
> >> >
> >> > Kyeongdon, Could you test this patch on your device?
> >> >
> >> > Thanks.
> >>
> >> Sorry to have kept you waiting,
> >> Obviously, I couldn't see allocation fail message with this patch.
> >> But, there is something to make some delay(not sure yet this is normal).
> > 
> > You mean new changes makes start-up delay of your application sometime
> > still,
> > but not frequent like old?
> > 
> I couldn't see start-up delay during my test after this patch.
> But, I checked the return value from alloc function like the below :
> 
> static void *zcomp_lz4_create(void)
> <snip>
>   ret = kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO | __GFP_NORETRY |
> 			__GFP_NOWARN | __GFP_NOMEMALLOC);
>   printk("%s: %d: ret = %p\n",__func__,__LINE__,ret);	//line 32
>   if (!ret) {
>     ret = __vmalloc(LZ4_MEM_COMPRESS,
> 		GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN |
> 		__GFP_NOMEMALLOC | __GFP_ZERO | __GFP_HIGHMEM,
> 		PAGE_KERNEL);
>     printk("%s: %d: ret = %p\n",__func__,__LINE__,ret);	//line 38
>   }
>   return ret;

OK, I got it. 

> 
> log message :
> [  352.226014][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.226035][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.226791][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.226809][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.230348][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.230369][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.230460][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.230485][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.230507][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.230520][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.230608][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.230619][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.230888][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.230902][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.231406][0] zcomp_lz4_create: 32: ret = ffffffc002088000
> [  352.234024][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234060][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.234359][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234384][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.234618][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234639][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.234667][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234685][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.234738][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234748][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.234800][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234816][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.234852][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.234865][0] zcomp_lz4_create: 38: ret =           (null)
> [  352.235136][0] zcomp_lz4_create: 32: ret =           (null)
> [  352.235179][0] zcomp_lz4_create: 38: ret = ffffff80016a4000
> 
> I thought this pattern from vmalloc is not normal.
> >>
> >> static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> >> {
> >> <snip>
> >>
> >> zstrm->private = comp->backend->create();
> >> ^ // sometimes, return 'null' continually(2-5times)
> > 
> > Hmm, I think it is caused by __GFP_NOMEMALLOC.
> > Could you test it without the flag?
> > 
> >>
> >> As you know, if there is 'null' return, this function is called again to
> >> get a memory in while() loop. I just checked this one with printk().
> >>
> >> If you guys don't mind, I'll test more with trace log to check time
> > delay.
> > 
> > No problem.
> > 
> >>
> >> However, If this is fully expectable status to you.
> >> I think I don't need to do it.
> > 
> > It's not what I expected. Actually, I thought failure of vmalloc
> > in that place should be *really really* rare. I think it's caused by
> > __GFP_NOMEMALLOC so I want to see test result without the flag.
> > 
> > Thanks for the careful test!
> > 
> You're welcome.
> 
> After I removed flag '__GFP_NOMEMALLOC', I couldn't find return 'null'
> from vmalloc until now.
 
Thanks for the test.
You said you cannot see the delay any more so it's fine as it is.

Regardless of it, I want to remove __GFP_NOMEMALLOC in the gfp_mask.
The reason we decided to add vmalloc fallback is that we expect that
it is likely to succeed a few order-0 pages to allocate (sizeof(buffer)
> PAGE_SIZE) although it was failed by kmalloc.
But what I missing is that zram-swap is working in direct reclaim
normally so it is more flexible to use emegency memory pool(
e,g, zsmalloc doesn't use __GFP_NOMEMALLOC) so there is no point
to use __GFP_NOMEMALLOC in here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ