Date:   Mon, 6 Jun 2022 12:46:38 -0700
From:   Minchan Kim <minchan@...nel.org>
To:     Jaewon Kim <jaewon31.kim@...sung.com>
Cc:     ngupta@...are.org, senozhatsky@...omium.org,
        avromanov@...rdevices.ru, akpm@...ux-foundation.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        s.suk@...sung.com, ytk.lee@...sung.com, jaewon31.kim@...il.com
Subject: Re: [PATCH] zram_drv: add __GFP_NOMEMALLOC not to use
 ALLOC_NO_WATERMARKS

On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> Atomic page allocation failures sometimes happen, and most of them
> seem to occur during boot.
> 
> <4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> <4>[   59.707676] CPU: 5 PID: 1209 Comm: system_server Tainted: G S O      5.4.161-qgki-24219806-abA236USQU0AVE1 #1
> <4>[   59.707691] Call trace:
> <4>[   59.707702]  dump_backtrace.cfi_jt+0x0/0x4
> <4>[   59.707712]  show_stack+0x18/0x24
> <4>[   59.707719]  dump_stack+0xa4/0xe0
> <4>[   59.707728]  warn_alloc+0x114/0x194
> <4>[   59.707734]  __alloc_pages_slowpath+0x828/0x83c
> <4>[   59.707740]  __alloc_pages_nodemask+0x2b4/0x310
> <4>[   59.707747]  alloc_slab_page+0x40/0x5c8
> <4>[   59.707753]  new_slab+0x404/0x420
> <4>[   59.707759]  ___slab_alloc+0x224/0x3b0
> <4>[   59.707765]  __kmalloc+0x37c/0x394
> <4>[   59.707773]  context_struct_to_string+0x110/0x1b8
> <4>[   59.707778]  context_add_hash+0x6c/0xc8
> <4>[   59.707785]  security_compute_sid.llvm.13699573597798246927+0x508/0x5d8
> <4>[   59.707792]  security_transition_sid+0x2c/0x38
> <4>[   59.707804]  selinux_socket_create+0xa0/0xd8
> <4>[   59.707811]  security_socket_create+0x68/0xbc
> <4>[   59.707818]  __sock_create+0x8c/0x2f8
> <4>[   59.707823]  __sys_socket+0x94/0x19c
> <4>[   59.707829]  __arm64_sys_socket+0x20/0x30
> <4>[   59.707836]  el0_svc_common+0x100/0x1e0
> <4>[   59.707841]  el0_svc_handler+0x68/0x74
> <4>[   59.707848]  el0_svc+0x8/0xc
> <4>[   59.707853] Mem-Info:
> <4>[   59.707890] active_anon:223569 inactive_anon:74412 isolated_anon:0
> <4>[   59.707890]  active_file:51395 inactive_file:176622 isolated_file:0
> <4>[   59.707890]  unevictable:1018 dirty:211 writeback:4 unstable:0
> <4>[   59.707890]  slab_reclaimable:14398 slab_unreclaimable:61909
> <4>[   59.707890]  mapped:134779 shmem:1231 pagetables:26706 bounce:0
> <4>[   59.707890]  free:528 free_pcp:844 free_cma:147
> <4>[   59.707900] Node 0 active_anon:894276kB inactive_anon:297648kB active_file:205580kB inactive_file:706488kB unevictable:4072kB isolated(anon):0kB isolated(file):0kB mapped:539116kB dirty:844kB writeback:16kB shmem:4924kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> <4>[   59.707912] Normal free:2112kB min:7244kB low:68892kB high:72180kB active_anon:893140kB inactive_anon:297660kB active_file:204740kB inactive_file:706396kB unevictable:4072kB writepending:860kB present:3626812kB managed:3288700kB mlocked:4068kB kernel_stack:62416kB shadow_call_stack:15656kB pagetables:106824kB bounce:0kB free_pcp:3372kB local_pcp:176kB free_cma:588kB
> <4>[   59.707915] lowmem_reserve[]: 0 0
> <4>[   59.707922] Normal: 8*4kB (H) 5*8kB (H) 13*16kB (H) 25*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1080kB
> <4>[   59.707942] 242549 total pagecache pages
> <4>[   59.707951] 12446 pages in swap cache
> <4>[   59.707956] Swap cache stats: add 212408, delete 199969, find 36869/71571
> <4>[   59.707961] Free swap  = 3445756kB
> <4>[   59.707965] Total swap = 4194300kB
> <4>[   59.707969] 906703 pages RAM
> <4>[   59.707973] 0 pages HighMem/MovableOnly
> <4>[   59.707978] 84528 pages reserved
> <4>[   59.707982] 49152 pages cma reserved
> 
> kswapd or other reclaim contexts may not be able to prepare enough
> free pages when too many atomic allocations occur in a short time. And
> zram may not be helpful for these atomic allocations even though zram
> is being used for reclaim.
> 
> To get one zs object of a specific size, zram may allocate several
> pages, and this can happen for several size classes at the same time.
> This means zram may consume more pages just to reclaim one page. This
> inefficiency can consume all of the free pages below the min watermark
> when driven by a process with PF_MEMALLOC, such as kswapd.
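For reference, the several-pages-per-object behavior described above
comes from zsmalloc backing each zspage with multiple order-0 pages.
A simplified sketch of that path, modeled loosely on mm/zsmalloc.c's
alloc_zspage() (the helper name below is made up for illustration, and
details vary by kernel version):

/*
 * Hypothetical helper illustrating the zspage backing allocation:
 * a fresh zspage for a size class needs class->pages_per_zspage
 * order-0 pages, so storing one compressed object in an empty class
 * costs several page allocations, each using the gfp mask that zram
 * passed to zs_malloc().
 */
static int zspage_alloc_pages(struct size_class *class, gfp_t gfp,
			      struct page **pages)
{
	int i;

	for (i = 0; i < class->pages_per_zspage; i++) {
		/*
		 * Without __GFP_NOMEMALLOC, a PF_MEMALLOC caller such
		 * as kswapd swapping to zram gets ALLOC_NO_WATERMARKS
		 * here and can dig below the min watermark on every
		 * iteration.
		 */
		pages[i] = alloc_page(gfp);
		if (!pages[i]) {
			while (i--)
				__free_page(pages[i]);
			return -ENOMEM;
		}
	}

	return 0;
}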

However, that's how zram has worked for a long time (allocating memory
under memory pressure), and many folks have already raised
min_free_kbytes when they use zram as swap. If we don't allow the
allocation, swap-out fails more easily than before, which would break
existing tunings.

> 
> We can avoid this by adding __GFP_NOMEMALLOC, so that even a process
> with PF_MEMALLOC won't use ALLOC_NO_WATERMARKS for these allocations.
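The short circuit this relies on is in the page allocator's slow path.
Roughly, simplified from mm/page_alloc.c's __gfp_pfmemalloc_flags()
(exact checks vary by kernel version):

/*
 * Simplified from mm/page_alloc.c: __GFP_NOMEMALLOC is tested first,
 * so it vetoes the watermark bypass that PF_MEMALLOC (or
 * __GFP_MEMALLOC) would otherwise grant.
 */
static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask)
{
	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
		return 0;			/* never touch the reserves */
	if (gfp_mask & __GFP_MEMALLOC)
		return ALLOC_NO_WATERMARKS;
	if (!in_interrupt() && (current->flags & PF_MEMALLOC))
		return ALLOC_NO_WATERMARKS;	/* e.g. kswapd */

	return 0;
}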
> 
> Signed-off-by: Jaewon Kim <jaewon31.kim@...sung.com>
> ---
>  drivers/block/zram/zram_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index b8549c61ff2c..39cd1397ed3b 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1383,6 +1383,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
>  
>  	handle = zs_malloc(zram->mem_pool, comp_len,
>  			__GFP_KSWAPD_RECLAIM |
> +			__GFP_NOMEMALLOC |
>  			__GFP_NOWARN |
>  			__GFP_HIGHMEM |
>  			__GFP_MOVABLE);
> -- 
> 2.17.1
> 
> 
