Message-ID: <61be7c4f-8a97-97ef-e2be-795ef654c16d@kernel.dk>
Date: Wed, 28 Jun 2017 11:42:53 -0600
From: Jens Axboe <axboe@...nel.dk>
To: wenxiong@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org
Cc: keith.busch@...el.com, bjking@...ux.vnet.ibm.com
Subject: Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled

On 06/28/2017 10:32 AM, wenxiong@...ux.vnet.ibm.com wrote:
> From: Wen Xiong <wenxiong@...ux.vnet.ibm.com>
>
> With an NVMe device and T10 DIF enabled, on a system with 256GB of
> memory, we logged /proc/meminfo and /proc/slabinfo every minute; within
> an hour SUnreclaim grew by 15968128 kB (~15+ GB), roughly 256 MB per
> minute leaked.
>
> grep SUnreclaim /proc/meminfo:
>
> SUnreclaim: 6752128 kB
> SUnreclaim: 6874880 kB
> SUnreclaim: 7238080 kB
> ....
> SUnreclaim: 22307264 kB
> SUnreclaim: 22485888 kB
> SUnreclaim: 22720256 kB
>
> When test cases with T10 DIF enabled call into __blkdev_direct_IO_simple(),
> the code does not free the memory allocated by bio_integrity_alloc().
> This patch fixes the issue. With the fix applied, HTX has run for 60+
> hours without failure.
>
> failing stack:
>
> [36587.216329] [c000002ff60874a0] [c000000000bcac68] dump_stack+0xb0/0xf0 (unreliable)
> [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308
> [36587.216407] [c000002ff6087570] [c000000000282530] out_of_memory+0x4e0/0x650
> [36587.216465] [c000002ff6087610] [c00000000028a154] __alloc_pages_nodemask+0xf34/0x10b0
> [36587.216534] [c000002ff6087810] [c00000000030b800] alloc_pages_current+0xc0/0x1d0
> [36587.216603] [c000002ff6087870] [c00000000031907c] new_slab+0x46c/0x7d0
> [36587.216661] [c000002ff6087950] [c00000000031bf00] ___slab_alloc+0x570/0x670
> [36587.216718] [c000002ff6087a70] [c00000000031c05c] __slab_alloc+0x5c/0x90
> [36587.216776] [c000002ff6087ad0] [c00000000031c1f4] kmem_cache_alloc+0x164/0x300
> [36587.216845] [c000002ff6087b20] [c0000000002de120] mmap_region+0x3e0/0x6e0
> [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480
> [36587.216960] [c000002ff6087c80] [c0000000002af244] vm_mmap_pgoff+0x114/0x160
> [36587.217018] [c000002ff6087d60] [c0000000002db6b0] SyS_mmap_pgoff+0x230/0x300
> [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0
> [36587.217145] [c000002ff6087e30] [c00000000000b184] system_call+0x38/0xe0
> [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
> [36587.217481]
>
> Signed-off-by: Wen Xiong <wenxiong@...ux.vnet.ibm.com>
> Reviewed-by: Brian King <brking@...ux.vnet.ibm.com>
> Tested-by: Murali Iyer <mniyer@...ibm.com>
> ---
> fs/block_dev.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 519599d..e871444 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>
> if (unlikely(bio.bi_error))
> return bio.bi_error;
> +
> + if (bio_integrity(&bio))
> + bio_integrity_free(&bio);
> +
> return ret;
> }
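
If I read the report right, the underlying issue is that
__blkdev_direct_IO_simple() drives an on-stack bio and never calls
bio_put(), so the integrity payload that the block layer attaches at
submit time (bio_integrity_prep() -> bio_integrity_alloc()) is never
released through bio_free(). Roughly (paraphrased, not the exact
source):

	struct bio bio;

	bio_init(&bio, vecs, nr_pages);
	...
	submit_bio(&bio);	/* integrity payload attached here for DIF-capable devices */
	/* poll / wait for completion */
	...
	return ret;		/* on-stack bio, no bio_put(), so nothing frees the payload */
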
If this went through blk-throttle, then we'd also need to drop
the task association. Should this just use __bio_free(&bio)?
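
Something like the below, although __bio_free() is currently static in
block/bio.c, so it would first need to be exported (or get a small
public wrapper). Untested sketch only:

	if (unlikely(bio.bi_error))
		ret = bio.bi_error;

	/* drop the task/cgroup association and free any integrity payload */
	__bio_free(&bio);

	return ret;

That would also cover the early error return above, which the posted
patch still leaks on.
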
--
Jens Axboe