Message-ID: <737ea213-548f-6206-f4e5-3c80c17651b3@kernel.dk>
Date:   Wed, 28 Jun 2017 11:45:43 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     wenxiong@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org
Cc:     keith.busch@...el.com, bjking@...ux.vnet.ibm.com
Subject: Re: [PATCH] fs: System memory leak when running HTX with T10 DIF
 enabled

On 06/28/2017 11:42 AM, Jens Axboe wrote:
> On 06/28/2017 10:32 AM, wenxiong@...ux.vnet.ibm.com wrote:
>> From: Wen Xiong <wenxiong@...ux.vnet.ibm.com>
>>
>> With an NVMe device + T10 DIF enabled, on a system with 256GB of
>> memory, we logged /proc/meminfo & /proc/slabinfo every minute; within
>> an hour SUnreclaim increased by 15968128 kB (~15 GB), i.e. roughly
>> 256 MB/minute leaking.
>>
>> /proc/meminfo | grep SUnreclaim...
>>
>> SUnreclaim:      6752128 kB
>> SUnreclaim:      6874880 kB
>> SUnreclaim:      7238080 kB
>> ....
>> SUnreclaim:     22307264 kB
>> SUnreclaim:     22485888 kB
>> SUnreclaim:     22720256 kB
>>
>> When test cases with T10 DIF enabled call into __blkdev_direct_IO_simple,
>> the code doesn't free the memory allocated by bio_integrity_alloc(). This
>> patch fixes the issue. HTX has run for 60+ hours without failure.
>>
>> failing stack:
>>
>> [36587.216329] [c000002ff60874a0] [c000000000bcac68]
>> dump_stack+0xb0/0xf0 (unreliable)
>> [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308
>> [36587.216407] [c000002ff6087570] [c000000000282530]
>> out_of_memory+0x4e0/0x650
>> [36587.216465] [c000002ff6087610] [c00000000028a154]
>> __alloc_pages_nodemask+0xf34/0x10b0
>> [36587.216534] [c000002ff6087810] [c00000000030b800]
>> alloc_pages_current+0xc0/0x1d0
>> [36587.216603] [c000002ff6087870] [c00000000031907c]
>> new_slab+0x46c/0x7d0
>> [36587.216661] [c000002ff6087950] [c00000000031bf00]
>> ___slab_alloc+0x570/0x670
>> [36587.216718] [c000002ff6087a70] [c00000000031c05c]
>> __slab_alloc+0x5c/0x90
>> [36587.216776] [c000002ff6087ad0] [c00000000031c1f4]
>> kmem_cache_alloc+0x164/0x300
>> [36587.216845] [c000002ff6087b20] [c0000000002de120]
>> mmap_region+0x3e0/0x6e0
>> [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480
>> [36587.216960] [c000002ff6087c80] [c0000000002af244]
>> vm_mmap_pgoff+0x114/0x160
>> [36587.217018] [c000002ff6087d60] [c0000000002db6b0]
>> SyS_mmap_pgoff+0x230/0x300
>> [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0
>> [36587.217145] [c000002ff6087e30] [c00000000000b184]
>> system_call+0x38/0xe0
>> [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no
>> killable processes...
>> [36587.217481]
>>
>> Signed-off-by: Wen Xiong <wenxiong@...ux.vnet.ibm.com>
>> Reviewed-by: Brian King <brking@...ux.vnet.ibm.com>
>> Tested-by: Murali Iyer <mniyer@...ibm.com>
>> ---
>>  fs/block_dev.c |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 519599d..e871444 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>>  
>>  	if (unlikely(bio.bi_error))
>>  		return bio.bi_error;
>> +
>> +	if (bio_integrity(&bio))
>> +		bio_integrity_free(&bio);
>> +
>>  	return ret;
>>  }
> 
> If this went through blk-throttle, then we'd also need to drop
> the task association. Should this just use __bio_free(&bio)?

Something like the below. It also fixes the error case; your patch
still leaks there.


diff --git a/block/bio.c b/block/bio.c
index 888e7801c638..bdc6e6541278 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
 	return bvl;
 }
 
-static void __bio_free(struct bio *bio)
+void __bio_free(struct bio *bio)
 {
 	bio_disassociate_task(bio);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 519599dddd36..d929f886ee7b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		kfree(vecs);
 
 	if (unlikely(bio.bi_error))
-		return bio.bi_error;
+		ret = bio.bi_error;
+
+	__bio_free(&bio);
+
 	return ret;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0e99cf..4a29dc3db8cb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -380,6 +380,7 @@ extern mempool_t *biovec_create_pool(int pool_entries);
 
 extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *);
 extern void bio_put(struct bio *);
+extern void __bio_free(struct bio *);
 
 extern void __bio_clone_fast(struct bio *, struct bio *);
 extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);

-- 
Jens Axboe
