linux-kernel - Re: [PATCH] btrfs: Split remaining space to discard in chunks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4bef0c7b-df8b-41dc-9fe1-022cc4937def@gmail.com>
Date: Mon, 2 Sep 2024 22:17:37 +0200
From: Luca Stefani <luca.stefani.ge1@...il.com>
To: dsterba@...e.cz
Cc: Qu Wenruo <quwenruo.btrfs@....com>, Chris Mason <clm@...com>,
 Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
 linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs: Split remaining space to discard in chunks



On 02/09/24 22:11, David Sterba wrote:
> On Mon, Sep 02, 2024 at 01:43:00PM +0200, Luca Stefani wrote:
>> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
>> mostly empty although we will do the split according to our super block
>> locations, the last super block ends at 256G, we can submit a huge
>> discard for the range [256G, 8T), causing a super large delay.
> 
> I'm not sure that this will be different than what we already do, or
> have the large delays been observed in practice? The range passed to
> blkdev_issue_discard() might be large but internally it's still split to
> smaller sizes depending on the queue limits, IOW the device.
> 
> Bio is allocated and limited by bio_discard_limit(bdev, *sector);
> https://elixir.bootlin.com/linux/v6.10.7/source/block/blk-lib.c#L38
> 
>    struct bio *blk_alloc_discard_bio(struct block_device *bdev,
> 		  sector_t *sector, sector_t *nr_sects, gfp_t gfp_mask)
>    {
> 	  sector_t bio_sects = min(*nr_sects, bio_discard_limit(bdev, *sector));
> 	  struct bio *bio;
> 
> 	  if (!bio_sects)
> 		  return NULL;
> 
> 	  bio = bio_alloc(bdev, 0, REQ_OP_DISCARD, gfp_mask);
>    ...
> 
> 
> Then used in __blkdev_issue_discard()
> https://elixir.bootlin.com/linux/v6.10.7/source/block/blk-lib.c#L63
> 
>    int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> 		  sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
>    {
> 	  struct bio *bio;
> 
> 	  while ((bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects,
> 			  gfp_mask)))
> 		  *biop = bio_chain_and_submit(*biop, bio);
> 	  return 0;
>    }
> 
> This is basically just a loop, chopping the input range as needed. The
> btrfs code does effectively the same, there's only the superblock,
> progress accounting and error handling done.
> 
> As the maximum size of a single discard request depends on a device we
> don't need to artificially limit it because this would require more IO
> requests and can be slower.

Thanks for taking a look, this change was prompted after I've been 
seeing issues due to the discard kthread blocking an userspace process 
causing device not to suspend.
https://lore.kernel.org/lkml/20240822164908.4957-1-luca.stefani.ge1@gmail.com/ 
is the proposed solution, but Qu mentioned that there is another place 
where it could happen that I didn't cover, and I think what I change 
here (unless it's the wrong place) allows me to add the similar 
`btrfs_trim_interrupted` checks to stop.

Please let me know if that makes sense to you, if that's the case I 
guess it would make sense to send the 2 patches together?

Luca.