linux-kernel - Re: [PATCH] mmc: core: don't set limits.discard

Message-ID: <794af2df-3bd7-43c2-e5ff-9453cba1d424@intel.com>
Date:   Thu, 1 Oct 2020 10:02:50 +0300
From:   Adrian Hunter <adrian.hunter@...el.com>
To:     Coly Li <colyli@...e.de>
Cc:     linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org, Vicente Bergas <vicencb@...il.com>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] mmc: core: don't set limits.discard_granularity as 0

On 1/10/20 9:29 am, Coly Li wrote:
> On 2020/10/1 14:14, Adrian Hunter wrote:
>> On 1/10/20 7:36 am, Coly Li wrote:
>>> On 2020/10/1 01:23, Adrian Hunter wrote:
>>>> On 30/09/20 7:08 pm, Coly Li wrote:
>>>>> In mmc_queue_setup_discard() the mmc driver queue's discard_granularity
>>>>> might be set to 0 (when card->pref_erase > max_discard) while the mmc
>>>>> device still claims to support discard operations. This is a bug and
>>>>> triggers the following kernel warning:
>>>>>
>>>>> WARNING: CPU: 0 PID: 135 at __blkdev_issue_discard+0x200/0x294
>>>>> CPU: 0 PID: 135 Comm: f2fs_discard-17 Not tainted 5.9.0-rc6 #1
>>>>> Hardware name: Google Kevin (DT)
>>>>> pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
>>>>> pc : __blkdev_issue_discard+0x200/0x294
>>>>> lr : __blkdev_issue_discard+0x54/0x294
>>>>> sp : ffff800011dd3b10
>>>>> x29: ffff800011dd3b10 x28: 0000000000000000
>>>>> x27: ffff800011dd3cc4 x26: ffff800011dd3e18
>>>>> x25: 000000000004e69b x24: 0000000000000c40
>>>>> x23: ffff0000f1deaaf0 x22: ffff0000f2849200
>>>>> x21: 00000000002734d8 x20: 0000000000000008
>>>>> x19: 0000000000000000 x18: 0000000000000000
>>>>> x17: 0000000000000000 x16: 0000000000000000
>>>>> x15: 0000000000000000 x14: 0000000000000394
>>>>> x13: 0000000000000000 x12: 0000000000000000
>>>>> x11: 0000000000000000 x10: 00000000000008b0
>>>>> x9 : ffff800011dd3cb0 x8 : 000000000004e69b
>>>>> x7 : 0000000000000000 x6 : ffff0000f1926400
>>>>> x5 : ffff0000f1940800 x4 : 0000000000000000
>>>>> x3 : 0000000000000c40 x2 : 0000000000000008
>>>>> x1 : 00000000002734d8 x0 : 0000000000000000
>>>>> Call trace:
>>>>> __blkdev_issue_discard+0x200/0x294
>>>>> __submit_discard_cmd+0x128/0x374
>>>>> __issue_discard_cmd_orderly+0x188/0x244
>>>>> __issue_discard_cmd+0x2e8/0x33c
>>>>> issue_discard_thread+0xe8/0x2f0
>>>>> kthread+0x11c/0x120
>>>>> ret_from_fork+0x10/0x1c
>>>>> ---[ end trace e4c8023d33dfe77a ]---
>>>>>
>>>>> This patch fixes the issue by setting discard_granularity to SECTOR_SIZE
>>>>> instead of 0 when (card->pref_erase > max_discard) is true. With this
>>>>> change, __blkdev_issue_discard() no longer complains about an improper
>>>>> discard granularity value.
>>>>>
>>>>> Fixes: commit e056a1b5b67b ("mmc: queue: let host controllers specify maximum discard timeout")
>>>>
>>>> That "Fixes" tag is a bit misleading.  For some time, the block layer had
>>>> no problem with discard_granularity of zero, and blk_bio_discard_split()
>>>> still doesn't (see below).
>>>>
>>>> static struct bio *blk_bio_discard_split(struct request_queue *q,
>>>> 					 struct bio *bio,
>>>> 					 struct bio_set *bs,
>>>> 					 unsigned *nsegs)
>>>> {
>>>> 	unsigned int max_discard_sectors, granularity;
>>>> 	int alignment;
>>>> 	sector_t tmp;
>>>> 	unsigned split_sectors;
>>>>
>>>> 	*nsegs = 1;
>>>>
>>>> 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
>>>> 	granularity = max(q->limits.discard_granularity >> 9, 1U);
>>>>
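As a side illustration of the clamp in the snippet quoted above, here is a minimal user-space sketch in C. granularity_in_sectors() and check_granularity_examples() are hypothetical names introduced only for this example, not block layer functions, and the sketch assumes 512-byte sectors to match the ">> 9" shift in the quoted code.

```c
#include <assert.h>

/* Assumption: 512-byte sectors, i.e. the ">> 9" in the quoted snippet. */
#define SECTOR_SHIFT 9

/*
 * Hypothetical helper (not a kernel function): convert a granularity given
 * in bytes to sectors, treating anything below one sector as one sector.
 * This mirrors max(granularity >> 9, 1U) from the quoted code.
 */
static unsigned int granularity_in_sectors(unsigned int granularity_bytes)
{
	unsigned int sectors = granularity_bytes >> SECTOR_SHIFT;

	return sectors > 1U ? sectors : 1U;
}

/* Example values: 0 (unknown) and 512 both clamp to one sector. */
static void check_granularity_examples(void)
{
	assert(granularity_in_sectors(0) == 1);
	assert(granularity_in_sectors(512) == 1);
	assert(granularity_in_sectors(4096) == 8);
}
```

With a clamp like this, a granularity of 0 bytes behaves the same as one sector, which is why this split path does not mind a zero value.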
>>>
>>> From Documentation/block/queue-sysfs.rst, the discard_granularity is
>>> described as,
>>>
>>> discard_granularity (RO)
>>> ------------------------
>>> This shows the size of internal allocation of the device in bytes, if
>>> reported by the device. A value of '0' means device does not support
>>> the discard functionality.
>>>
>>>
>>> And from Documentation/ABI/testing/sysfs-block, the discard_granularity
>>> is described as,
>>>
>>> What:           /sys/block/<disk>/queue/discard_granularity
>>> Date:           May 2011
>>> Contact:        Martin K. Petersen <martin.petersen@...cle.com>
>>> Description:
>>>                 Devices that support discard functionality may
>>>                 internally allocate space using units that are bigger
>>>                 than the logical block size. The discard_granularity
>>>                 parameter indicates the size of the internal allocation
>>>                 unit in bytes if reported by the device. Otherwise the
>>>                 discard_granularity will be set to match the device's
>>>                 physical block size. A discard_granularity of 0 means
>>>                 that the device does not support discard functionality.
>>>
>>>
>>> Therefore I took it as a bug when a driver sets its queue
>>> discard_granularity to 0 but still announces support for discard operations.
>>>
>>> But if you don't like the Fixes: tag, it is OK for me to remove it in
>>> the next version.
>>
>> Not at all.  I just wrote "a bit misleading" because people might also want
>> to know from what patch things stopped working.
> 
> Oh, maybe I understand you now. Yes, although the commit being fixed was
> buggy, the warning has only been triggered since the new discard alignment
> changes were merged.
> 
> Hmm, maybe I should add the Fixes tag to commit b35fd7422c2f ("block:
> check queue's limits.discard_granularity in __blkdev_issue_discard()").
> 
> What do you think of this commit id?

Yes, that could be mentioned in the commit message, in a Fixes tag, or both.
With that:

Acked-by: Adrian Hunter <adrian.hunter@...el.com>
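
For reference, the behaviour described in the patch can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: pick_discard_granularity() and check_discard_granularity_examples() are hypothetical names, not the actual mmc_queue_setup_discard() code, and both inputs are assumed to be counts of 512-byte sectors.

```c
#include <assert.h>

/* Assumption: one sector is 512 bytes. */
#define SECTOR_SIZE 512U

/*
 * Hypothetical stand-in for the decision described in the patch: pick a
 * discard granularity in bytes from the card's preferred erase size and
 * the maximum discard size, both given in 512-byte sectors.
 */
static unsigned int pick_discard_granularity(unsigned int pref_erase,
					     unsigned int max_discard)
{
	/* The granularity must not be greater than the maximum discard. */
	if (pref_erase > max_discard)
		return SECTOR_SIZE;	/* the patch description says this was 0 before */

	return pref_erase * SECTOR_SIZE;
}

/* Example values chosen only for illustration. */
static void check_discard_granularity_examples(void)
{
	/* Preferred erase size fits: granularity is pref_erase in bytes. */
	assert(pick_discard_granularity(8, 1024) == 4096);
	/* Preferred erase size too large: fall back to one sector, never 0. */
	assert(pick_discard_granularity(2048, 1024) == SECTOR_SIZE);
}
```

The point of the fallback is that a queue which advertises discard support never reports a granularity of 0, which is the value the documentation reserves for devices without discard support.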