linux-kernel - Re: [PATCH v4 5/5] md/raid10: Atomic write support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <3fa37335-2353-4d24-b2ae-e11f2c26ee4b@oracle.com>
Date: Sun, 17 Nov 2024 18:32:02 +0000
From: John Garry <john.g.garry@...cle.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Song Liu <song@...nel.org>
Cc: axboe@...nel.dk, hch@....de, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
        martin.petersen@...cle.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v4 5/5] md/raid10: Atomic write support

On 16/11/2024 03:50, Yu Kuai wrote:
> Hi,
> 
> 在 2024/11/16 2:19, Song Liu 写道:
>> On Tue, Nov 12, 2024 at 4:43 AM John Garry <john.g.garry@...cle.com> 
>> wrote:
>>>
>>> Set BLK_FEAT_ATOMIC_WRITES_STACKED to enable atomic writes.
>>>
>>> For an attempt to atomic write to a region which has bad blocks, error
>>> the write as we just cannot do this. It is unlikely to find devices 
>>> which
>>> support atomic writes and bad blocks.
>>>
>>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>>> ---
>>>   drivers/md/raid10.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
> 
> Reviewed-by: Yu Kuai <yukuai3@...wei.com>
> 
> One nit below:
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index 8c7f5daa073a..a3936a67e1e8 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -1255,6 +1255,7 @@ static void raid10_write_one_disk(struct mddev 
>>> *mddev, struct r10bio *r10_bio,
>>>          const enum req_op op = bio_op(bio);
>>>          const blk_opf_t do_sync = bio->bi_opf & REQ_SYNC;
>>>          const blk_opf_t do_fua = bio->bi_opf & REQ_FUA;
>>> +       const blk_opf_t do_atomic = bio->bi_opf & REQ_ATOMIC;
>>>          unsigned long flags;
>>>          struct r10conf *conf = mddev->private;
>>>          struct md_rdev *rdev;
>>> @@ -1273,7 +1274,7 @@ static void raid10_write_one_disk(struct mddev 
>>> *mddev, struct r10bio *r10_bio,
>>>          mbio->bi_iter.bi_sector = (r10_bio->devs[n_copy].addr +
>>>                                     choose_data_offset(r10_bio, rdev));
>>>          mbio->bi_end_io = raid10_end_write_request;
>>> -       mbio->bi_opf = op | do_sync | do_fua;
>>> +       mbio->bi_opf = op | do_sync | do_fua | do_atomic;
>>>          if (!replacement && test_bit(FailFast,
>>>                                       &conf->mirrors[devnum].rdev- 
>>> >flags)
>>>                           && enough(conf, devnum))
>>> @@ -1468,7 +1469,15 @@ static void raid10_write_request(struct mddev 
>>> *mddev, struct bio *bio,
>>>                                  continue;
>>>                          }
>>>                          if (is_bad) {
>>> -                               int good_sectors = first_bad - 
>>> dev_sector;
>>> +                               int good_sectors;
>>> +
>>> +                               if (bio->bi_opf & REQ_ATOMIC) {
>>> +                                       /* We just cannot atomically 
>>> write this ... */
> 
> Maybe mention that we can if there is at least one disk without any BB,
> it's just benefit does not worth the complexity. And return the special
> error code to let user retry without atomic write.

ok

> 
>>> +                                       error = -EFAULT;
>>
>> Is EFAULT the right error code here? I think we should return something
>> covered by blk_errors?

sure, so maybe explicitly use BLK_STS_IOERR / EIO, which is what we 
generally use in raid drivers when we cannot read/write - ok?

> 
> The error code is passed to bio by:
> 
> bio->bi_status = errno_to_blk_status(error);
> 
> So, this is fine.
>>
>> Other than this, 4/5 and 5/5 look good to me.
>>

Thanks,
John