Open Source and information security mailing list archives
Message-ID: <CAJX1YtYhbutYFC-cT+dRoRvjqkOY_qSWtU_tSUZ99Q9-yGUF_g@mail.gmail.com>
Date:   Wed, 2 May 2018 14:11:49 +0200
From:   Gi-Oh Kim <gi-oh.kim@...fitbricks.com>
To:     shli@...nel.org
Cc:     linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        Gioh Kim <gi-oh.kim@...fitbricks.com>
Subject: Re: [PATCH] md/raid1: add error handling of read error from FailFast device

On Wed, May 2, 2018 at 1:08 PM, Gioh Kim <gi-oh.kim@...fitbricks.com> wrote:
> The current handle_read_error() function calls fix_read_error()
> only if the md device is RW and the rdev does not have the FailFast
> flag set. It does not handle a read error from a RW device that has
> the FailFast flag set: the device is only marked IO_BLOCKED for that
> request.
>
> I am not sure whether this is intended, but I found that a write IO
> error sets the rdev faulty. The md module should handle read IO errors
> and write IO errors equally, so I think a read IO error should also
> set the rdev faulty.
>
> Signed-off-by: Gioh Kim <gi-oh.kim@...fitbricks.com>
> ---
>  drivers/md/raid1.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e9e3308cb0a7..4445179aa4c8 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2474,6 +2474,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>                 fix_read_error(conf, r1_bio->read_disk,
>                                r1_bio->sector, r1_bio->sectors);
>                 unfreeze_array(conf);
> +       } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
> +               md_error(mddev, rdev);
>         } else {
>                 r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
>         }
> --
> 2.14.1
>

I think it would be helpful to show how I tested it.

I used Ubuntu 17.10 and mdadm v4.0, as follows:
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10
DISTRIB_CODENAME=artful
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# uname -a
Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# mdadm --version
mdadm - v4.0 - 2017-01-09

The following shows how I generated a read IO error and checked the md device.
After the read IO error, no device was set faulty.

# modprobe scsi_debug num_parts=2
# man mdadm
# mdadm -C /dev/md111 --failfast -l 1 -n 2 /dev/sdc1 /dev/sdc2
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: largest drive (/dev/sdc2) exceeds size (3904K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md111 started.
# mdadm -D /dev/md111
/dev/md111:
        Version : 1.2
  Creation Time : Wed May  2 10:55:35 2018
     Raid Level : raid1
     Array Size : 3904
  Used Dev Size : 3904
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed May  2 10:55:36 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : ws00837:111  (local to host ws00837)
           UUID : 9f214193:03cf7c97:3208da22:d6ab8a13
         Events : 17

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync failfast   /dev/sdc1
       1       8       34        1      active sync failfast   /dev/sdc2
# cat /proc/mdstat
Personalities : [raid1]
md111 : active raid1 sdc2[1] sdc1[0]
      3904 blocks super 1.2 [2/2] [UU]

unused devices: <none>
# echo -1 > /sys/module/scsi_debug/parameters/every_nth && echo 4 > /sys/module/scsi_debug/parameters/opts
# dd if=/dev/md111 of=/dev/null bs=4K count=1 iflag=direct &
[1] 6322
# dd: error reading '/dev/md111': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 124,376 s, 0,0 kB/s

[1]+  Exit 1                  dd if=/dev/md111 of=/dev/null bs=4K count=1 iflag=direct
# mdadm -D /dev/md111
/dev/md111:
        Version : 1.2
  Creation Time : Wed May  2 10:55:35 2018
     Raid Level : raid1
     Array Size : 3904
  Used Dev Size : 3904
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed May  2 10:55:36 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync failfast   /dev/sdc1
       1       8       34        1      active sync failfast   /dev/sdc2


The following shows how I generated a write IO error and checked the md device.
After the write IO error, one device was set faulty.

gohkim@...0837:~$ sudo modprobe scsi_debug num_parts=2
gohkim@...0837:~$ sudo mdadm -C /dev/md111 --failfast -l 1 -n 2 /dev/sdc1 /dev/sdc2
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: largest drive (/dev/sdc2) exceeds size (3904K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md111 started.
gohkim@...0837:~$ sudo mdadm -D /dev/md111
/dev/md111:
        Version : 1.2
  Creation Time : Wed May  2 14:03:30 2018
     Raid Level : raid1
     Array Size : 3904
  Used Dev Size : 3904
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed May  2 14:03:31 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : ws00837:111  (local to host ws00837)
           UUID : ba51fe65:c517a25a:a381ccc5:3617322b
         Events : 17

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync failfast   /dev/sdc1
       1       8       34        1      active sync failfast   /dev/sdc2
gohkim@...0837:~$ echo -1 | sudo tee /sys/module/scsi_debug/parameters/every_nth
-1
gohkim@...0837:~$ echo 4 | sudo tee /sys/module/scsi_debug/parameters/opts
4
gohkim@...0837:~$ sudo dd if=/dev/zero of=/dev/md111 bs=4K count=1 oflag=direct &
[1] 13081
gohkim@...0837:~$ dd: error writing '/dev/md111': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 184,523 s, 0,0 kB/s

[1]+  Exit 1                  sudo dd if=/dev/zero of=/dev/md111 bs=4K count=1 oflag=direct
gohkim@...0837:~$ sudo mdadm -D /dev/md111
/dev/md111:
        Version : 1.2
  Creation Time : Wed May  2 14:03:30 2018
     Raid Level : raid1
     Array Size : 3904
  Used Dev Size : 3904
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed May  2 14:07:47 2018
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync failfast   /dev/sdc1
       -       0        0        1      removed

       1       8       34        -      faulty failfast   /dev/sdc2



-- 
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 176 2697 8962
Fax:      +49 30 577 008 299
Email:    gi-oh.kim@...fitbricks.com
URL:      https://www.profitbricks.de

Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens
