Message-ID: <20221101235144.06a3dbd3@xps.demsh.org>
Date:   Tue, 1 Nov 2022 23:51:44 +0300
From:   Dmitrii Tcvetkov <me@...sh.org>
To:     Keith Busch <kbusch@...nel.org>
Cc:     Jens Axboe <axboe@...nel.dk>, Song Liu <song@...nel.org>,
        linux-raid@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [bisected] RAID1 direct IO redirecting sector loop since 6.0

On Tue, 1 Nov 2022 11:22:21 -0600
Keith Busch <kbusch@...nel.org> wrote:

> On Tue, Nov 01, 2022 at 12:15:58AM +0300, Dmitrii Tcvetkov wrote:
> > 
> > # cat /proc/7906/stack
> > [<0>] submit_bio_wait+0xdb/0x140
> > [<0>] blkdev_direct_IO+0x62f/0x770
> > [<0>] blkdev_read_iter+0xc1/0x140
> > [<0>] vfs_read+0x34e/0x3c0
> > [<0>] __x64_sys_pread64+0x74/0xc0
> > [<0>] do_syscall_64+0x6a/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> > 
> > After "mdadm --fail" invocation the last line becomes:
> > [pid  7906] pread64(13, 0x627c34c8d200, 4096, 0) = -1 EIO (Input/output error)
> 
> It looks like something isn't accounting for the IO size correctly
> when there's an offset. It may be something specific to one of the
> stacking drivers in your block setup. Does this still happen without
> the cryptosetup step?
> 
I created an lvm(mdraid(gpt(HDD))) setup, without the cryptsetup layer:

# lsblk -t -a
NAME                 ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
...
sdd                          0    512      0     512     512    1 bfq        64 128    0B
├─sdd3                       0    512      0     512     512    1 bfq        64 128    0B
│ └─md1                      0    512      0     512     512    1           128 128    0B
│   ├─512lvmraid-zfs         0    512      0     512     512    1           128 128    0B
│   └─512lvmraid-wrk         0    512      0     512     512    1           128 128    0B
sde                          0    512      0     512     512    1 bfq        64 128    0B
├─sde3                       0    512      0     512     512    1 bfq        64 128    0B
│ └─md1                      0    512      0     512     512    1           128 128    0B
│   ├─512lvmraid-zfs         0    512      0     512     512    1           128 128    0B
│   └─512lvmraid-wrk         0    512      0     512     512    1           128 128    0B

where:
# mdadm --create --level=1 --metadata=1.2 \
	--raid-devices=2 /dev/md1 /dev/sdd3 /dev/sde3
# pvcreate /dev/md1
# vgcreate 512lvmraid /dev/md1

In this case the problem doesn't reproduce; both guests start successfully.

It also doesn't reproduce with loop devices using 4096-byte sectors:
# lsblk -t -a
NAME                 ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
loop0                        0   4096      0    4096    4096    0 none      128 128    0B
└─md2                        0   4096      0    4096    4096    0           128 128    0B
  ├─4096lvmraid-zfs          0   4096      0    4096    4096    0           128 128    0B
  └─4096lvmraid-wrk          0   4096      0    4096    4096    0           128 128    0B
loop1                        0   4096      0    4096    4096    0 none      128 128    0B
└─md2                        0   4096      0    4096    4096    0           128 128    0B
  ├─4096lvmraid-zfs          0   4096      0    4096    4096    0           128 128    0B
  └─4096lvmraid-wrk          0   4096      0    4096    4096    0           128 128    0B

where:
# losetup --sector-size 4096 -f /dev/sdd4
# losetup --sector-size 4096 -f /dev/sde4
# mdadm --create --level=1 --metadata=1.2 \
	--raid-devices=2 /dev/md2 /dev/loop0 /dev/loop1
# pvcreate /dev/md2
# vgcreate 4096lvmraid /dev/md2

So indeed, something is wrong in LUKS.

> For a different experiment, it may be safer to just force all
> alignment for stacking drivers. Could you try the following and see
> if that gets it working again? 
> 
> ---
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8bb9eef5310e..5c16fdb00c6f 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -646,6 +646,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  		t->misaligned = 1;
>  		ret = -1;
>  	}
> +	blk_queue_dma_alignment(t, t->logical_block_size - 1);
>  
>  	t->max_sectors = blk_round_down_sectors(t->max_sectors, t->logical_block_size);
>  	t->max_hw_sectors = blk_round_down_sectors(t->max_hw_sectors, t->logical_block_size);
> --

This doesn't compile:
  CC      block/blk-settings.o
block/blk-settings.c: In function ‘blk_stack_limits’:
block/blk-settings.c:649:33: error: passing argument 1 of ‘blk_queue_dma_alignment’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  649 |         blk_queue_dma_alignment(t, t->logical_block_size - 1);
      |                                 ^
      |                                 |
      |                                 struct queue_limits *
In file included from block/blk-settings.c:9:
./include/linux/blkdev.h:956:37: note: expected ‘struct request_queue *’ but argument is of type ‘struct queue_limits *’
  956 | extern void blk_queue_dma_alignment(struct request_queue *, int);

I didn't find an obvious way to get the request_queue pointer that corresponds to struct queue_limits *t.
