lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aPDLahi_-_GaA-tv@eldamar.lan>
Date: Thu, 16 Oct 2025 12:39:38 +0200
From: Salvatore Bonaccorso <carnil@...ian.org>
To: Jens Axboe <axboe@...nel.dk>, Kevin Lumik <kevin@...ee>
Cc: 1116358@...s.debian.org, io-uring@...r.kernel.org,
	linux-kernel@...r.kernel.org, regressions@...ts.linux.dev,
	stable@...r.kernel.org
Subject: Re: [regression] Regression from 90bfb28d5fa8 ("io_uring/rw: drop
 -EOPNOTSUPP check in __io_complete_rw_common()"): LVM snapshots causing I/O
 errors in KVM guest with aio=io_uring set

Hi Jens,

On Mon, Oct 13, 2025 at 12:04:12PM -0600, Jens Axboe wrote:
> On 10/13/25 11:47 AM, Salvatore Bonaccorso wrote:
> > Hi Jens,
> > 
> > Kevin Lumik reported in Debian the following issue (not exactly a
> > minimal reproducer, but bisection results follows):
> > 
> > On Fri, Sep 26, 2025 at 10:34:22AM +0300, Kevin Lumik wrote:
> >  
> >> Dear Maintainer,
> >>
> >> After upgrading from Debian Bookworm to Trixie, an issue within KVM guests when creating an LVM snapshot of its volume
> >> has surfaced. When a LVM snapshot is taken from the host, the guest starts to get I/O errors. The issue seems to only
> >> appear when aio=io_uring is specified in the KVM drive parameters and also seems to resolve when downgrading the kernel
> >> package down to 6.1. The issue is also not reproducible when using aio=native. 
> >>
> >> KVM args for the drive: "-drive id=drive-
> >> virtio0,format=raw,file=/dev/dom/vps_testsql,cache=none,aio=io_uring,index=0,media=disk,if=virtio"
> >>
> >> An IO workload is being created in the VM using "fio --randrepeat=1 --ioengine=io_uring --direct=1 --gtod_reduce=1 --
> >> name=randwrite --filename=/root/test.bin --bs=4k --iodepth=64 --runtime=60 --numjobs=32 --readwrite=randwrite --size=1G
> >> --rwmixread=75 --group_reporting"
> >>
> >> After running "lvcreate -s /dev/dom/vps_testsql -n test -L 1G" on the host we can observe fio erroring out:
> >>
> >> ...
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=46977024, buflen=4096
> >> fio: pid=7367, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=261505024, buflen=4096
> >> fio: pid=7374, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=973840384, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=9637888, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=159965184, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=857505792, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=90787840, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=26427392, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=955621376, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=96169984, buflen=4096
> >> fio: pid=7372, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: pid=7362, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=203702272, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=814649344, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=91467776, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=948256768, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=105295872, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=75247616, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=1062293504, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=111955968, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=942563328, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=117354496, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=1050402816, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=125419520, buflen=4096
> >> fio: io_u error on file /root/test.bin: Input/output error: write offset=129044480, buflen=4096
> >> fio: pid=7369, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: pid=7373, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: pid=7348, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: pid=7375, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: pid=7358, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >> fio: pid=7359, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
> >>
> >> randwrite: (groupid=0, jobs=32): err= 5 (file:io_u.c:1876, func=io_u error, error=Input/output error): pid=7344: Thu Sep
> >> 25 16:28:30 2025
> >>   write: IOPS=194k, BW=758MiB/s (795MB/s)(3976MiB/5244msec); 0 zone resets
> >>    bw (  KiB/s): min=657104, max=1009816, per=100.00%, avg=785345.80, stdev=4331.73, samples=320
> >>    iops        : min=164276, max=252454, avg=196336.40, stdev=1082.93, samples=320
> >>   cpu          : usr=0.33%, sys=1.07%, ctx=18912, majf=0, minf=455
> >>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.8%
> >>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>      complete  : 0=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> >>      issued rwts: total=0,1019955,0,0 short=0,0,0,0 dropped=0,0,0,0
> >>      latency   : target=0, window=0, percentile=100.00%, depth=64
> >>
> >> Run status group 0 (all jobs):
> >>   WRITE: bw=758MiB/s (795MB/s), 758MiB/s-758MiB/s (795MB/s-795MB/s), io=3976MiB (4169MB), run=5244-5244msec
> >>
> >> Disk stats (read/write):
> >>   vda: ios=1/1001361, merge=0/2, ticks=0/10314531, in_queue=10314549, util=97.82%
> >> Bus error
> >>
> >> ---
> >>
> >> And the error is also visible in the kernel error log of the VM:
> >> [  361.962970] I/O error, dev vda, sector 83277192 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.963945] I/O error, dev vda, sector 83067480 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964489] I/O error, dev vda, sector 82881208 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964499] I/O error, dev vda, sector 83031976 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964501] I/O error, dev vda, sector 83089832 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964503] I/O error, dev vda, sector 83156184 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964522] I/O error, dev vda, sector 83183384 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964524] I/O error, dev vda, sector 83310704 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964525] I/O error, dev vda, sector 83311144 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >> [  361.964532] I/O error, dev vda, sector 83315272 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
> >>
> >> The host does not generate any erros, they only seem to occur within the VM.
> > 
> > Now I asked Kevin if a bisection is possible, and the following
> > results was found (https://bugs.debian.org/1116358#41):
> > 
> > I've identified the first bad commit using git bisect:
> > 
> > 90bfb28d5fa8127a113a140c9791ea0b40ab156a is the first bad commit
> > commit 90bfb28d5fa8127a113a140c9791ea0b40ab156a
> > Author: Jens Axboe <axboe@...nel.dk>
> > Date:   Tue Sep 10 08:57:04 2024 -0600
> > 
> >     io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()
> > 
> >     A recent change ensured that the necessary -EOPNOTSUPP -> -EAGAIN
> >     transformation happens inline on both the reader and writer side,
> >     and hence there's no need to check for both of these anymore on
> >     the completion handler side.
> > 
> >     Signed-off-by: Jens Axboe <axboe@...nel.dk>
> > 
> >  io_uring/rw.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > 
> > Here is the git bisect log as well:
> > 
> > git bisect start
> > # status: waiting for both good and bad commits
> > # good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
> > git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
> > # status: waiting for bad commit, 1 good commit known
> > # bad: [59b723cd2adbac2a34fc8e12c74ae26ae45bf230] Linux 6.12-rc6
> > git bisect bad 59b723cd2adbac2a34fc8e12c74ae26ae45bf230
> > # bad: [de848da12f752170c2ebe114804a985314fd5a6a] Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel
> > git bisect bad de848da12f752170c2ebe114804a985314fd5a6a
> > # bad: [7b17f5ebd5fc5e9275eaa5af3d0771f2a7b01bbf] Merge tag 'soc-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> > git bisect bad 7b17f5ebd5fc5e9275eaa5af3d0771f2a7b01bbf
> > # good: [64dd3b6a79f0907d36de481b0f15fab323a53e5a] Merge tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm
> > git bisect good 64dd3b6a79f0907d36de481b0f15fab323a53e5a
> > # bad: [daa394f0f9d3cb002c72e2d3db99972e2ee42862] Merge tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect bad daa394f0f9d3cb002c72e2d3db99972e2ee42862
> > # good: [effdcd5275ed645f6e0f8e8ce690b97795722197] Merge tag 'affs-for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> > git bisect good effdcd5275ed645f6e0f8e8ce690b97795722197
> > # bad: [26bb0d3f38a764b743a3ad5c8b6e5b5044d7ceb4] Merge tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux
> > git bisect bad 26bb0d3f38a764b743a3ad5c8b6e5b5044d7ceb4
> > # bad: [3a4d319a8fb5a9bbdf5b31ef32841eb286b1dcc2] Merge tag 'for-6.12/io_uring-20240913' of git://git.kernel.dk/linux
> > git bisect bad 3a4d319a8fb5a9bbdf5b31ef32841eb286b1dcc2
> > # good: [df2825e98507d10cb037a308087ecd7cb3f6688d] btrfs: always pass readahead state to defrag
> > git bisect good df2825e98507d10cb037a308087ecd7cb3f6688d
> > # good: [69a3a0a45a2f72412c2ba31761cc9193bb746fef] Merge tag 'erofs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
> > git bisect good 69a3a0a45a2f72412c2ba31761cc9193bb746fef
> > # good: [ecd5c9b29643f383d39320e30d21b8615bd893da] io_uring/kbuf: add io_kbuf_commit() helper
> > git bisect good ecd5c9b29643f383d39320e30d21b8615bd893da
> > # good: [f011c9cf04c06f16b24f583d313d3c012e589e50] io_uring/sqpoll: do not allow pinning outside of cpuset
> > git bisect good f011c9cf04c06f16b24f583d313d3c012e589e50
> > # bad: [84eacf177faa605853c58e5b1c0d9544b88c16fd] io_uring/io-wq: inherit cpuset of cgroup in io worker
> > git bisect bad 84eacf177faa605853c58e5b1c0d9544b88c16fd
> > # bad: [90bfb28d5fa8127a113a140c9791ea0b40ab156a] io_uring/rw: drop - EOPNOTSUPP check in __io_complete_rw_common()
> > git bisect bad 90bfb28d5fa8127a113a140c9791ea0b40ab156a
> > # good: [c0a9d496e0fece67db777bd48550376cf2960c47] io_uring/rw: treat - EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
> > git bisect good c0a9d496e0fece67db777bd48550376cf2960c47
> > # first bad commit: [90bfb28d5fa8127a113a140c9791ea0b40ab156a] io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()
> > 
> > #regzbot introduced: 90bfb28d5fa8127a113a140c9791ea0b40ab156a
> > #regzbot link: https://bugs.debian.org/1116358
> > 
> > Does thi ring any bell?
> 
> Thanks to both of you, this is very useful! I guess the original commit
> is mistaken, there's still a bubbling up of EOPNOTSUPP there. I'd say
> let's just revert that commit, I will do that upstream and have it buble
> down to the stable kernels too.
> 
> I can always look into this later and reintroduce it, if need be.

Thanks a lot for that. Is this enough for you or do you want still as
well an explicit ack from Kevin that only applying the revert on top
of 6.12.48 fixes the issue (i.e. to exclude any other side effects
from commits inbetween; still while the bisect points at that specific
one)?

Regards,
Salvatore

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ