linux-kernel - Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes


Message-ID: <CAKAwkKvfFn18RjupuqGpx4QeAiMYKSq7QUTd3wEL=pkZ+BENpQ@mail.gmail.com>
Date:   Fri, 13 May 2022 14:56:18 +1200
From:   Matthew Ruffell <matthew.ruffell@...onical.com>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        linux-block <linux-block@...r.kernel.org>,
        nbd <nbd@...er.debian.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>, yukuai3@...wei.com
Subject: Re: [PROBLEM] nbd requests become stuck when devices watched by
 inotify emit udev uevent changes

Hi Josef,

Just a friendly ping. I am more than happy to test a patch if you send it
inline in the email; the pastebin you used expired after 1 day, and I
couldn't access it.

I came across and tested Yu Kuai's patches [1][2] which are for the same issue,
and they indeed fix the hang. Thank you Yu.

[1] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

[2] nbd: fix io hung while disconnecting device
https://lists.debian.org/nbd/2022/04/msg00207.html

I am also happy to test any patches to fix the I/O errors.

Thanks,
Matthew

On Tue, Apr 26, 2022 at 9:47 AM Matthew Ruffell
<matthew.ruffell@...onical.com> wrote:
>
> Hi Josef,
>
> The pastebin link has expired, and I can't access your patch.
> It seems to default to deletion after 1 day.
>
> Could you please create a new paste or send the patch inline in this
> email thread?
>
> I am more than happy to try the patch out.
>
> Thank you for your analysis.
> Matthew
>
> On Sat, Apr 23, 2022 at 3:24 AM Josef Bacik <josef@...icpanda.com> wrote:
> >
> > On Fri, Apr 22, 2022 at 1:42 AM Matthew Ruffell
> > <matthew.ruffell@...onical.com> wrote:
> > >
> > > Dear maintainers of the nbd subsystem,
> > >
> > > A user has come across an issue which causes the nbd module to hang after a
> > > disconnect where a write has been made to a qemu qcow image file, with qemu-nbd
> > > being the server.
> > >
> >
> > OK, there are two problems here, but I want to make sure I have the right
> > fix for the hang first.  Can you apply this patch
> >
> > https://paste.centos.org/view/b1a2d01a
> >
> > and make sure the hang goes away?  Once that part is fixed I'll fix
> > the I/O errors.  Those are just us racing with systemd while we tear
> > down the device: a partition read is triggered while the device is
> > going down, and it complains loudly.  Before, we would set_capacity
> > to 0 whenever we disconnected, but that causes problems with file
> > systems that may still have the device open.  However, now we only
> > do this if the server does the CLEAR_SOCK ioctl, which clearly can
> > race with systemd poking the device, so I need to make it
> > set_capacity(0) when the last opener closes the device to prevent
> > this style of race.
> >
> > Let me know if that patch fixes the hang, and then I'll work up
> > something for the capacity problem.  Thanks,
> >
> > Josef
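
For readers following the thread, the capacity idea Josef describes above
(zero the capacity when the last opener closes the device, rather than at
disconnect time) can be illustrated with a small userspace toy model. This
is only a sketch under stated assumptions, not the nbd driver and not
Josef's patch: the class ToyNbdDevice, its methods, and the three scan
outcomes are all hypothetical names made up for illustration.

```python
class ToyNbdDevice:
    """Userspace toy model of the capacity handling discussed above.

    Hypothetical: none of these names exist in the kernel nbd driver.
    """

    def __init__(self, capacity_sectors):
        self.capacity = capacity_sectors
        self.openers = 0
        self.connected = True

    def open(self):
        self.openers += 1

    def disconnect(self):
        # Assumption: disconnecting alone leaves the capacity untouched,
        # so a file system that still has the device open is not upset.
        self.connected = False

    def release(self):
        if self.openers == 0:
            raise RuntimeError("release without a matching open")
        self.openers -= 1
        # The proposed behaviour: zero the capacity only once the last
        # opener has gone away after a disconnect.
        if self.openers == 0 and not self.connected:
            self.capacity = 0

    def scan(self):
        # Stand-in for a partition read triggered by udev/systemd.
        if self.capacity == 0:
            return "skipped"      # nothing to read on a zero-size device
        if not self.connected:
            return "io_error"     # read issued while the device goes down
        return "ok"


dev = ToyNbdDevice(2048)
dev.open()
dev.disconnect()
during_teardown = dev.scan()    # capacity still set, backend gone
dev.release()                   # last opener closes the device
after_last_close = dev.scan()
```

In this model a scan that lands between the disconnect and the last close
still sees the old capacity and fails, while any scan after the last close
finds a zero-size device and issues no reads.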