Message-ID: <1767e6d3757f824039b4f799157d262a8dd74ced.camel@suse.com>
Date: Tue, 13 May 2025 09:06:31 +0200
From: Martin Wilck <mwilck@...e.com>
To: Mikulas Patocka <mpatocka@...hat.com>, Christoph Hellwig
<hch@...radead.org>
Cc: Kevin Wolf <kwolf@...hat.com>, dm-devel@...ts.linux.dev,
hreitz@...hat.com, snitzer@...nel.org, bmarzins@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] dm mpath: Interface for explicit probing of active
paths
Hello Mikulas,
On Mon, 2025-05-12 at 15:46 +0200, Mikulas Patocka wrote:
> Hi
>
>
> On Thu, 8 May 2025, Martin Wilck wrote:
>
> > Hello Kevin,
> >
> > [I'm sorry for the previous email. It seems that I clicked "send"
> > in
> > the wrong window].
> >
> > On Tue, 2025-04-29 at 18:50 +0200, Kevin Wolf wrote:
> > > Multipath cannot directly provide failover for ioctls in the
> > > kernel because it doesn't know what each ioctl means and which
> > > result could indicate a path error. Userspace generally knows
> > > what the ioctl it issued means and if it might be a path error,
> > > but neither does it know which path the ioctl took nor does it
> > > necessarily have the privileges to fail a path using the control
> > > device.
> >
> > Thanks for this effort.
> >
> > I have some remarks about your approach. The most important one
> > is that the DM_MPATH_PROBE_PATHS_CMD ioctl appears to be a
> > dangerous command. It sends IO to possibly broken paths and waits
> > for it to complete. In the common error case of a device not
> > responding, this might cause the code to hang for a long time in
> > the kernel ioctl code path, in uninterruptible sleep (note that
> > these probing commands will be queued
>
> Normal reads and writes would also hang in an uninterruptible sleep
> if a path stops responding.
Right. That's why multipathd uses TEST UNIT READY if the storage
supports it, and either aio or separate I/O threads to avoid blocking
the main thread, and generally goes to great lengths to make sure that
misbehaving storage hardware can't freeze the daemon.
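For illustration, here is a rough sketch of what such a "tur" check
boils down to (untested, and not the actual libmultipath code - the
real checker is more involved and can offload the SG_IO to a separate
thread so that even the probe itself can't stall the checker loop):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Send TEST UNIT READY with a short timeout; a dead path fails fast
 * instead of blocking the caller indefinitely. */
static int tur_path_ok(const char *dev)
{
	unsigned char cdb[6] = { 0x00, };	/* TEST UNIT READY */
	unsigned char sense[32];
	struct sg_io_hdr hdr;
	int fd, ret;

	fd = open(dev, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.cmd_len = sizeof(cdb);
	hdr.cmdp = cdb;
	hdr.mx_sb_len = sizeof(sense);
	hdr.sbp = sense;
	hdr.dxfer_direction = SG_DXFER_NONE;
	hdr.timeout = 1000;			/* milliseconds */

	ret = ioctl(fd, SG_IO, &hdr);
	close(fd);
	if (ret < 0)
		return -1;
	return (hdr.masked_status || hdr.host_status ||
		hdr.driver_status) ? 0 : 1;
}

The important properties are the bounded timeout and the fact that no
application IO context is involved in the probe.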
The way I read Kevin's code, you'd queue up additional IO in the same
context that had submitted the original failing IO. I realize that qemu
uses asynchronous IO, but AFAIK the details depend on the qemu options
for the individual VM, and it isn't clear to me whether or not
DM_MPATH_PROBE_PATHS_CMD can bring the entire VM to a halt.
> > [...]
> >
> > I am wondering whether the DM_MPATH_PROBE_PATHS_CMD ioctl is
> > necessary at all. Rather than triggering explicit path probing,
> > you could have dm-mpath "handle" IO errors by failing the active
> > path for "path errors". That would be similar to my patch set from
> > 2021 [1], but it would avoid the extra IO and thus the additional
> > risk of hanging in the kernel.
>
> What exactly do you mean? You say "you could have dm-mpath 'handle'
> IO errors by failing the active path for 'path errors'".
What I meant was that dm-mpath could call fail_path() in case of an
error, using a similar mechanism to the block IO code path.
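For the regular request-based IO path, this logic already exists in
multipath_end_io() in drivers/md/dm-mpath.c. Roughly (simplified and
from memory, not a verbatim quote):

static int multipath_end_io(struct dm_target *ti, struct request *clone,
			    blk_status_t error, union map_info *map_context)
{
	struct dm_mpath_io *mpio = get_mpio(map_context);
	struct pgpath *pgpath = mpio->pgpath;
	int r = DM_ENDIO_DONE;

	if (error && blk_path_error(error)) {
		/* transport/path error: take this path out of service
		 * and let the request be retried on another path */
		if (pgpath)
			fail_path(pgpath);
		r = DM_ENDIO_REQUEUE;
	}
	/* ... */
	return r;
}

Doing the equivalent for SG_IO would mean calling fail_path() from the
SG_IO completion path when blk_path_error() indicates a path error -
without retrying the request in the kernel; the error would still be
passed up to qemu.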
> Christoph doesn't want dm-mpath to look at the error code - so dm-
> mpath doesn't know whether a path error occurred or not.
My impression from the previous discussion was that Christoph mainly
objected to SG_IO requests being retried in the kernel [1], not so much
to the inspection of the error code by device mapper.
I may have misunderstood this, of course. I'm adding Christoph to the
loop so that he can clarify what he meant.
> qemu could look at the error code, but qemu doesn't know which path
> the SG_IO went through - so it can't instruct dm-mpath to fail that
> path.
>
> One of the possibilities that we discussed was to add a path-id to
> SG_IO, so that dm-mpath would record which path the SG_IO went
> through, and qemu could instruct dm-mpath to fail that path. But we
> rejected that as being more complicated than the current approach.
If we discuss extending SG_IO itself, it might be easier to have it
store the blk_status_t somewhere in the sg_io_hdr. More about that
idea in my reply to Kevin.
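To sketch the idea briefly (this is purely hypothetical - sg_io_hdr
has no such member today, and the BLK_STS_* values are kernel-internal
and would have to be exported to the uapi as well):

#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Issue a prepared SG_IO request and classify the result the way the
 * kernel's blk_path_error() helper does: everything that is not a
 * clear target-side error is potentially a path error. Returns 1 for
 * a (possible) path error, 0 otherwise, -1 on ioctl failure. */
static int sgio_path_error(int fd, struct sg_io_hdr *hdr)
{
	if (ioctl(fd, SG_IO, hdr) < 0)
		return -1;
	switch (hdr->blk_status) {	/* hypothetical new member */
	case BLK_STS_OK:
	case BLK_STS_NOTSUPP:
	case BLK_STS_NOSPC:
	case BLK_STS_TARGET:
	case BLK_STS_MEDIUM:
	case BLK_STS_PROTECTION:
		return 0;	/* report the result to the guest */
	default:
		return 1;	/* candidate for failover / retry */
	}
}

With something like this, qemu could decide whether a failed SG_IO is
worth retrying without having to know which path it took.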
Regards,
Martin
[1] https://lore.kernel.org/all/20210701075629.GA25768@lst.de/
> > Contrary to my set, you wouldn't attempt retries in the kernel,
> > but leave this part to qemu instead, like in the current set. That
> > would avoid Christoph's main criticism that "Failing over SG_IO
> > does not make sense" [2].
> >
> > Doing the failover in qemu has the general disadvantage that qemu
> > has no notion of the number of available and/or healthy paths; it
> > can thus only guess a reasonable number of retries for any given
> > I/O. But that's unavoidable, given that the idea to do kernel-level
> > failover on SG_IO is rejected.
> >
> > Please let me know your thoughts.
> >
> > Best Regards
> > Martin
> >
> > [1]
> > https://lore.kernel.org/all/20210628151558.2289-1-mwilck@suse.com/
> > [2] https://lore.kernel.org/all/20210701075629.GA25768@lst.de/
>
> Mikulas