Message-ID: <47dd225b433b0df585a25084a2e793344eeda239.camel@suse.com>
Date: Thu, 08 May 2025 15:51:59 +0200
From: Martin Wilck <mwilck@...e.com>
To: Kevin Wolf <kwolf@...hat.com>, dm-devel@...ts.linux.dev
Cc: hreitz@...hat.com, mpatocka@...hat.com, snitzer@...nel.org,
bmarzins@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] dm mpath: Interface for explicit probing of active
paths
Hello Kevin,
[I'm sorry for the previous email. It seems that I clicked "send" in
the wrong window].
On Tue, 2025-04-29 at 18:50 +0200, Kevin Wolf wrote:
> Multipath cannot directly provide failover for ioctls in the kernel
> because it doesn't know what each ioctl means and which result could
> indicate a path error. Userspace generally knows what the ioctl it
> issued means and if it might be a path error, but neither does it know
> which path the ioctl took nor does it necessarily have the privileges
> to fail a path using the control device.
Thanks for this effort.
I have some remarks about your approach. The most important one is that
the DM_MPATH_PROBE_PATHS_CMD ioctl appears to be a dangerous command.
It sends IO to possibly broken paths and waits for it to complete. In
the common error case of an unresponsive device, this can cause the
caller to hang for a long time in the kernel ioctl code path, in
uninterruptible sleep (note that these probing commands will be queued
behind other, possibly hanging, IO). In the past we have put a lot of
effort into multipath-tools and other code paths to avoid this kind of
hang as far as possible. It seems to me that your set re-introduces
this risk.
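
To make the pattern concrete, here is a minimal kernel-side sketch of
what I understand such a probe to look like (my reading of the idea,
not your actual patch code). submit_bio_wait() sleeps in
uninterruptible state until the bio completes:

#include <linux/bio.h>

static int probe_path_sketch(struct block_device *bdev)
{
        struct bio bio;
        struct bio_vec bvec;
        struct page *page = alloc_page(GFP_KERNEL);
        int ret;

        if (!page)
                return -ENOMEM;

        bio_init(&bio, bdev, &bvec, 1, REQ_OP_READ);
        bio.bi_iter.bi_sector = 0;
        __bio_add_page(&bio, page, PAGE_SIZE, 0);

        /*
         * Uninterruptible sleep until completion; the probe is also
         * queued behind any IO already hanging on this path.
         */
        ret = submit_bio_wait(&bio);

        bio_uninit(&bio);
        __free_page(page);
        return ret;
}
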
Apart from that, two minor observations: in patch 2/2 you don't check
whether the map is in queueing state, and I don't quite understand why
you only probe paths in the map's active path group without attempting
a PG failover when all paths in the current PG fail.
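
What I would have expected is something along these lines (a rough
sketch using dm-mpath's internal structures; locking is omitted, and
probe_path_sketch() is the hypothetical probe from above):

static int probe_all_pgs_sketch(struct multipath *m)
{
        struct priority_group *pg;
        struct pgpath *pgpath;

        list_for_each_entry(pg, &m->priority_groups, list) {
                if (pg->bypassed)
                        continue;
                list_for_each_entry(pgpath, &pg->pgpaths, list) {
                        if (probe_path_sketch(pgpath->path.dev->bdev))
                                continue;
                        /* First PG with a working path becomes current. */
                        __switch_pg(m, pg); /* existing helper, needs m->lock */
                        return 0;
                }
        }
        return -ENODEV; /* all paths in all PGs failed */
}
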
I am wondering whether the DM_MPATH_PROBE_PATHS_CMD ioctl is necessary
at all. Rather than triggering explicit path probing, you could have
dm-mpath "handle" IO errors by failing the active path for "path
errors". That would be similar to my patch set from 2021 [1], but it
would avoid the extra IO and thus the additional risk of hanging in the
kernel.
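
The completion side could then look roughly like this (again just a
sketch, not code from [1]: blk_path_error() and fail_path() exist
today, but mapping an SG_IO result to a blk_status_t is hand-waved
here):

static int handle_ioctl_error_sketch(struct pgpath *pgpath,
                                     blk_status_t sts)
{
        if (blk_path_error(sts)) {
                /*
                 * Possible path error: fail this path so that the
                 * path selector picks another one, and let userspace
                 * retry the ioctl.
                 */
                fail_path(pgpath);      /* existing dm-mpath helper */
                return -EAGAIN;
        }
        return blk_status_to_errno(sts);
}
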
Unlike my set, you wouldn't attempt retries in the kernel, but would
leave that part to qemu, as the current set does. That would avoid
Christoph's main criticism that "Failing over SG_IO does not make
sense" [2].
Doing the failover in qemu has the general disadvantage that qemu has
no notion of the number of available and/or healthy paths; it can thus
only guess a reasonable number of retries for any given I/O. But that's
unavoidable, given that kernel-level failover on SG_IO has been
rejected.
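
For illustration, the retry loop in qemu might look roughly like this
(a sketch; DM_MPATH_PROBE_PATHS_CMD would come from the patched
dm-ioctl.h, MAX_RETRIES is necessarily a guess, and is_path_error() is
a placeholder for qemu's result classification):

#include <stdbool.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define MAX_RETRIES 5   /* a guess; qemu can't know the path count */

/* Placeholder classifier: treat any host/transport error as a possible
 * path error (real code would inspect the sense data, too). */
static bool is_path_error(const struct sg_io_hdr *hdr)
{
        return hdr->host_status != 0;
}

static int sg_io_with_failover(int fd, struct sg_io_hdr *hdr)
{
        int i;

        for (i = 0; i < MAX_RETRIES; i++) {
                if (ioctl(fd, SG_IO, hdr) < 0)
                        return -1;
                if (!is_path_error(hdr))
                        return 0;       /* success or non-path error */
                /* With your set: ask dm-mpath to probe/fail paths. */
                if (ioctl(fd, DM_MPATH_PROBE_PATHS_CMD) < 0)
                        return -1;      /* no usable path left */
        }
        return -1;
}
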
Please let me know your thoughts.
Best Regards
Martin
[1] https://lore.kernel.org/all/20210628151558.2289-1-mwilck@suse.com/
[2] https://lore.kernel.org/all/20210701075629.GA25768@lst.de/