Message-ID: <50beb356b4dc000446fd186ab754c87f386eaeae.camel@suse.com>
Date: Wed, 14 May 2025 19:37:19 +0200
From: Martin Wilck <mwilck@...e.com>
To: Benjamin Marzinski <bmarzins@...hat.com>, Christoph Hellwig
<hch@...radead.org>
Cc: Kevin Wolf <kwolf@...hat.com>, dm-devel@...ts.linux.dev,
hreitz@...hat.com, mpatocka@...hat.com, snitzer@...nel.org,
linux-kernel@...r.kernel.org, pbonzini@...hat.com, Hannes Reinecke
<hare@...e.com>
Subject: Re: [PATCH 0/2] dm mpath: Interface for explicit probing of active
paths
Hello Ben, hello Christoph,
On Wed, 2025-05-14 at 12:23 -0400, Benjamin Marzinski wrote:
> On Tue, May 13, 2025 at 09:57:51PM -0700, Christoph Hellwig wrote:
> >
> > SG_IO is fine and the only way for SCSI passthrough. But doing
> > SCSI passthrough through md-multipath just doesn't work. SCSI isn't
> > built for layering, and ALUA and its vendor-specific variants and
> > alternatives certainly aren't. If you try that you're playing with
> > fire and there is no chance of ever moving properly.
>
> Could you be a bit more specific? All multipath is doing here is
> forwarding the ioctls to an underlying scsi device, and passing back up
> the result. Admittedly, it doesn't always make sense to pass the ioctl
> on from the multipath device to just one scsi device. Persistent
> Reservations are a perfect example of this, and that's why QEMU doesn't
> use DM's ioctl passthrough code to handle them.
I'd go one step further. Christoph is right to say that what we're
currently doing in qemu – passing through every command except the
PRIN/PROUT to a multipath device – is a dangerous thing to do.
Passthrough from a dm-multipath device to a SCSI device makes sense
only for a small subset of the SCSI command set. Basically just for the
regular IO commands like the various READ and WRITE variants and the
occasional UNMAP. However, in practice these commands account for 99.y%
of the actual commands sent to devices. The fact that customers
have been running these setups in large deployments over many years
suggests that, if other commands ever get passed through to member
devices, it has rarely had fatal consequences.
Nobody would seriously consider sending ALUA commands to the multipath
devices. TUR and REQUEST SENSE are other examples of commands that
can't be reasonably passed through to random member devices of a
multipath map. There are certainly many more examples. I guess it would
make sense to review the command set and add some filtering in the qemu
passthrough code.
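
To make the filtering idea concrete, here is a minimal sketch of a deny-by-default classifier keyed on the CDB opcode. It is not qemu code: the names `mp_classify_cdb` and `mp_cdb_verdict` are hypothetical, and which opcodes belong in which bucket is an assumption that would need the command-set review mentioned above. Only the opcode values themselves are taken from the SCSI standards.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI operation codes (byte 0 of the CDB). */
enum {
    OP_READ_6                 = 0x08,
    OP_WRITE_6                = 0x0a,
    OP_READ_10                = 0x28,
    OP_WRITE_10               = 0x2a,
    OP_UNMAP                  = 0x42,
    OP_PERSISTENT_RESERVE_IN  = 0x5e,
    OP_PERSISTENT_RESERVE_OUT = 0x5f,
    OP_READ_16                = 0x88,
    OP_WRITE_16               = 0x8a,
    OP_READ_12                = 0xa8,
    OP_WRITE_12               = 0xaa,
};

/* Hypothetical verdicts for illustration; not an existing qemu type. */
enum mp_cdb_verdict {
    MP_CDB_FORWARD, /* regular I/O: may be sent down any active path */
    MP_CDB_SPECIAL, /* reservation commands: need dedicated handling */
    MP_CDB_REJECT,  /* everything else: not meaningful on a single path */
};

/*
 * Classify a CDB by opcode. Anything not explicitly listed is rejected,
 * which covers TUR (0x00), REQUEST SENSE (0x03) and the MAINTENANCE IN/OUT
 * opcodes (0xa3/0xa4) that carry the ALUA commands.
 */
enum mp_cdb_verdict mp_classify_cdb(const uint8_t *cdb, size_t len)
{
    if (cdb == NULL || len == 0)
        return MP_CDB_REJECT;

    switch (cdb[0]) {
    case OP_READ_6:
    case OP_WRITE_6:
    case OP_READ_10:
    case OP_WRITE_10:
    case OP_READ_12:
    case OP_WRITE_12:
    case OP_READ_16:
    case OP_WRITE_16:
    case OP_UNMAP:
        return MP_CDB_FORWARD;
    case OP_PERSISTENT_RESERVE_IN:
    case OP_PERSISTENT_RESERVE_OUT:
        return MP_CDB_SPECIAL;
    default:
        return MP_CDB_REJECT;
    }
}
```

An allowlist rather than a blocklist seems the safer default here, since the set of commands that are known to be harmless on an arbitrary member device is much smaller than the set that is not.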
AFAIK the only commands that we really need to pass through (except the
standard ones) are the reservation commands, which get special handling
by qemu anyway. @Ben, @Kevin, are you aware of anything else?
So: admittedly we're using a framework for passing through any command,
where we actually need to pass through only a tiny subset of commands.
Thinking about it this way, it really doesn't look like the perfect
tool for the job, and we may want to look into a different approach for
the future.
> Also, when you have ALUA
> setups, not all the scsi devices are equal. But multipath isn't naively
> assuming that they are. It's only passing ioctls to the highest priority
> activated paths, just like it does for IO, and multipath is in charge of
> handling explicit alua devices. This hasn't proved to be problematic in
> practice.
>
> The reality of the situation is that customers have been using this for
> a while, and the only issue that they run into is that multipath can't
> tell when an SG_IO has failed due to a retryable error. Currently,
> they're left with waiting for multipathd's preemptive path checking to
> fail the path so they can retry down a new one. The purpose of this
> patchset and Martin's previous one is to handle this problem. If there
> are unavoidable critical problems that you see with this setup, it would
> be really helpful to know what they are.
I'd also be interested in understanding this better. As noted above,
I'm aware that passing through everything is dangerous and wrong in
principle. But in practice, we haven't observed anything serious except
(as Ben already said) the failure to do path failover in the SG_IO code
path, which both this patch set and my set from the past are intended
to fix.
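
For readers unfamiliar with the failure mode: after an SG_IO ioctl completes, the caller sees the result in `struct sg_io_hdr`, including a `host_status` field filled in by the SCSI midlayer. The sketch below shows one way a userspace caller might guess that a failure was path-related and therefore worth retrying on another path. It is purely illustrative and is not what either patch set implements: the function name is hypothetical, and the choice of which host status codes count as path errors is an assumption. Only the numeric values mirror the kernel's `DID_*` definitions.

```c
#include <stdbool.h>

/* Host status values, mirroring the Linux SCSI midlayer DID_* codes. */
enum {
    HOST_DID_OK                  = 0x00,
    HOST_DID_NO_CONNECT          = 0x01,
    HOST_DID_TRANSPORT_DISRUPTED = 0x0e,
    HOST_DID_TRANSPORT_FAILFAST  = 0x0f,
};

/*
 * Hypothetical heuristic: treat transport-level host errors as an
 * indication that the path (not the device or the command) is at fault.
 * Pass in the host_status field of a completed struct sg_io_hdr.
 */
bool sgio_looks_like_path_error(unsigned int host_status)
{
    switch (host_status) {
    case HOST_DID_NO_CONNECT:
    case HOST_DID_TRANSPORT_DISRUPTED:
    case HOST_DID_TRANSPORT_FAILFAST:
        return true;
    default:
        return false;
    }
}
```

Even with such a check, the caller cannot act on it today other than by waiting for multipathd to fail the path, which is the gap described above.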
While I am open to looking for better alternatives, I still hope that
we can reach agreement on a short/mid-term solution that would allow
us to serve our customers who currently use SCSI passthrough setups.
That would not just benefit us (the enterprise distros), because it
would also help us fund upstream contributions.
Regards
Martin