Message-ID: <aCW95f8RGpLJZwSA@redhat.com>
Date: Thu, 15 May 2025 12:11:49 +0200
From: Kevin Wolf <kwolf@...hat.com>
To: Martin Wilck <mwilck@...e.com>
Cc: Christoph Hellwig <hch@...radead.org>,
	Benjamin Marzinski <bmarzins@...hat.com>, dm-devel@...ts.linux.dev,
	hreitz@...hat.com, mpatocka@...hat.com, snitzer@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] dm mpath: Interface for explicit probing of active
 paths

On 14.05.2025 at 23:21, Martin Wilck wrote:
> On Tue, 2025-05-13 at 10:00 +0200, Martin Wilck wrote:
> > > If you think it does, is there another reason why you didn't try this
> > > before?
> > 
> > It didn't occur to me back then that we could fail paths without
> > retrying in the kernel.
> > 
> > Perhaps we could have the sg driver pass the blk_status_t (which is
> > available on the sg level) to device mapper somehow in the sg_io_hdr
> > structure? That way we could entirely avoid the layering violation
> > between SCSI and dm. Not sure if that would be acceptable to
> > Christoph, as blk_status_t is supposed to be exclusive to the kernel.
> > Can we find a way to make sure it's passed to DM, but not to user
> > space?
> 
> I have to correct myself. I was confused by my old patches which
> contain special casing for SG_IO. The current upstream code of course
> does not support special-casing SG_IO in any way. device-mapper looks at
> neither the ioctl `cmd` value nor any of its arguments, and has
> only the Unix error code to examine when the ioctl returns. The device
> mapper layer has access to *less* information than the user space
> process that issued the ioctl. Adding hooks to the sg driver wouldn't
> buy us anything in this situation.
> 
> If we can't change this, we can't fail paths in the SG_IO error code
> path, end of story.

Yes, as long as we can't look at the sg_io_hdr, there is no way to
figure out if we got a path error.
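To make this concrete, here is a minimal user-space sketch, assuming a hypothetical struct sg_status that mirrors only a subset of the status fields of struct sg_io_hdr. The SG_IO ioctl normally returns 0 even when the SCSI command itself failed, so only these header fields (here, the host byte) can tell a path error apart from, say, a CHECK CONDITION from the target. Which DID_* host codes count as path errors is an assumption for illustration, not dm-multipath's actual policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the status fields of struct sg_io_hdr
 * (<scsi/sg.h>); the real structure has many more members. */
struct sg_status {
    uint8_t  status;        /* SCSI status byte returned by the target */
    uint16_t host_status;   /* transport/HBA result (DID_* codes) */
    uint16_t driver_status; /* not examined in this sketch */
};

/* Host byte values as used by the Linux SCSI midlayer, redefined
 * with a HOST_ prefix so the sketch is self-contained. */
enum {
    HOST_DID_OK                  = 0x00,
    HOST_DID_NO_CONNECT          = 0x01,
    HOST_DID_BUS_BUSY            = 0x02,
    HOST_DID_TIME_OUT            = 0x03,
    HOST_DID_BAD_TARGET          = 0x04,
    HOST_DID_TRANSPORT_DISRUPTED = 0x0e,
    HOST_DID_TRANSPORT_FAILFAST  = 0x0f,
};

/* Assumption: these transport-level results justify failing the path.
 * Target-side errors (reported via the status byte) would occur on
 * every path, so they are not treated as path errors here. */
bool is_path_error(const struct sg_status *s)
{
    assert(s != NULL);
    switch (s->host_status) {
    case HOST_DID_NO_CONNECT:
    case HOST_DID_BUS_BUSY:
    case HOST_DID_TIME_OUT:
    case HOST_DID_BAD_TARGET:
    case HOST_DID_TRANSPORT_DISRUPTED:
    case HOST_DID_TRANSPORT_FAILFAST:
        return true;
    default:
        return false;
    }
}
```

An ioctl return value of 0 looks identical for both a transport disruption and a target CHECK CONDITION; only the header fields above distinguish them.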

> With Kevin's patch 1/2 applied, it would in principle be feasible to
> special-case SG_IO, handle it in dm-multipath, retrieve the
> blk_status_t somehow, and possibly initiate path failover. This way
> we'd at least keep the generic dm layer clean of SCSI-specific code.
> But the end result would still look very similar to the attempt from
> 2021 and would therefore probably lead us nowhere.

Right, that was my impression, too.

The interfaces could be made to look a bit different, and we could return
-EAGAIN to userspace instead of retrying immediately (not that it makes
sense to me, but if that were really the issue, fine with me), but the
core logic with copying the sg_io_hdr, calling sg_io() directly and then
inspecting the status and possibly failing paths would have to be pretty
much the same as you had.
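The core logic described above can be sketched roughly as a user-space simulation. Everything here is an assumption for illustration: struct path and submit_on_path() are hypothetical stand-ins for a multipath path and a direct sg_io() call, copying the sg_io_hdr is omitted, and treating any nonzero host byte as a path error is a deliberate simplification. The retry_in_kernel flag switches between retrying on the next path and returning -EAGAIN to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one multipath path. */
struct path {
    bool failed;
    int  host_status;   /* result the simulated submission reports */
};

/* Hypothetical stand-in for calling sg_io() on one path: like the SG_IO
 * ioctl, it returns 0 and reports the outcome in *host_status. */
static int submit_on_path(const struct path *p, int *host_status)
{
    *host_status = p->host_status;
    return 0;
}

/* Simplification: any nonzero host byte is treated as a path error. */
static bool host_is_path_error(int host_status)
{
    return host_status != 0;
}

/* Try each usable path in order, failing paths that report a path error.
 * Returns 0 on success, -EAGAIN if retrying is left to user space,
 * or -EIO if no usable path remains (assumed error code). */
int sg_io_with_failover(struct path *paths, size_t n, bool retry_in_kernel)
{
    assert(n == 0 || paths != NULL);
    for (size_t i = 0; i < n; i++) {
        struct path *p = &paths[i];
        int host_status;
        int ret;

        if (p->failed)
            continue;
        ret = submit_on_path(p, &host_status);
        if (ret < 0)
            return ret;             /* setup error, not a path error */
        if (!host_is_path_error(host_status))
            return 0;
        p->failed = true;           /* mark the path as failed */
        if (!retry_in_kernel)
            return -EAGAIN;         /* let user space resubmit */
    }
    return -EIO;
}
```

Either way, the status inspection and path failing stay the same; only the point where the retry happens moves.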

> I'm still not too fond of DM_MPATH_PROBE_PATHS_CMD, but I can't offer a
> better solution at this time. If the side issues are fixed, it will be
> an improvement over the current upstream situation, where we can do no
> path failover at all.

Yes, I agree we should focus on improving what we have, rather than
trying to find another radically different approach that none of us have
thought of before.

> In the long term, we should evaluate alternatives. If my conjecture in
> my previous post is correct that we need only PRIN/PROUT commands,
> there might be a better solution than scsi-block for our customers.
> Using regular block I/O should also improve performance.

If you're talking about SG_IO in dm-mpath, then PRIN/PROUT commands are
actually the one thing that we don't need. libmpathpersist sends the
commands to the individual path devices, so dm-mpath will never see
those. It's mostly about getting the full results on the SCSI level for
normal I/O commands.

There was actually a patch series on qemu-devel last year (that I
haven't found the time to review properly yet) that would add explicit
persistent reservation operations to QEMU's block layer that could then
be used with the emulated scsi-hd device. On the backend, it only
implemented it for iscsi, but I suppose we could implement it for
file-posix, too (using the same libmpathpersist code as for
passthrough). If that works, maybe at least some users can move away
from SCSI passthrough.

What we need to make sure of, though, is that the emulated status we
can expose to the guest is actually good enough. The fact that Paolo
said the problem with reservation conflicts was mostly that -EBADE
didn't exist yet gives me some hope that at least this wouldn't be a
problem any more today.
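A minimal sketch of why -EBADE matters for an emulated disk, assuming a hypothetical helper scsi_status_from_errno() (not QEMU's actual code): when a host request fails with nothing but an errno, -EBADE, which the Linux block layer returns for reservation conflicts, can be turned back into a RESERVATION CONFLICT status for the guest. Other errors collapse into a generic CHECK CONDITION, which is exactly where the rest of the SCSI status gets lost. A real emulator would also need to build matching sense data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* SAM status byte values. */
enum {
    SAM_GOOD                 = 0x00,
    SAM_CHECK_CONDITION      = 0x02,
    SAM_RESERVATION_CONFLICT = 0x18,
};

/* Hypothetical errno-to-SCSI-status translation for an emulated disk.
 * Callers pass 0 for success or a negative errno. */
uint8_t scsi_status_from_errno(int err)
{
    assert(err <= 0);
    switch (err) {
    case 0:
        return SAM_GOOD;
    case -EBADE:                    /* block layer: reservation conflict */
        return SAM_RESERVATION_CONFLICT;
    default:                        /* original SCSI details are gone */
        return SAM_CHECK_CONDITION;
    }
}
```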

We would still lose other parts of the SCSI status, so I'm still a bit
cautious about predicting how many users could eventually (I expect
this to take years) use the emulated device instead and how many
would keep using passthrough even in the long term.

Kevin

