Message-ID: <1767e6d3757f824039b4f799157d262a8dd74ced.camel@suse.com>
Date: Tue, 13 May 2025 09:06:31 +0200
From: Martin Wilck <mwilck@...e.com>
To: Mikulas Patocka <mpatocka@...hat.com>, Christoph Hellwig
<hch@...radead.org>
Cc: Kevin Wolf <kwolf@...hat.com>, dm-devel@...ts.linux.dev,
hreitz@...hat.com, snitzer@...nel.org, bmarzins@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] dm mpath: Interface for explicit probing of active
paths
Hello Mikulas,
On Mon, 2025-05-12 at 15:46 +0200, Mikulas Patocka wrote:
> Hi
>
>
> On Thu, 8 May 2025, Martin Wilck wrote:
>
> > Hello Kevin,
> >
> > [I'm sorry for the previous email. It seems that I clicked "send"
> > in
> > the wrong window].
> >
> > On Tue, 2025-04-29 at 18:50 +0200, Kevin Wolf wrote:
> > > Multipath cannot directly provide failover for ioctls in the
> > > kernel because it doesn't know what each ioctl means and which
> > > result could indicate a path error. Userspace generally knows
> > > what the ioctl it issued means and if it might be a path error,
> > > but neither does it know which path the ioctl took nor does it
> > > necessarily have the privileges to fail a path using the control
> > > device.
> >
> > Thanks for this effort.
> >
> > I have some remarks about your approach. The most important one
> > is that the DM_MPATH_PROBE_PATHS_CMD ioctl appears to be a
> > dangerous command. It sends IO to possibly broken paths and waits
> > for it to complete. In the common error case of a device not
> > responding, this might cause the code to hang for a long time in
> > the kernel ioctl code path, in uninterruptible sleep (note that
> > these probing commands will be queued
>
> Normal reads and writes would also hang in an uninterruptible sleep
> if a path stops responding.
Right. That's why multipathd uses TEST UNIT READY if the storage
supports it, and either aio or separate I/O threads to avoid blocking
the main thread, and generally goes to great lengths to make sure that
misbehaving storage hardware can't freeze the daemon.
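For illustration, here is a rough sketch of what such a "tur" check
boils down to (untested, and not the actual libmultipath code - the
real checker is more involved and can offload the SG_IO to a separate
thread so that even the probe itself can't stall the checker loop):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Send TEST UNIT READY with a short timeout; a dead path fails fast
 * instead of blocking the caller indefinitely. */
static int tur_path_ok(const char *dev)
{
	unsigned char cdb[6] = { 0x00, };	/* TEST UNIT READY */
	unsigned char sense[32];
	struct sg_io_hdr hdr;
	int fd, ret;

	fd = open(dev, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.cmd_len = sizeof(cdb);
	hdr.cmdp = cdb;
	hdr.mx_sb_len = sizeof(sense);
	hdr.sbp = sense;
	hdr.dxfer_direction = SG_DXFER_NONE;
	hdr.timeout = 1000;			/* milliseconds */

	ret = ioctl(fd, SG_IO, &hdr);
	close(fd);
	if (ret < 0)
		return -1;
	return (hdr.masked_status || hdr.host_status ||
		hdr.driver_status) ? 0 : 1;
}

The important properties are the bounded timeout and the fact that no
application IO context is involved in the probe.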
The way I read Kevin's code, you'd queue up additional IO in the same
context that had submitted the original failing IO. I realize that qemu
uses asynchronous IO, but AFAIK the details depend on the qemu options
for the individual VM, and it isn't clear to me whether or not
DM_MPATH_PROBE_PATHS_CMD can bring the entire VM to a halt.
> > [...]
> >
> > I am wondering whether the DM_MPATH_PROBE_PATHS_CMD ioctl is
> > necessary at all. Rather than triggering explicit path probing,
> > you could have dm-mpath "handle" IO errors by failing the active
> > path for "path errors". That would be similar to my patch set from
> > 2021 [1], but it would avoid the extra IO and thus the additional
> > risk of hanging in the kernel.
>
> What exactly do you mean? You say "you could have dm-mpath 'handle'
> IO errors by failing the active path for 'path errors'".
What I meant was that dm-mpath could call fail_path() in case of an
error, using a similar mechanism to the block IO code path.
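For the regular request-based IO path, this logic already exists in
multipath_end_io() in drivers/md/dm-mpath.c. Roughly (simplified and
from memory, not a verbatim quote):

static int multipath_end_io(struct dm_target *ti, struct request *clone,
			    blk_status_t error, union map_info *map_context)
{
	struct dm_mpath_io *mpio = get_mpio(map_context);
	struct pgpath *pgpath = mpio->pgpath;
	int r = DM_ENDIO_DONE;

	if (error && blk_path_error(error)) {
		/* transport/path error: take this path out of service
		 * and let the request be retried on another path */
		if (pgpath)
			fail_path(pgpath);
		r = DM_ENDIO_REQUEUE;
	}
	/* ... */
	return r;
}

Doing the equivalent for SG_IO would mean calling fail_path() from the
SG_IO completion path when blk_path_error() indicates a path error -
without retrying the request in the kernel; the error would still be
passed up to qemu.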
> Christoph doesn't want dm-mpath to look at the error code - so dm-
> mpath doesn't know whether a path error occurred or not.
My impression from the previous discussion was that Christoph mainly
objected to SG_IO requests being retried in the kernel [1], not so much
to the inspection of the error code by device mapper.
I may have misunderstood this, of course. I'm adding Christoph to the
loop so that he can clarify what he meant.
> qemu could look at the error code, but qemu doesn't know which path
> the SG_IO went through - so it can't instruct dm-mpath to fail that
> path.
>
> One of the possibilities that we discussed was to add a path-id to
> SG_IO, so that dm-mpath would record which path the SG_IO went
> through, and qemu could instruct dm-mpath to fail that path. But we
> rejected that as being more complicated than the current approach.
If we discuss extending SG_IO itself, it might be easier to have it
store the blk_status_t somewhere in the sg_io_hdr. More about that
idea in my reply to Kevin.
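To sketch the idea briefly (this is purely hypothetical - sg_io_hdr
has no such member today, and the BLK_STS_* values are kernel-internal
and would have to be exported to the uapi as well):

#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Issue a prepared SG_IO request and classify the result the way the
 * kernel's blk_path_error() helper does: everything that is not a
 * clear target-side error is potentially a path error. Returns 1 for
 * a (possible) path error, 0 otherwise, -1 on ioctl failure. */
static int sgio_path_error(int fd, struct sg_io_hdr *hdr)
{
	if (ioctl(fd, SG_IO, hdr) < 0)
		return -1;
	switch (hdr->blk_status) {	/* hypothetical new member */
	case BLK_STS_OK:
	case BLK_STS_NOTSUPP:
	case BLK_STS_NOSPC:
	case BLK_STS_TARGET:
	case BLK_STS_MEDIUM:
	case BLK_STS_PROTECTION:
		return 0;	/* report the result to the guest */
	default:
		return 1;	/* candidate for failover / retry */
	}
}

With something like this, qemu could decide whether a failed SG_IO is
worth retrying without having to know which path it took.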
Regards,
Martin
[1] https://lore.kernel.org/all/20210701075629.GA25768@lst.de/
> > Contrary to my set, you wouldn't attempt retries in the kernel,
> > but leave this part to qemu instead, like in the current set. That
> > would avoid Christoph's main criticism that "Failing over SG_IO
> > does not make sense" [2].
> >
> > Doing the failover in qemu has the general disadvantage that qemu
> > has no notion of the number of available and/or healthy paths; it
> > can thus only guess a reasonable number of retries for any given
> > I/O. But that's unavoidable, given that the idea to do kernel-level
> > failover on SG_IO is rejected.
> >
> > Please let me know your thoughts.
> >
> > Best Regards
> > Martin
> >
> > [1]
> > https://lore.kernel.org/all/20210628151558.2289-1-mwilck@suse.com/
> > [2] https://lore.kernel.org/all/20210701075629.GA25768@lst.de/
>
> Mikulas