Message-ID: <e6a7e68ff9e23aee448003ee1a279a4ab13192c0.camel@physik.fu-berlin.de>
Date: Mon, 06 Oct 2025 15:34:22 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Cc: Andreas Larsson <andreas@...sler.com>, Anthony Yznaga <anthony.yznaga@...cle.com>,
	Sam James <sam@...too.org>, "David S . Miller" <davem@...emloft.net>,
	Michael Karcher <kernel@...rcher.dialup.fu-berlin.de>, sparclinux@...r.kernel.org
Subject: Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when
 vio_ldc_send() returns EAGAIN"

Hi Jens,

On Mon, 2025-10-06 at 07:19 -0600, Jens Axboe wrote:
> > The problem is that giving up can lead to filesystem corruption, which
> > is a problem that was never observed before the change, as far as I know.
> 
> Yes, I get that.
> 
> > We have deployed a kernel with the change reverted on several LDOMs that
> > are seeing heavy use, such as cfarm202.cfarm.net, and we have not seen any
> > system lockups or similar.
> 
> I believe you. I'm just wondering how long you generally need to spin,
> as per the question above: how many times does it generally spin where
> it would've failed before?

I don't have any data on that, unfortunately. All I can say is that we have seen
filesystem corruption when installing Linux inside an LDOM, and this particular
issue was eventually tracked down to this commit.
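
If it would help to answer your question, I could add a simple counter to the
retry loop in __vdc_tx_trigger() and run that on our busy LDOMs for a while.
Something along these lines (untested, paraphrased from the driver from memory;
the exact backoff constants may differ, and the pr_info_ratelimited() is only
there for illustration):

	int retries = 0;

	do {
		err = vio_ldc_send(&port->vio, &hdr, sizeof(hdr));
		if (err > 0) {
			dr->snd_nxt++;
			break;
		}
		udelay(delay);
		if ((delay <<= 1) > 128)
			delay = 128;
		retries++;
	} while (err == -EAGAIN);

	if (retries)
		pr_info_ratelimited("sunvdc: vio_ldc_send() took %d retries\n",
				    retries);

That should at least tell us how many iterations the loop typically needs on a
loaded system.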

> > > Not that it's _really_ that important as this is a pretty niche driver,
> > > but still pretty ugly... Does it work reliably with a limit of 100
> > > spins? If things get truly stuck, spinning forever in that loop is not
> > > going to help. In any case, your patch should have
> > 
> > Isn't it possible that the call to vio_ldc_send() will eventually succeed,
> > which is why there is no need to abort in __vdc_tx_trigger()?
> 
> Of course. Is it also possible that some conditions will lead it to
> never succeed?

I would assume that this would require the virtual disk server to stop
responding, which should never happen since it's a virtualized environment.

If hardware issues were to cause vio_ldc_send() to never succeed, these would
have to be handled by the virtualization environment, I guess.

> The nicer approach would be to have sunvdc punt retries back up to the
> block stack, which would at least avoid a busy spin for the condition.
> Rather than return BLK_STS_IOERR which terminates the request and
> bubbles back the -EIO as per your log. IOW, if we've already spent
> 10.5ms in that busy loop as per the above rough calculation, perhaps
> we'd be better off restarting the queue and hence this operation after a
> small sleeping delay instead. That would seem a lot saner than hammering
> on it.

I generally agree with this remark. I just wonder whether this behavior should
apply to a logical domain as well. I guess a request not succeeding immediately
only becomes an urgent problem if the logical domain ends up locking up, doesn't it?

I have to admit though that I'm absolutely not an expert on the block layer.

> > And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
> > loop when vio_ldc_send() returns EAGAIN"), we can't just drop data here, as
> > this driver concerns a block device while the other driver concerns a network
> > device. Dropping network packets is expected; dropping bytes from a block
> > device driver is not.
> 
> Right, but we can sanely retry it rather than sit in a tight loop.
> Something like the entirely untested below diff.
> 
> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
> index db1fe9772a4d..aa49dffb1b53 100644
> --- a/drivers/block/sunvdc.c
> +++ b/drivers/block/sunvdc.c
> @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	struct vdc_port *port = hctx->queue->queuedata;
>  	struct vio_dring_state *dr;
>  	unsigned long flags;
> +	int ret;
>  
>  	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>  
> @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>  		return BLK_STS_DEV_RESOURCE;
>  	}
>  
> -	if (__send_request(bd->rq) < 0) {
> +	ret = __send_request(bd->rq);
> +	if (ret == -EAGAIN) {
> +		spin_unlock_irqrestore(&port->vio.lock, flags);
> +		/* already spun for 10msec, defer 10msec and retry */
> +		blk_mq_delay_kick_requeue_list(hctx->queue, 10);
> +		return BLK_STS_DEV_RESOURCE;
> +	} else if (ret < 0) {
>  		spin_unlock_irqrestore(&port->vio.lock, flags);
>  		return BLK_STS_IOERR;
>  	}

We could apply this particular change on top of mine once we have tested it extensively.

For now, I would propose picking up my patch to revert the previous change. I can then
pick up your proposed change, deploy it for extensive testing, and see whether it has
any side effects.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
