Message-ID: <cef07e8f-a919-4aa1-9904-84b16dfa8fe6@kernel.dk>
Date: Mon, 6 Oct 2025 07:46:21 -0600
From: Jens Axboe <axboe@...nel.dk>
To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
 linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Andreas Larsson <andreas@...sler.com>,
 Anthony Yznaga <anthony.yznaga@...cle.com>, Sam James <sam@...too.org>,
 "David S . Miller" <davem@...emloft.net>,
 Michael Karcher <kernel@...rcher.dialup.fu-berlin.de>,
 sparclinux@...r.kernel.org
Subject: Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when
 vio_ldc_send() returns EAGAIN"

On 10/6/25 7:34 AM, John Paul Adrian Glaubitz wrote:
> Hi Jens,
> 
> On Mon, 2025-10-06 at 07:19 -0600, Jens Axboe wrote:
>>> The problem is that giving up can lead to filesystem corruption, which
>>> is a problem that was never observed before the change, as far as I know.
>>
>> Yes, I get that.
>>
>>> We have deployed a kernel with the change reverted on several LDOMs that
>>> are seeing heavy use, such as cfarm202.cfarm.net, and we have not seen
>>> any system lock-ups or similar.
>>
>> I believe you. I'm just wondering how long you generally need to spin,
>> as per the question above: how many times does it generally spin where
>> it would've failed before?
> 
> I don't have any data on that, unfortunately. All I can say is that we
> have seen filesystem corruption when installing Linux inside an LDOM,
> and this particular issue was eventually tracked down to this commit.

Right, I would imagine that would need to be collected separately. But
if we can get away from the busy looping, then I don't think it's too
interesting.
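
A minimal sketch of the kind of instrumentation that could collect that
data, assuming the send loop in __vdc_tx_trigger() keeps its upstream
shape (port, dr, and hdr as in the driver; the spins counter and the
ratelimited print are hypothetical additions for illustration):

	int err, delay = 1;
	unsigned int spins = 0;	/* hypothetical: count failed send attempts */

	do {
		err = vio_ldc_send(&port->vio, &hdr, sizeof(hdr));
		if (err > 0) {
			dr->snd_nxt++;
			break;
		}
		spins++;
		udelay(delay);		/* existing backoff, 1..128 usec */
		if ((delay <<= 1) > 128)
			delay = 128;
	} while (err == -EAGAIN);

	if (spins)
		pr_warn_ratelimited("sunvdc: vio_ldc_send() spun %u times\n",
				    spins);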

>>>> Not that it's _really_ that important as this is a pretty niche driver,
>>>> but still pretty ugly... Does it work reliably with a limit of 100
>>>> spins? If things get truly stuck, spinning forever in that loop is not
>>>> going to help. In any case, your patch should have
>>>
>>> Isn't it possible that the call to vio_ldc_send() will eventually
>>> succeed, which is why there is no need to abort in __vdc_tx_trigger()?
>>
>> Of course. Is it also possible that some conditions will lead it to
>> never succeed?
> 
> I would assume this would require the virtual disk server to not
> respond, which should never happen since it's a virtualized
> environment.

Most likely, yes.

> If hardware issues were to cause vio_ldc_send() to never succeed, those
> would have to be handled by the virtualization environment, I guess.

Yep, it'd be a bug somewhere else.

>> The nicer approach would be to have sunvdc punt retries back up to the
>> block stack, which would at least avoid a busy spin for the condition,
>> rather than returning BLK_STS_IOERR, which terminates the request and
>> bubbles the -EIO back up, as per your log. IOW, if we've already spent
>> 10.5ms in that busy loop as per the above rough calculation, perhaps
>> we'd be better off restarting the queue, and hence this operation,
>> after a small sleeping delay instead. That would seem a lot saner than
>> hammering on it.
> 
> I generally agree with this remark. I just wonder whether this
> behavior should apply to a logical domain as well. I guess if a
> request doesn't succeed immediately, it's an urgent problem only if
> the logical domain locks up, isn't it?

It's just bad behavior. Honestly, most of this just looks like either a
bad implementation of the protocol, as it's all based on busy looping,
or a badly designed protocol. And then the sunvdc usage of it just
perpetuates that problem, rather than using the tools that exist in the
block stack to take a breather instead of repeatedly hammering on the
hardware for conditions like this.
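
To make that concrete, here is a minimal sketch of that kind of
back-off in a queue_rq handler, using existing blk-mq APIs;
example_queue_rq(), device_is_busy(), and do_submit() are hypothetical
stand-ins, not sunvdc code:

	#include <linux/blk-mq.h>

	/*
	 * Sketch of the generic blk-mq back-off pattern: report a temporary
	 * device resource shortage and re-run the hardware queue after a
	 * sleeping delay, instead of busy looping in the driver.
	 */
	static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
					     const struct blk_mq_queue_data *bd)
	{
		if (device_is_busy()) {	/* hypothetical busy check */
			/* have blk-mq hold the request and retry it when
			 * this hardware queue is re-run in 10 msec */
			blk_mq_delay_run_hw_queue(hctx, 10);
			return BLK_STS_DEV_RESOURCE;
		}
		return do_submit(bd->rq);	/* hypothetical submit path */
	}

The untested diff quoted below achieves the same effect with the related
blk_mq_delay_kick_requeue_list() helper.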

>>> And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
>>> loop when vio_ldc_send() returns EAGAIN"), we can't just drop data, as this
>>> driver concerns a block device while the other driver concerns a network
>>> device. Dropping network packets is expected; dropping bytes from a block
>>> device is not.
>>
>> Right, but we can sanely retry it rather than sit in a tight loop.
>> Something like the entirely untested diff below.
>>
>> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
>> index db1fe9772a4d..aa49dffb1b53 100644
>> --- a/drivers/block/sunvdc.c
>> +++ b/drivers/block/sunvdc.c
>> @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  	struct vdc_port *port = hctx->queue->queuedata;
>>  	struct vio_dring_state *dr;
>>  	unsigned long flags;
>> +	int ret;
>>  
>>  	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>>  
>> @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		return BLK_STS_DEV_RESOURCE;
>>  	}
>>  
>> -	if (__send_request(bd->rq) < 0) {
>> +	ret = __send_request(bd->rq);
>> +	if (ret == -EAGAIN) {
>> +		spin_unlock_irqrestore(&port->vio.lock, flags);
>> +		/* already spun for 10msec, defer 10msec and retry */
>> +		blk_mq_delay_kick_requeue_list(hctx->queue, 10);
>> +		return BLK_STS_DEV_RESOURCE;
>> +	} else if (ret < 0) {
>>  		spin_unlock_irqrestore(&port->vio.lock, flags);
>>  		return BLK_STS_IOERR;
>>  	}
> 
> We could add this particular change on top of mine after we have
> extensively tested it.

It doesn't really make sense on top of yours, as that removes the
limited looping that sunvdc would do...

> For now, I would propose to pick up my patch to revert the previous
> change. I can then pick up your proposed change and deploy it for
> extensive testing and see if it has any side effects.

Why not just test this one and see if it works? As far as I can tell,
it's been 6.5 years since this change went in, so I can't imagine there's
a huge sense of urgency to fix it up that can't wait for testing a more
proper patch rather than a work-around.

-- 
Jens Axboe
