Message-ID: <cef07e8f-a919-4aa1-9904-84b16dfa8fe6@kernel.dk>
Date: Mon, 6 Oct 2025 07:46:21 -0600
From: Jens Axboe <axboe@...nel.dk>
To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Andreas Larsson <andreas@...sler.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>, Sam James <sam@...too.org>,
"David S . Miller" <davem@...emloft.net>,
Michael Karcher <kernel@...rcher.dialup.fu-berlin.de>,
sparclinux@...r.kernel.org
Subject: Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when
vio_ldc_send() returns EAGAIN"
On 10/6/25 7:34 AM, John Paul Adrian Glaubitz wrote:
> Hi Jens,
>
> On Mon, 2025-10-06 at 07:19 -0600, Jens Axboe wrote:
>>> The problem is that giving up can lead to filesystem corruption, which
>>> is a problem that was never observed before the change, as far as I know.
>>
>> Yes, I get that.
>>
>>> We have deployed a kernel with the change reverted on several LDOMs that
>>> are seeing heavy use, such as cfarm202.cfarm.net, and we have not seen
>>> any system lock-ups or similar.
>>
>> I believe you. I'm just wondering how long you generally need to spin,
>> as per the question above: how many times does it generally spin where
>> it would've failed before?
>
> I don't have any data on that, unfortunately. All I can say is that we
> have seen filesystem corruption when installing Linux inside an LDOM,
> and this particular issue was eventually tracked down to this commit.
Right, I would imagine that would need to be collected separately. But
if we can get away from the busy looping, then I don't think it's too
interesting.
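If we do want numbers, something as simple as a counter around the retry
loop would tell us. A rough sketch of what I mean (the loop shown only
approximates the existing one in __vdc_tx_trigger(); the worst_spins and
pr_info() bits are the new, purely diagnostic part, and the rest of the
function - hdr setup, dr->snd_nxt accounting - is elided):

	static unsigned int worst_spins;	/* diagnostic only */
	unsigned int spins = 0;
	int err, delay = 1;

	do {
		err = vio_ldc_send(&port->vio, &hdr, sizeof(hdr));
		if (err > 0)
			break;
		spins++;
		udelay(delay);
		if ((delay <<= 1) > 128)
			delay = 128;
	} while (err == -EAGAIN);

	if (spins > worst_spins) {
		worst_spins = spins;
		pr_info("sunvdc: vio_ldc_send() took %u retries\n", spins);
	}

That would at least turn "it eventually succeeds" into an actual number.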
>>>> Not that it's _really_ that important as this is a pretty niche driver,
>>>> but still pretty ugly... Does it work reliably with a limit of 100
>>>> spins? If things get truly stuck, spinning forever in that loop is not
>>>> going to help. In any case, your patch should have
>>>
>>> Isn't it possible that the call to vio_ldc_send() will eventually succeed,
>>> which is why there is no need to abort in __vdc_tx_trigger()?
>>
>> Of course. Is it also possible that some conditions will lead it to
>> never succeed?
>
> I would assume that this would require the virtual disk server to stop
> responding, which should never happen since it's a virtualized
> environment.
Most likely, yes.
> If hardware issues were to cause vio_ldc_send() to never succeed, those
> would have to be handled by the virtualization environment, I guess.
Yep, it'd be a bug somewhere else.
>> The nicer approach would be to have sunvdc punt retries back up to the
>> block stack, which would at least avoid a busy spin for the condition,
>> rather than returning BLK_STS_IOERR, which terminates the request and
>> bubbles the -EIO back up, as per your log. IOW, if we've already spent
>> 10.5ms in that busy loop as per the above rough calculation, perhaps
>> we'd be better off restarting the queue, and hence this operation, after
>> a small sleeping delay instead. That would seem a lot saner than
>> hammering on it.
>
> I generally agree with this remark. I just wonder whether this
> behavior should apply to a logical domain as well. I guess if a
> request doesn't succeed immediately, it's only an urgent problem if
> the logical domain locks up, isn't it?
It's just bad behavior. Honestly, most of this just looks like either a
bad implementation of the protocol, as it's all based on busy looping,
or a badly designed protocol. And then the sunvdc usage of it just
perpetuates that problem, rather than utilizing the tools that exist in
the block stack to take a breather instead of repeatedly hammering on
the hardware for conditions like this.
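To be concrete about the "take a breather" part: blk-mq already lets a
driver report a temporary resource problem and have the queue re-run a
bit later instead of spinning. A generic sketch of that pattern, with
foo_queue_rq()/foo_issue() as made-up stand-ins for a driver's queue_rq
and submit path (the sunvdc-specific version is in the diff quoted
further down):

static blk_status_t foo_queue_rq(struct blk_mq_hw_ctx *hctx,
				 const struct blk_mq_queue_data *bd)
{
	/* usual blk_mq_start_request()/locking elided for brevity */
	int ret = foo_issue(bd->rq);

	if (ret == -EAGAIN) {
		/*
		 * Channel is temporarily busy: don't spin here, flag it as
		 * a resource issue and get the queue re-run a little later.
		 */
		blk_mq_delay_kick_requeue_list(hctx->queue, 10);
		return BLK_STS_DEV_RESOURCE;
	}
	return ret < 0 ? BLK_STS_IOERR : BLK_STS_OK;
}

The request gets requeued rather than completed with an error, so
nothing is lost; the driver just stops hammering on the channel while
it's busy.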
>>> And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
>>> loop when vio_ldc_send() returns EAGAIN"), we can't just drop data, as this
>>> driver concerns a block device while the other driver concerns a network
>>> device. Dropping network packets is expected; dropping bytes from a block
>>> device driver is not.
>>
>> Right, but we can sanely retry it rather than sit in a tight loop.
>> Something like the entirely untested below diff.
>>
>> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
>> index db1fe9772a4d..aa49dffb1b53 100644
>> --- a/drivers/block/sunvdc.c
>> +++ b/drivers/block/sunvdc.c
>> @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  	struct vdc_port *port = hctx->queue->queuedata;
>>  	struct vio_dring_state *dr;
>>  	unsigned long flags;
>> +	int ret;
>>
>>  	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>>
>> @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		return BLK_STS_DEV_RESOURCE;
>>  	}
>>
>> -	if (__send_request(bd->rq) < 0) {
>> +	ret = __send_request(bd->rq);
>> +	if (ret == -EAGAIN) {
>> +		spin_unlock_irqrestore(&port->vio.lock, flags);
>> +		/* already spun for 10msec, defer 10msec and retry */
>> +		blk_mq_delay_kick_requeue_list(hctx->queue, 10);
>> +		return BLK_STS_DEV_RESOURCE;
>> +	} else if (ret < 0) {
>>  		spin_unlock_irqrestore(&port->vio.lock, flags);
>>  		return BLK_STS_IOERR;
>>  	}
>
> We could add this particular change on top of mine after we have
> extensively tested it.
It doesn't really make sense on top of yours, as that removes the
limited looping that sunvdc would do...
> For now, I would propose to pick up my patch to revert the previous
> change. I can then pick up your proposed change and deploy it for
> extensive testing and see if it has any side effects.
Why not just test this one and see if it works? As far as I can tell,
it's been 6.5 years since this change went in, so I can't imagine the
sense of urgency is so huge that it can't wait for testing a proper
patch rather than a work-around.
--
Jens Axboe