Message-ID: <4e45e3182c4718cafad1166e9ef8dcca1c301651.camel@physik.fu-berlin.de>
Date: Mon, 06 Oct 2025 15:00:10 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Cc: Andreas Larsson <andreas@...sler.com>, Anthony Yznaga	
 <anthony.yznaga@...cle.com>, Sam James <sam@...too.org>, "David S . Miller"
	 <davem@...emloft.net>, Michael Karcher
 <kernel@...rcher.dialup.fu-berlin.de>, 	sparclinux@...r.kernel.org
Subject: Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when
 vio_ldc_send() returns EAGAIN"

Hi Jens,

On Mon, 2025-10-06 at 06:48 -0600, Jens Axboe wrote:
> When you apply this patch and things work, how many times does it
> generally spin where it would've failed before? It's a bit unnerving to
> have a never ending spin loop for this, with udelay spinning in between
> as well. Looking at vio_ldc_send() as well, that spins for potentially
> 1000 loops of 1usec each, which would be 1ms. With the current limit of
> 10 retries, the driver would end up doing udelays of:
> 
> 1
> 2
> 4
> 8
> 16
> 32
> 64
> 128
> 128
> 128
> 
> which is 511 usec on top, for 10.5ms in total spinning time before
> giving up. That is kind of mind boggling, that's an eternity.
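
For reference, the arithmetic quoted above can be reproduced with a small
standalone sketch. It assumes the backoff starts at 1 usec, doubles on each
retry and is capped at 128 usec, as the listed delays suggest. All names
below are hypothetical and are not taken from the sunvdc driver.

```c
/* Illustrative only: hypothetical constants mirroring the quoted numbers. */
#define MAX_RETRIES     10    /* current retry limit mentioned above */
#define MAX_DELAY_USEC  128   /* assumed cap on the per-retry udelay */
#define SEND_SPIN_USEC  1000  /* up to 1000 loops of 1 usec per send attempt */

/* Sum of the backoff delays: 1, 2, 4, ... doubling up to the cap. */
unsigned int backoff_total_usec(int retries)
{
	unsigned int delay = 1, total = 0;
	int i;

	for (i = 0; i < retries; i++) {
		total += delay;
		delay <<= 1;
		if (delay > MAX_DELAY_USEC)
			delay = MAX_DELAY_USEC;
	}
	return total;
}

/* Worst case: every send attempt spins for its full 1 ms, plus the backoff. */
unsigned int worst_case_total_usec(int retries)
{
	return (unsigned int)retries * SEND_SPIN_USEC +
	       backoff_total_usec(retries);
}
```

Under these assumptions, backoff_total_usec(MAX_RETRIES) gives the 511 usec
of delays and worst_case_total_usec(MAX_RETRIES) gives roughly 10.5 ms in
total.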

The problem is that giving up can lead to filesystem corruption, which,
as far as I know, was never observed before the change.

We have deployed a kernel with the change reverted on several LDOMs that
are seeing heavy use, such as cfarm202.cfarm.net, and we have not seen any
system lock-ups or similar.

> Not that it's _really_ that important as this is a pretty niche driver,
> but still pretty ugly... Does it work reliably with a limit of 100
> spins? If things get truly stuck, spinning forever in that loop is not
> going to help. In any case, your patch should have

Isn't it possible that the call to vio_ldc_send() will eventually succeed
which is why there is no need to abort in __vdc_tx_trigger()?

And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
driver concerns a block device while the other driver concerns a network
device. Dropping network packets is expected; dropping bytes from a block
device driver is not.

> Cc: stable@...r.kernel.org
> Fixes: a11f6ca9aef9 ("sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN")
> 
> tags added.

Will do.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
