Message-ID: <a29ad9ab-15c2-4788-a839-009ca6fdd00f@gmail.com>
Date: Fri, 24 Jan 2025 10:33:30 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Salvatore Bonaccorso <carnil@...ian.org>
Cc: Xan Charbonnet <xan@...rbonnet.com>, 1093243@...s.debian.org,
 Jens Axboe <axboe@...nel.dk>, Bernhard Schmidt <berni@...ian.org>,
 io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

On 1/24/25 05:24, Salvatore Bonaccorso wrote:
> Hi Pavel, hi Jens,
> 
> On Thu, Jan 23, 2025 at 11:20:40PM +0000, Pavel Begunkov wrote:
>> On 1/23/25 20:49, Salvatore Bonaccorso wrote:
>>> Hi Xan,
>>>
>>> On Thu, Jan 23, 2025 at 02:31:34PM -0600, Xan Charbonnet wrote:
>>>> I rented a Linode and have been trying to load it down with sysbench
>>>> activity while doing a mariabackup and a mysqldump, also while spinning up
>>>> the CPU with zstd benchmarks.  So far I've had no luck triggering the fault.
>>>>
>>>> I've also been doing some kernel compilation.  I followed this guide:
>>>> https://www.dwarmstrong.org/kernel/
>>>> (except that I used make -j24 to build in parallel and used make
>>>> localmodconfig to compile only the modules I need)
>>>>
>>>> I've built the following kernels:
>>>> 6.1.123 (equivalent to linux-image-6.1.0-29-amd64)
>>>> 6.1.122
>>>> 6.1.121
>>>> 6.1.120
>>>>
>>>> So far they have all exhibited the behavior.  Next up is 6.1.119 which is
>>>> equivalent to linux-image-6.1.0-28-amd64.  My expectation is that the fault
>>>> will not appear for this kernel.
>>>>
>>>> It looks like the issue is here somewhere:
>>>> https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.120
>>>>
>>>> I have to work on some other things, and it'll take a while to prove the
>>>> negative (that is, to know that the failure isn't happening).  I'll post
>>>> back with the 6.1.119 results when I have them.
>>>
>>> Additionally please try with 6.1.120 and revert this commit
>>>
>>> 3ab9326f93ec ("io_uring: wake up optimisations")
>>>
>>> (which landed in 6.1.120).
>>>
>>> If that solves the problem maybe we miss some prerequisites in the 6.1.y
>>> series here?
>>
>> I'm not sure why the commit was backported (need to look it up),
>> but from a quick look it does seem to miss a barrier present in
>> the original patch.
> 
> Ack, this was here for reference:
> https://lore.kernel.org/stable/57b048be-31d4-4380-8296-56afc886299a@kernel.dk/
> 
> Xan Charbonnet was able to confirm in https://bugs.debian.org/1093243#99 that
> indeed reverting the commit fixes the mariadb related hangs.

Thanks for narrowing it down. Xan, can you try this change please?
Without it, waiters can miss wake-ups, which seems to match the description.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9b58ba4616d40..e5a8ee944ef59 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -592,8 +592,10 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
  	io_commit_cqring(ctx);
  	spin_unlock(&ctx->completion_lock);
  	io_commit_cqring_flush(ctx);
-	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
+		smp_mb();
  		__io_cqring_wake(ctx);
+	}
  }
  
  void io_cq_unlock_post(struct io_ring_ctx *ctx)
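
For readers unfamiliar with the pattern, here is a rough user-space sketch of
why a full barrier matters on the completion side. It is not the kernel code:
`struct ring`, `post_completion()`, `prepare_to_wait()` and `demo()` are
hypothetical names, and C11 fences stand in for smp_mb(). It assumes the wake
path checks for waiters without taking a lock, so each side has to order its
own store before its load of the other side's state. With a full fence on
both sides, at least one of them is guaranteed to observe the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ring context, not struct io_ring_ctx. */
struct ring {
	atomic_uint cq_tail;	/* number of completions published so far */
	atomic_bool has_waiter;	/* stands in for "the wait queue is non-empty" */
};

/*
 * Completion side: publish the completion, issue a full fence, then look
 * for a waiter. Returns true if a wake-up would have to be issued.
 */
static bool post_completion(struct ring *r)
{
	atomic_fetch_add_explicit(&r->cq_tail, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* role of smp_mb() */
	return atomic_load_explicit(&r->has_waiter, memory_order_relaxed);
}

/*
 * Waiter side: announce the waiter, issue a full fence, then re-check the
 * tail. Returns true if the caller would really have to go to sleep.
 */
static bool prepare_to_wait(struct ring *r, unsigned int seen_tail)
{
	atomic_store_explicit(&r->has_waiter, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&r->cq_tail, memory_order_relaxed) == seen_tail;
}

/* Single-threaded walk through the two possible orderings. */
void demo(void)
{
	struct ring r;

	atomic_init(&r.cq_tail, 0);
	atomic_init(&r.has_waiter, false);

	/* Waiter first: it sleeps, and the completion side sees it. */
	assert(prepare_to_wait(&r, 0));
	assert(post_completion(&r));

	/* Completion first: no waiter to wake, and the waiter sees the new tail. */
	atomic_store(&r.has_waiter, false);
	assert(!post_completion(&r));
	assert(!prepare_to_wait(&r, 1));
}
```

The single-threaded demo only shows the two sequential orderings. The fences
matter when the two sides race on different CPUs: without the fence in
post_completion(), the load of has_waiter may be satisfied before the tail
update is visible, so both sides can see stale state and the waiter sleeps
with nobody left to wake it.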

-- 
Pavel Begunkov

