lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z5P5FNVjn9dq5AYL@eldamar.lan>
Date: Fri, 24 Jan 2025 21:33:24 +0100
From: Salvatore Bonaccorso <carnil@...ian.org>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: Xan Charbonnet <xan@...rbonnet.com>, 1093243@...s.debian.org,
	Jens Axboe <axboe@...nel.dk>, Bernhard Schmidt <berni@...ian.org>,
	io-uring@...r.kernel.org, linux-kernel@...r.kernel.org,
	regressions@...ts.linux.dev
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

Hi Pavel,

On Fri, Jan 24, 2025 at 06:40:51PM +0000, Pavel Begunkov wrote:
> On 1/24/25 16:30, Xan Charbonnet wrote:
> > On 1/24/25 04:33, Pavel Begunkov wrote:
> > > Thanks for narrowing it down. Xan, can you try this change please?
> > > Waiters can miss wake ups without it, seems to match the description.
> > > 
> > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > > index 9b58ba4616d40..e5a8ee944ef59 100644
> > > --- a/io_uring/io_uring.c
> > > +++ b/io_uring/io_uring.c
> > > @@ -592,8 +592,10 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
> > >        io_commit_cqring(ctx);
> > >        spin_unlock(&ctx->completion_lock);
> > >        io_commit_cqring_flush(ctx);
> > > -    if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
> > > +    if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
> > > +        smp_mb();
> > >            __io_cqring_wake(ctx);
> > > +    }
> > >    }
> > >    void io_cq_unlock_post(struct io_ring_ctx *ctx)
> > > 
> > 
> > 
> > Thanks Pavel!  Early results look very good for this change.  I'm now running 6.1.120 with your added smp_mb() call.  The backup process which had been quickly triggering the issue has been running longer than it ever did when it would ultimately fail.  So that's great!
> > 
> > One sour note: overnight, replication hung on this machine, which is another failure that started happening with the jump from 6.1.119 to 6.1.123.  The machine was running 6.1.124 with the __io_cq_unlock_post_flush function removed completely.  That's the kernel we had celebrated yesterday for running the backup process successfully.
> > 
> > So, we might have two separate issues to deal with, unfortunately.
> 
> Possible, but it could also be a side effect of reverting the patch.
> As usual, in most cases patches are ported either because they're
> fixing sth or other fixes depend on it, and it's not yet apparent
> to me what happened with this one.

I researched bit the lists, and there was the inclusion request on the
stable list itself. Looking into the io-uring list I found
https://lore.kernel.org/io-uring/CADZouDRFJ9jtXHqkX-PTKeT=GxSwdMC42zEsAKR34psuG9tUMQ@mail.gmail.com/
which I think was the trigger to later on include in fact the commit
in 6.1.120. 

This just to give some datapoints on from where the request comes
initialy and for which problem it got tackled.

The following is just to make the regzbot aware of the regression:

#regzbot introduced: 3ab9326f93ec4471cab6f2107ecdf0cf6a8615aa
#regzbot link: https://bugs.debian.org/1093243

Regards,
Salvatore

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ