linux-kernel - Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

Message-ID: <Z5KhDG86fvwzQ3VM@eldamar.lan>
Date: Thu, 23 Jan 2025 21:05:32 +0100
From: Salvatore Bonaccorso <carnil@...ian.org>
To: 1093243@...s.debian.org, Bernhard Schmidt <berni@...ian.org>
Cc: Jens Axboe <axboe@...nel.dk>, Pavel Begunkov <asml.silence@...il.com>,
	io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

Hi all,

On Wed, Jan 22, 2025 at 08:49:13PM +0100, Salvatore Bonaccorso wrote:
> Control: forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886
> Hi,
> 
> On Tue, Jan 21, 2025 at 08:06:18PM +0100, Bernhard Schmidt wrote:
> > Control: affects -1 src:mariadb
> > Control: tags -1 + confirmed
> > Control: severity -1 critical
> > 
> > Seeing this too. We have two standalone systems running the stock
> > bookworm MariaDB and the open-source network management system LibreNMS,
> > which is quite write-heavy. After some time (sometimes a couple of
> > hours, sometimes 1-2 days) all connection slots to the database are
> > full.
> > 
> > When you kill one client process you can connect and issue "show
> > processlist"; you then see all slots busy with simple update/select
> > queries that have been running for hours. You need to SIGKILL mariadbd
> > to recover.
> > 
> > Over the last two days our colleagues running a Galera cluster (unsure about
> > the version, inquiring) have been affected by this as well. They found
> > a MariaDB bug report about this.
> > 
> > https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues
> > 
> > Since there have been reports about data loss I think it warrants
> > increasing the severity to critical.
> > 
> > I'm not 100% sure about -30 though: we have downgraded the
> > production system to -28 and upgraded the test system to -30, and both
> > are working fine. The test system has less load though, and I trust the
> > reports here that -30 is still broken.
> 
> I would be interested to know if someone is able to reproduce the
> issue under lab conditions, which would enable us to bisect
> the issue.
> 
> As a start I set the above issue as forwarded, to have the issues
> linked (we can later update it to point to the Linux upstream report).

I suspect this might have been introduced by one of the io_uring related
changes between 6.1.119 and 6.1.123.

But we need to be able to trigger the issue in a non-production
environment, and then bisect those upstream changes. I'm already
looping in Jens Axboe in case this rings a bell.
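
For a lab reproduction, the symptom described in the bug report (all
connection slots occupied by simple queries that have been running for
hours) could be detected automatically, so that each kernel tested
during a bisect gets a good/bad verdict. The following is only a
minimal Python sketch under assumptions, not something taken from the
reports: the function name looks_hung, the thresholds, and the idea of
feeding it rows fetched from "SHOW PROCESSLIST" as dictionaries are all
illustrative. It relies only on the Command and Time (seconds) columns
of that output.

```python
from typing import Any, Iterable, Mapping


def looks_hung(rows: Iterable[Mapping[str, Any]],
               min_stuck: int,
               stuck_after_s: int = 3600) -> bool:
    """Heuristic (hypothetical) check for the reported symptom.

    rows: one mapping per SHOW PROCESSLIST row, with at least the
          "Command" and "Time" columns (Time is in seconds).
    min_stuck: how many long-running queries count as a hang; an
          assumption, e.g. close to the server's connection limit.
    stuck_after_s: age in seconds after which a query counts as stuck.
    """
    stuck = 0
    for row in rows:
        if row.get("Command") != "Query":
            continue  # idle ("Sleep") and other non-query threads are ignored
        time_s = row.get("Time")
        if time_s is not None and int(time_s) >= stuck_after_s:
            stuck += 1
    return stuck >= min_stuck


# Example snapshot with made-up values, for illustration only.
snapshot = [
    {"Id": 11, "Command": "Query", "Time": 7300},
    {"Id": 12, "Command": "Query", "Time": 7250},
    {"Id": 13, "Command": "Sleep", "Time": 12},
]
print(looks_hung(snapshot, min_stuck=2))
```

A bisect step could then boot the candidate kernel, run a write-heavy
workload, and mark the kernel bad once this check fires. How long to
wait before calling a kernel good remains a judgment call, given that
the reports mention hangs appearing only after hours or days.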

Jens, for context, we have reports in Debian about MariaDB hangs after
updating from a 6.1.119-based kernel to 6.1.123 (and 6.1.124), as
reported in https://bugs.debian.org/1093243

Regards,
Salvatore