Message-ID: <Z5KhDG86fvwzQ3VM@eldamar.lan>
Date: Thu, 23 Jan 2025 21:05:32 +0100
From: Salvatore Bonaccorso <carnil@...ian.org>
To: 1093243@...s.debian.org, Bernhard Schmidt <berni@...ian.org>
Cc: Jens Axboe <axboe@...nel.dk>, Pavel Begunkov <asml.silence@...il.com>,
	io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

Hi all,

On Wed, Jan 22, 2025 at 08:49:13PM +0100, Salvatore Bonaccorso wrote:
> Control: forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886
> Hi,
> 
> On Tue, Jan 21, 2025 at 08:06:18PM +0100, Bernhard Schmidt wrote:
> > Control: affects -1 src:mariadb
> > Control: tags -1 + confirmed
> > Control: severity -1 critical
> > 
> > Seeing this too. We have two standalone systems running the stock
> > bookworm MariaDB and the opensource network management system LibreNMS,
> > which is quite write-heavy. After some time (sometimes a couple of
> > hours, sometimes 1-2 days) all connection slots to the database are
> > full.
> > 
> > When you kill one client process you can connect and issue "show
> > processlist", you see all slots busy with easy update/select queries
> > that have been running for hours. You need to SIGKILL mariadbd to
> > recover.
> > 
> > The last two days our colleagues running a Galera cluster (unsure about
> > the version, inquiring) have been affected by this as well. They found
> > a MariaDB bug report about this.
> > 
> > https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues
> > 
> > Since there have been reports about data loss I think it warrants
> > increasing the severity to critical.
> > 
> > I'm not 100% sure about -30 though: we have downgraded the
> > production system to -28 and upgraded the test system to -30, and both
> > are working fine. The test system has less load though, and I trust the
> > reports here that -30 is still broken.
> 
> I would be interested to know if someone is able to reproduce the
> issue under lab conditions, which would enable us to bisect
> the issue.
> 
> As a start I set the above issue as a forward, to have the issues
> linked (and we later on can update it to the linux upstream report).

I suspect this might have been introduced by one of the io_uring-related
changes between 6.1.119 and 6.1.123.

But we need to be able to trigger the issue in a non-production
environment, and then bisect those upstream changes. In the meantime
I'm already looping in Jens Axboe, in case this rings a bell.

Jens, for context, we have reports in Debian about MariaDB hangs after
updating from a 6.1.119-based kernel to 6.1.123 (and 6.1.124), as
reported in https://bugs.debian.org/1093243

Regards,
Salvatore
