linux-kernel - Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

Message-ID: <79efe5a3-00d4-4242-aee9-6cb2ccc56090@kernel.dk>
Date: Thu, 23 Jan 2025 13:26:27 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Salvatore Bonaccorso <carnil@...ian.org>, 1093243@...s.debian.org,
 Bernhard Schmidt <berni@...ian.org>
Cc: Pavel Begunkov <asml.silence@...il.com>, io-uring@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs

On 1/23/25 1:05 PM, Salvatore Bonaccorso wrote:
> Hi all,
> 
> On Wed, Jan 22, 2025 at 08:49:13PM +0100, Salvatore Bonaccorso wrote:
>> Control: forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886
>> Hi,
>>
>> On Tue, Jan 21, 2025 at 08:06:18PM +0100, Bernhard Schmidt wrote:
>>> Control: affects -1 src:mariadb
>>> Control: tags -1 + confirmed
>>> Control: severity -1 critical
>>>
>>> Seeing this too. We have two standalone systems running the stock
>>> bookworm MariaDB and the open-source network management system LibreNMS,
>>> which is quite write-heavy. After some time (sometimes a couple of
>>> hours, sometimes 1-2 days) all connection slots to the database are
>>> full.
>>>
>>> When you kill one client process, you can connect and issue "show
>>> processlist"; it shows all slots busy with simple update/select queries
>>> that have been running for hours. You need to SIGKILL mariadbd to
>>> recover.
>>>
>>> Over the last two days our colleagues running a Galera cluster (unsure
>>> about the version, inquiring) have been affected by this as well. They
>>> found a MariaDB bug report about this.
>>>
>>> https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues
>>>
>>> Since there have been reports about data loss I think it warrants
>>> increasing the severity to critical.
>>>
>>> I'm not 100% sure about -30 though: we have downgraded the
>>> production system to -28 and upgraded the test system to -30, and both
>>> are working fine. The test system has less load though, and I trust the
>>> reports here that -30 is still broken.
>>
>> I would be interested to know if someone is able to reproduce the
>> issue under lab conditions, which would enable us to bisect
>> it.
>>
>> As a start I set the above issue as a forward, to have the issues
>> linked (and we later on can update it to the linux upstream report).
> 
> I suspect this might be introduced by one of the io_uring related
> changes between 6.1.119 and 6.1.123. 
> 
> But we need to be able to trigger the issue in a non-production
> environment, and then bisect those upstream changes. In the meantime
> I'm looping in Jens Axboe in case this rings a bell.
> 
> Jens, for context, we have reports in Debian about MariaDB hangs after
> updating from a 6.1.119-based kernel to 6.1.123 (and 6.1.144), as
> reported in https://bugs.debian.org/1093243

Thanks for the report, that's certainly unexpected. I'll take a look.

-- 
Jens Axboe