Message-ID: <9fa0fcae-a857-eca4-6aea-2213af62d1ef@amazon.com>
Date:   Wed, 14 Jun 2023 09:08:51 -0400
From:   Luiz Capitulino <luizcap@...zon.com>
To:     "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "Bhatnagar, Rishabh" <risbhat@...zon.com>
CC:     <bigeasy@...utronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "sashal@...nel.org" <sashal@...nel.org>, <abuehaze@...zon.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees



On 2023-06-14 05:14, gregkh@...uxfoundation.org wrote:
>
> On Tue, Jun 13, 2023 at 11:58:05AM -0700, Bhatnagar, Rishabh wrote:
>>
>> On 6/13/23 11:49 AM, Bhatnagar, Rishabh wrote:
>>> Hi Sebastian/Greg
>>>
>>> We are seeing RCU stall warnings from recent stable tree updates:
>>> 5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
>>> This is seen in the upstream stable trees without any downstream patches.
>>>
>>> The issue is seen a few minutes after booting, without any workload.
>>> We launch hundreds of virtual instances and this shows up in 1-2
>>> instances, so it's hard to reproduce.
>>> Attaching a few stack traces below.
>>>
>>> The issue can be seen on virtual and baremetal instances.
>>> Another interesting point is that we only see this on x86-based
>>> instances.
>>> We also tested this on linux-mainline but were not able to reproduce
>>> the issue.
>>> So maybe there's a fixup or related commit that has gone in?
>>>
>>> We tried bisecting the stable trees and found that after reverting the
>>> below commit we couldn't reproduce this in any of the kernels
>>> consistently.
>>>
>>> tick/common: Align tick period with the HZ tick. [ Upstream commit
>>> e9523a0d81899361214d118ad60ef76f0e92f71d ]
>>>
>>>
>>> Not exactly sure how this commit is affecting all stable kernels.
>>> Can you take a look at this issue and share your insight?
> 
> Does this issue also show up in 6.3.y and in 6.4-rc5?

We haven't tried those yet; we will try them today.

Just to give you a bit of context: we have a quick reproducer and a
long-duration reproducer for this (which is our internal testing
infrastructure). In the quick reproducer we can more or less reliably
reproduce with 5.4.246 and 5.10.183, but not with 5.15.113, 6.1.33 or
the latest Linus tree (64569520920a3ca5d456ddd9f4f95fc6ea9b8b45).
However, we did reproduce something similar in the long reproducer with
our downstream versions of 5.15.113 and 6.1.33 (starting with 6.1.28).

- Luiz
