Message-ID: <b4883f83-9f5d-46b2-b30e-f2e78506bf30@marcinwanat.pl>
Date: Wed, 22 May 2024 12:13:12 +0200
From: Marcin Wanat <private@...cinwanat.pl>
To: Zhaoyang Huang <huangzhaoyang@...il.com>
Cc: Dave Chinner <david@...morbit.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"zhaoyang.huang" <zhaoyang.huang@...soc.com>, Alex Shi <alexs@...nel.org>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>, Baolin Wang
<baolin.wang@...ux.alibaba.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, steve.kang@...soc.com
Subject: Re: [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock
during migration

On 22.05.2024 07:37, Zhaoyang Huang wrote:
> On Tue, May 21, 2024 at 11:47 PM Marcin Wanat <private@...cinwanat.pl> wrote:
>>
>> On 21.05.2024 03:00, Zhaoyang Huang wrote:
>>> On Tue, May 21, 2024 at 8:58 AM Zhaoyang Huang <huangzhaoyang@...il.com> wrote:
>>>>
>>>> On Tue, May 21, 2024 at 3:42 AM Marcin Wanat <private@...cinwanat.pl> wrote:
>>>>>
>>>>> On 15.04.2024 03:50, Zhaoyang Huang wrote:
>>>>> I have around 50 hosts handling high I/O (each with 20Gbps+ uplinks
>>>>> and multiple NVMe drives), running RockyLinux 8/9. The stock RHEL 8/9
>>>>> kernels are NOT affected, and the long-term 5.15.X kernel is NOT affected.
>>>>> However, with the long-term 6.1.XX and 6.6.XX kernels
>>>>> (tested with at least 10 different versions), this lockup always appears
>>>>> after 2-30 days, similar to the report in the original thread.
>>>>> The more load (for example, copying a lot of local files while
>>>>> serving 20Gbps traffic), the higher the chance that the bug will appear.
>>>>>
>>>>> I haven't been able to reproduce this during synthetic tests,
>>>>> but it always occurs in production on 6.1.X and 6.6.X within 2-30 days.
>>>>> If anyone can provide a patch, I can test it on multiple machines
>>>>> over the next few days.
>>>> Could you please try this one, which can be applied on 6.6 directly? Thank you!
>>> URL: https://lore.kernel.org/linux-mm/20240412064353.133497-1-zhaoyang.huang@unisoc.com/
>>>
>>
>> Unfortunately, I am unable to cleanly apply this patch against the
>> latest 6.6.31.
> Please try the one below, which works on my v6.6-based Android. Thank you
> in advance for testing it :D
>
> mm/huge_memory.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
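
As I read the subject line, the idea is to do the page-cache xarray split of
the large folio while lruvec->lru_lock is held, so that LRU/migration code
walking the lists cannot observe the folio mid-split. A rough sketch of that
pattern, for anyone following along (this is NOT the actual patch and the
function name is made up; it only reuses the v6.6 helpers folio_lruvec_lock(),
xas_split(), folio_order() and unlock_page_lruvec(), and omits error handling
and the surrounding split_huge_page_to_list() context):

#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/xarray.h>

/*
 * Sketch only: split the multi-order page-cache entry for @folio while
 * lruvec->lru_lock is held.  The caller is assumed to already hold the
 * mapping's xa_lock (i_pages), as the real split path does.
 */
static void split_xarray_under_lru_lock(struct folio *folio,
					struct xa_state *xas)
{
	struct lruvec *lruvec;

	lruvec = folio_lruvec_lock(folio);		/* takes lruvec->lru_lock */
	xas_split(xas, folio, folio_order(folio));	/* split under the LRU lock */
	unlock_page_lruvec(lruvec);			/* releases lruvec->lru_lock */
}
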
I have compiled 6.6.31 with this patch and will test it on multiple
machines over the next 30 days. I will provide an update after 30 days
if everything is fine, or sooner if any of the hosts experiences the same
soft lockup again.