Message-ID: <d4a5ead8-9e9d-b6ac-d0f2-7d46b4f5e4c2@alu.unizg.hr>
Date: Tue, 20 Dec 2022 11:43:11 +0100
From: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To: "Elliott, Robert (Servers)" <elliott@....com>,
Phillip Lougher <phillip@...ashfs.org.uk>,
LKML <linux-kernel@...r.kernel.org>,
"Paul E. McKenney" <paulmck@...nel.org>
Cc: "phillip.lougher@...il.com" <phillip.lougher@...il.com>,
Thorsten Leemhuis <regressions@...mhuis.info>
Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
squashfs_readahead())
On 11/18/22 17:51, Elliott, Robert (Servers) wrote:
>
>
>> -----Original Message-----
>> From: Phillip Lougher <phillip@...ashfs.org.uk>
>> Sent: Friday, November 18, 2022 12:11 AM
>> To: Mirsad Goran Todorovac <mirsad.todorovac@....unizg.hr>; LKML <linux-
>> kernel@...r.kernel.org>; Paul E. McKenney <paulmck@...nel.org>
>> Cc: phillip.lougher@...il.com; Thorsten Leemhuis
>> <regressions@...mhuis.info>
>> Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
>> squashfs_readahead())
>>
>> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>>> Hi,
>>>
>>> While trying to bisect, I've found another bug that predated the
>>> introduction of squashfs_readahead(), but it has
>>> a common denominator in squashfs_decompress() and
>> squashfs_xz_uncompress().
>>
>> Wrong, the stall is happening in the XZ decompressor library, which
>> is *not* in Squashfs.
>>
>> This reported stall in the decompressor code is likely a symptom of you
>> deliberately thrashing your system. When the system thrashes everything
>> starts to happen very slowly, and the system will spend a lot of
>> its time doing page I/O, and the CPU will spend a lot of time in
>> any CPU intensive code like the XZ decompressor library.
>>
>> So the fact the stall is being hit here is a symptom and not
>> a cause. The decompressor code is likely running slowly due to
>> thrashing and waiting on paged-out buffers. This is not indicative
>> of any bug, merely a system running slowly due to overload.
>>
>> As I said, this is not a Squashfs issue, because the code when the
>> stall takes place isn't in Squashfs.
>>
>> The people responsible for the rcu code should have a lot more insight
>> about what happens when the system is thrashing, and how this will
>> throw up false positives. In this I believe this is an instance of
>> perfectly correct code running slowly due to thrashing incorrectly
>> being flagged as looping.
>>
>> CC'ing Paul E. McKenney <paulmck@...nel.org>
>>
>> Phillip
>
> How big can these readahead sizes be? Should one of the loops include
> cond_resched() calls?
Please allow me to report that a 6.1.0+ kernel (built this morning at
6 AM Berlin time from Torvalds' tree) with CONFIG_KMEMLEAK=y,
CONFIG_KASAN=y, CONFIG_LRU_GEN=y (multi-gen LRU) and
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 no longer exhibits the previously
seen RCU stalls, even with a timeout as low as 20 ms.
So I guess kudos go to the MG-LRU developers, or perhaps Mr. Lougher has
done something efficient in the meantime.
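For anyone who wants to repeat the test, the options named above would look
roughly like this as a .config fragment. This is only a sketch listing the
four options mentioned in this message; the rest of the configuration is not
shown, and other options these may depend on are assumed to be already set.

```
# Debugging options used for this build
CONFIG_KMEMLEAK=y
CONFIG_KASAN=y
# Multi-gen LRU
CONFIG_LRU_GEN=y
# Expedited RCU grace-period CPU stall timeout, in milliseconds
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
```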
My $0.02!
Thank you,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia