linux-kernel - Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs


Message-ID: <d4a5ead8-9e9d-b6ac-d0f2-7d46b4f5e4c2@alu.unizg.hr>
Date:   Tue, 20 Dec 2022 11:43:11 +0100
From:   Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To:     "Elliott, Robert (Servers)" <elliott@....com>,
        Phillip Lougher <phillip@...ashfs.org.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>
Cc:     "phillip.lougher@...il.com" <phillip.lougher@...il.com>,
        Thorsten Leemhuis <regressions@...mhuis.info>
Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
 squashfs_readahead())

On 11/18/22 17:51, Elliott, Robert (Servers) wrote:
> 
> 
>> -----Original Message-----
>> From: Phillip Lougher <phillip@...ashfs.org.uk>
>> Sent: Friday, November 18, 2022 12:11 AM
>> To: Mirsad Goran Todorovac <mirsad.todorovac@....unizg.hr>; LKML <linux-
>> kernel@...r.kernel.org>; Paul E. McKenney <paulmck@...nel.org>
>> Cc: phillip.lougher@...il.com; Thorsten Leemhuis
>> <regressions@...mhuis.info>
>> Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
>> squashfs_readahead())
>>
>> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>>> Hi,
>>>
>>> While trying to bisect, I've found another bug that predated the
>>> introduction of squashfs_readahead(), but it has
>>> a common denominator in squashfs_decompress() and
>> squashfs_xz_uncompress().
>>
>> Wrong, the stall is happening in the XZ decompressor library, which
>> is *not* in Squashfs.
>>
>> This reported stall in the decompressor code is likely a symptom of you
>> deliberately thrashing your system.  When the system thrashes everything
>> starts to happen very slowly, and the system will spend a lot of
>> its time doing page I/O, and the CPU will spend a lot of time in
>> any CPU intensive code like the XZ decompressor library.
>>
>> So the fact the stall is being hit here is a symptom and not
>> a cause.  The decompressor code is likely running slowly due to
>> thrashing and waiting on paged-out buffers.  This is not indicative
>> of any bug, merely a system running slowly due to overload.
>>
>> As I said, this is not a Squashfs issue, because the code when the
>> stall takes place isn't in Squashfs.
>>
>> The people responsible for the rcu code should have a lot more insight
>> about what happens when the system is thrashing, and how this will
>> throw up false positives.  In this I believe this is an instance of
>> perfectly correct code running slowly due to thrashing incorrectly
>> being flagged as looping.
>>
>> CC'ing Paul E. McKenney <paulmck@...nel.org>
>>
>> Phillip
> 
> How big can these readahead sizes be? Should one of the loops include
> cond_resched() calls?
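
(For context, the pattern being asked about looks roughly like the
userspace sketch below. It is not the Squashfs or XZ code: the names
cond_resched_standin(), process_block() and process_all() are
hypothetical, and sched_yield() only approximates what cond_resched()
does inside the kernel.)

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Counts how often the loop offered to give up the CPU (for illustration). */
static int yield_count;

/* Hypothetical userspace stand-in for the kernel's cond_resched(). */
static void cond_resched_standin(void)
{
    yield_count++;
    sched_yield();  /* POSIX: let other runnable threads run */
}

/* Hypothetical per-block work standing in for one decompressor step. */
static unsigned long process_block(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Process nblocks blocks of block_len bytes, yielding after each block. */
unsigned long process_all(const unsigned char *data, size_t nblocks,
                          size_t block_len)
{
    unsigned long total = 0;

    for (size_t b = 0; b < nblocks; b++) {
        total += process_block(data + b * block_len, block_len);
        cond_resched_standin();  /* bounded work between yield points */
    }
    return total;
}
```

The point of the pattern is only that the amount of work done between
two yield points stays bounded, no matter how large the total input is.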

Please allow me to report that a 6.1.0+ kernel (built from Torvalds' tree
at 6 AM Berlin time this morning) with CONFIG_KMEMLEAK=y, CONFIG_KASAN=y,
CONFIG_LRU_GEN=y (multi-gen LRU) and CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
no longer exhibits the previously seen RCU stalls, even with a timeout
as low as 20 ms.

So I guess kudos go to the MG-LRU developers, or perhaps Mr. Lougher has
done something effective in the meantime.

My $0.02!

Thank you,
Mirsad

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia