linux-ext4 - Re: ext2/zram issue [was: Linux 5.19]

Message-ID: <9fd860a8-4e4f-6a95-5c3f-1b3c4a76cf51@kernel.org>
Date:   Tue, 9 Aug 2022 14:35:56 +0200
From:   Jiri Slaby <jirislaby@...nel.org>
To:     Sergey Senozhatsky <senozhatsky@...omium.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        minchan@...nel.org, ngupta@...are.org, Jan Kara <jack@...e.com>,
        Ted Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        avromanov@...rdevices.ru, ddrokosov@...rdevices.ru
Subject: Re: ext2/zram issue [was: Linux 5.19]

On 09. 08. 22, 11:20, Sergey Senozhatsky wrote:
> On (22/08/09 18:11), Sergey Senozhatsky wrote:
>>>>> /me needs to confirm.
>>>>
>>>> With that commit reverted, I see no more I/O errors, only oom-killer
>>>> messages (which is OK IMO, provided I write 1G of urandom on a machine w/
>>>> 800M of RAM):
>>>
>>> Hmm... So handle allocation always succeeds in the slow path? (when we
>>> try to allocate it second time)
>>
>> Yeah I can see how handle re-allocation with direct reclaim can make it more
>> successful, but in exchange it oom-kills some user-space process, I suppose.
>> Is oom-kill really a good alternative though?
> 
> We likely will need to revert e7be8d1dd983 given that it has some
> user visible changes. But, honestly, failing zram write vs oom-kill
> a user-space is a tough choice.

Note that it OOMs only in my use case -- the zram device is obviously
too large for a machine with so little memory.

But the installer is different. It only creates memory pressure, yet
reclaim works well and is able to find memory and carry on. I would say
the atomic vs. non-atomic retry in the original (pre-5.19) approach makes
the difference.
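
To illustrate the distinction, here is a toy user-space model, not kernel
code. It assumes the pattern discussed in this thread: a first attempt that
may not block, followed by an optional retry that is allowed to reclaim
memory. All names (struct pool, alloc_nowait, alloc_may_reclaim,
alloc_handle) are hypothetical and do not correspond to zram or zsmalloc
functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of memory state; every name here is hypothetical. */
struct pool {
    size_t free_pages;        /* available right now */
    size_t reclaimable_pages; /* can be freed, but only by blocking work */
};

/* Atomic-style attempt: must not block, so it can only use free pages. */
bool alloc_nowait(struct pool *p, size_t pages)
{
    if (p->free_pages < pages)
        return false;
    p->free_pages -= pages;
    return true;
}

/* Blocking-style attempt: reclaims what it can first, then tries again. */
bool alloc_may_reclaim(struct pool *p, size_t pages)
{
    if (p->free_pages < pages) {
        size_t need = pages - p->free_pages;
        size_t got = need < p->reclaimable_pages ? need : p->reclaimable_pages;

        p->reclaimable_pages -= got;
        p->free_pages += got;
    }
    return alloc_nowait(p, pages);
}

/*
 * Fast path first, slow path as an optional retry.
 * Returns 0 on success, -1 on failure (the caller would see a failed write).
 */
int alloc_handle(struct pool *p, size_t pages, bool allow_blocking_retry)
{
    if (alloc_nowait(p, pages))
        return 0;
    if (allow_blocking_retry && alloc_may_reclaim(p, pages))
        return 0;
    return -1;
}
```

In this model, a pool with 1 free page and 8 reclaimable pages fails a
4-page request when the blocking retry is disabled and succeeds when it is
enabled. That is the installer-like case: there is pressure, but reclaim can
find memory. When even reclaim cannot free enough, the retry fails as well;
the model does not attempt to represent the oom-killer.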

And yes, we should likely increase the memory in openQA to avoid too 
many reclaims...

PS: the kernel has finished building and the images are being built now,
so the new openQA run hasn't started yet. I will send the revert once the
run completes and is all green.

thanks,
-- 
js
suse labs