linux-ext4 - Re: Reducing ext4 fs issues resulting from frequent hard poweroffs


Message-ID: <d63fc6fc-f848-4d78-b9d2-b7baf9f19467@sandeen.net>
Date:   Tue, 12 May 2020 22:16:00 -0500
From:   Eric Sandeen <sandeen@...deen.net>
To:     julio.lajara@...m.rpi.edu, linux-ext4@...r.kernel.org
Subject: Re: Reducing ext4 fs issues resulting from frequent hard poweroffs

On 5/12/20 4:08 PM, Julio Lajara wrote:
> Hi all, I currently manage an IoT fleet based on Intel NUCs running
> Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is
> more CPU bound than I/O bound and we are having some issues keeping a
> subset of devices running due to them being hard powered off in the
> field in some regions (sometimes as frequently as every 12hrs). Due to
> current difficulties in getting devices back from the field I'm
> looking into tweaking them as best as possible to survive these hard
> power-offs, barring any physical SSD issues.

I don't think you've actually said what the failure mode after power
loss is, have you?

> Currently I have tried tweaking some ext4 and I/O settings with the following:
> 
> * kernel options:
>   elevator=noop fsck.mode=force fsck.repair=yes
> 
> * fstab ext4 specific mount options:
>   commit=1,max_batch_time=0
> 
> Are there any other configuration settings or changes to the above
> that would make sense to try here for this use case? I am hoping to at
> least make the fsck repair the last line of defence so it doesn't get
> stuck waiting for a prompt to repair it at boot, but want to try to
> change the I/O / ext4 behavior if possible so it's writing as
> frequently as sanely possible to try to reduce the frequency where
> fsck is actually needed.

I can't tell from this why fsck is needed in the first place; what
actually goes wrong when power is lost?  Ted's right that properly
behaving hardware should not require any special attention after
power loss to restore filesystem consistency, but I can't tell for
sure what your actual root cause for boot failure is from this
email...
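
When reporting back, it may also help to show which ext4 mount options
are actually in effect on an affected device, as opposed to what fstab
requests. Below is a minimal sketch, assuming Python 3 is available on
the devices; the function name parse_ext4_mounts is made up for
illustration, and it only parses text in the /proc/mounts format without
changing anything.

```python
def parse_ext4_mounts(mounts_text):
    """Return (device, mount_point, options) for each ext4 entry.

    mounts_text is text in /proc/mounts format, one mount per line:
    device, mount point, fs type, comma-separated options, two numbers.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = {}
        for opt in fields[3].split(","):
            # "commit=1" becomes {"commit": "1"}; a bare flag such as
            # "rw" becomes {"rw": True}.
            key, _, value = opt.partition("=")
            options[key] = value if value else True
        result.append((fields[0], fields[1], options))
    return result

# Usage on a device (hypothetical):
#   with open("/proc/mounts") as f:
#       for entry in parse_ext4_mounts(f.read()):
#           print(entry)
```

Including that output, along with the console or fsck messages from a
failed boot, would make the actual failure much easier to pin down.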

-Eric