linux-ext4 - Reducing ext4 fs issues resulting from frequent hard poweroffs

Message-ID: <CAPA0+ryNcZM7ch_beUHkj=s1_FOo7myV=OiY=4qNwoYeAg6FDg@mail.gmail.com>
Date:   Tue, 12 May 2020 17:08:51 -0400
From:   Julio Lajara <ju2wheels@...il.com>
To:     linux-ext4@...r.kernel.org
Subject: Reducing ext4 fs issues resulting from frequent hard poweroffs

Hi all, I currently manage an IoT fleet of Intel NUCs running
Ubuntu 18.04 Server on SSDs with ext4 and no swap. The device workload
is more CPU-bound than I/O-bound, and we are having trouble keeping a
subset of devices running because they are hard powered off in the
field in some regions (sometimes as often as every 12 hours). Because
it is currently difficult to get devices back from the field, I'm
looking into tuning them as well as possible to survive these hard
power-offs, barring any physical SSD issues.

So far I have tried the following ext4 and I/O settings:

* kernel options:
  elevator=noop fsck.mode=force fsck.repair=yes

* fstab ext4 specific mount options:
  commit=1,max_batch_time=0
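
For checking on a device that the mount options above are actually in
effect, a minimal sketch follows. It assumes the text it is given uses
the /proc/mounts layout (device, mount point, fs type, comma-separated
options). The function name, WANTED, and the SAMPLE line (including
/dev/sda2) are hypothetical and only for illustration.

```python
def missing_mount_options(mounts_text, mountpoint, wanted):
    """Return the options from `wanted` that are not active on the
    ext4 filesystem mounted at `mountpoint`.

    `mounts_text` is expected in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mnt, fstype, opts = fields[:4]
        if mnt == mountpoint and fstype == "ext4":
            active = set(opts.split(","))
            return [opt for opt in wanted if opt not in active]
    raise LookupError(f"no ext4 mount found at {mountpoint}")


# Options from the fstab line above.
WANTED = ["commit=1", "max_batch_time=0"]

# Hypothetical sample line; on a real device, read /proc/mounts instead.
SAMPLE = "/dev/sda2 / ext4 rw,relatime,commit=1,max_batch_time=0 0 0\n"
```

On a device, the same function could be fed the contents of
/proc/mounts; an empty list means every wanted option is present.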

Are there any other configuration settings, or changes to the above,
that would make sense to try for this use case? I am hoping to at
least make the fsck repair the last line of defence so it doesn't get
stuck waiting for a prompt to repair at boot, but I want to change the
I/O / ext4 behavior if possible so it's writing as frequently as is
sane, to reduce how often fsck is actually needed.

Thanks,