Date:   Tue, 12 May 2020 18:01:11 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     julio.lajara@...m.rpi.edu
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Reducing ext4 fs issues resulting from frequent hard poweroffs

On Tue, May 12, 2020 at 05:08:51PM -0400, Julio Lajara wrote:
> Hi all, I currently manage an IOT fleet based on Intel NUCs running
> Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is
> more CPU bound than I/O bound and we are having some issues keeping a
> subset of devices running due to them being hard powered off in the
> field in some regions (sometimes as frequently as every 12hrs). Due to
> current difficulties in getting devices back from the field I'm
> looking into tweaking them as best as possible to survive these hard
> power-offs, barring any physical SSD issues.

Hi Julio,

If the hardware devices are behaving appropriately --- that is, after
receiving a CACHE FLUSH command the storage device persists all blocks
written up to the CACHE FLUSH command, such that when the OS receives
the command completion notification of the CACHE FLUSH, everything is
persisted even after a hard power off --- no special configuration
should be necessary.
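
To illustrate the application side of that contract, here is a minimal
sketch in Python.  It assumes the filesystem turns fsync() into a CACHE
FLUSH for the device (ext4 does this when barriers are enabled, which is
the default) and that the device honors the flush.  The helper name
durable_write is hypothetical, chosen only for this example:

```python
import os
import tempfile


def durable_write(path, data):
    """Atomically replace `path` with `data` (bytes) and flush it to stable storage.

    Hypothetical helper for illustration.  It is only as reliable as the
    storage device's handling of CACHE FLUSH.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to persist the file contents
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If a hard power off still corrupts files written this way, that points
at the device not persisting data it has acknowledged, rather than at
anything a mount option could fix.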

We have regression tests which simulate this and ext4 regularly passes
them.

If you need to tweak settings, that's an indication that your hardware
is buggy.  And unfortunately, there's not much we can do to prevent
failures.  A lot is going to depend on *how* crappy the SSDs happen
to be.

Your best bet might be to find a way to make your root filesystem
read-only, so it's not being modified at all, and then set up a
scratch partition with state which can be reformatted at any time if
it gets corrupted --- and then try to get all of your data pushed out
to your remote servers / cloud as often as possible.  And next time,
qualify the SSDs ahead of time to make sure they aren't overly "cost
optimized" (read: crap) before you buy your fleet of devices.  :-(
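
As a rough aid for verifying such a layout on a deployed device, the
sketch below checks whether a mount point is read-only by parsing text
in the /proc/mounts format.  The function name is_read_only and the
/scratch mount point are assumptions made up for this example, not part
of any existing tool:

```python
def is_read_only(mounts_text, mount_point):
    """Return True/False for the mount's ro flag, or None if not mounted.

    `mounts_text` is expected to be in /proc/mounts format:
    device, mount point, fs type, comma-separated options, dump, pass.
    Hypothetical helper for illustration.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # Later entries for the same mount point shadow earlier ones,
        # so keep the last match.
        result = "ro" in fields[3].split(",")
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = f.read()
    print("root read-only:", is_read_only(mounts, "/"))
    print("scratch read-only:", is_read_only(mounts, "/scratch"))
```

A fleet health check could report these two values along with the rest
of the telemetry, so a device whose root was remounted read-write by
accident is noticed early.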

	   	  	       	       - Ted
