Message-Id: <3625058A-4B6D-4C28-B231-61DFED33F0B4@dilger.ca>
Date:   Fri, 23 Jun 2017 16:00:05 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Artem Blagodarenko <artem.blagodarenko@...gate.com>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] test-appliance: add new test configuration:
 ext4/lustre_mds

On Jun 23, 2017, at 3:41 PM, Theodore Ts'o <tytso@....edu> wrote:
> 
> On Fri, Jun 09, 2017 at 03:00:56PM -0600, Andreas Dilger wrote:
>> 
>> I didn't include large_dir because it isn't included in the upstream
>> kernels yet, and we haven't been using this in production due to lack
>> of e2fsck support (which Seagate has finally implemented, thank you).
> 
> Also, does Lustre use large_dir on the MDS server, or on the Lustre
> data server?  Because I noticed that on the MDS server you're
> apparently not using extents:
> 
> export EXT_MKFS_OPTIONS="-I 2048 -O ^64bit,mmp,uninit_bg,^extents,dir_nlink,...
>                                                          ^^^^^^^^
> Are you really using large_dir on a file system that is using indirect
> block mapped files?

Like I wrote above, we haven't been using large_dir in production yet.
My plan was only to use it on the MDT, which is where the filesystem
namespace is located.  Indeed, we don't use extents on the MDT because:
a) ext4 MDTs are strictly less than 16TB in size (usually < 8TB, even with
   4 billion inodes) because of using "-I 2048" to limit the space per inode,
   so they never need more than 2^32 blocks
b) the only files of any size on the MDT are directories or other log
   files, which are rarely created with contiguous block allocations,
   so extents usually take more space than indirect blocks (12 bytes vs 4).
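
For anyone who wants to check the numbers in (a) and (b), here is a rough
back-of-the-envelope sketch. It assumes 4 KiB blocks, reads "4 billion
inodes" as 2^32 inodes, and budgets 2048 bytes of space per inode; none of
those are spelled out above. The mapping_bytes() helper is hypothetical and
ignores extent headers/index nodes and the indirect blocks themselves; it
only compares the per-entry cost:

```python
# Back-of-the-envelope check of the sizes discussed above.
# Assumptions (not stated in the message): 4 KiB blocks, 2**32 inodes,
# and 2048 bytes of filesystem space budgeted per inode.
BLOCK_SIZE = 4096
BYTES_PER_INODE = 2048
INODES = 2**32
TIB = 2**40

# Largest filesystem addressable with 32-bit block numbers.
max_32bit_fs_bytes = 2**32 * BLOCK_SIZE

# Space needed for the inode budget alone.
mdt_bytes = INODES * BYTES_PER_INODE

EXTENT_ENTRY_BYTES = 12    # one extent entry
INDIRECT_ENTRY_BYTES = 4   # one 32-bit block pointer


def mapping_bytes(fragments, blocks):
    """Per-entry mapping cost for a file of `blocks` blocks stored in
    `fragments` contiguous runs: (extent_bytes, indirect_bytes).
    Simplified: headers, index nodes and indirect-block overhead ignored."""
    return fragments * EXTENT_ENTRY_BYTES, blocks * INDIRECT_ENTRY_BYTES


print(max_32bit_fs_bytes // TIB, "TiB max with 32-bit block numbers")
print(mdt_bytes // TIB, "TiB for the inode budget")
print(mapping_bytes(fragments=100, blocks=100))  # fully fragmented file
```

With a fully fragmented file (one block per run) the extent entries cost
three times as much as block pointers; extents only win once the runs
average more than three blocks.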


I think Seagate was also considering it on their huge OSTs (which always
have extents enabled) since they may have large numbers of objects that
are currently referenced by a limited number of directories (32 currently).

I think the OST case would be better handled by creating multiple object
subdirectories[*], since the internal object directory layout is not
externally visible.  If the objects are grouped temporally into directories,
then old directories stop consuming RAM once they are no longer actively in
use, and they can be shrunk or deleted over time as they become empty.
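
To illustrate the difference, here is a toy sketch, not the actual OST
layout code. It assumes object IDs are handed out in increasing order, that
the current scheme spreads objects over the 32 directories by ID modulo 32,
and that the temporal scheme simply puts each run of 1000 consecutive IDs
into its own directory. All function names and the 1000-object group size
are made up for the example:

```python
# Toy comparison of two object-to-directory placements.
# Assumptions (hypothetical, for illustration only): IDs increase over
# time; the current scheme is "id modulo 32"; the temporal scheme groups
# every OBJECTS_PER_DIR consecutive IDs into one directory.
LEGACY_DIRS = 32
OBJECTS_PER_DIR = 1000


def legacy_dir(object_id):
    """Spread objects over a fixed set of directories."""
    return object_id % LEGACY_DIRS


def temporal_dir(object_id):
    """Keep objects created around the same time together."""
    return object_id // OBJECTS_PER_DIR


def live_dirs(object_ids, placement):
    """Directories that still hold at least one live object."""
    return {placement(oid) for oid in object_ids}


# 10000 objects were created; the oldest 9000 have since been deleted.
live = range(9000, 10000)

print(len(live_dirs(live, legacy_dir)))    # every legacy directory is still in use
print(sorted(live_dirs(live, temporal_dir)))  # only the newest directory is in use
```

In the modulo layout every directory stays populated and keeps growing, so
none of them can ever be shrunk or removed, while in the temporal layout
the older directories become empty and can be dropped.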

This in turn would make it desirable to implement online directory shrinking
as has been discussed many times in the past, but even "e2fsck -fD" would
be able to shrink the old object directories offline as they become empty,
rather than a huge directory that has completely random leaf block access.

Cheers, Andreas

[*] based on the FID sequence number like with DNE



