Date:   Tue, 21 Mar 2017 12:13:36 -0400
From:   Andreas Dilger <adilger@...ger.ca>
To:     Благодаренко Артём 
        <artem.blagodarenko@...il.com>
Cc:     linux-ext4 <linux-ext4@...r.kernel.org>,
        Alexey Lyashkov <alexey.lyashkov@...il.com>,
        Ts'o Theodore <tytso@....edu>
Subject: Re: 64-bit inode number

On Mar 20, 2017, at 6:22 AM, Благодаренко Артём <artem.blagodarenko@...il.com> wrote:
> 
> Hello,
> 
> This topic came up in the "Add largedir feature" thread, but it needs to be
> discussed in a separate thread.
> Increasing the maximum inode count is useful with or without largedir. There
> is at least one user that needs 64-bit inode numbers: Lustre FS.
> 
> As mentioned, the MDS uses zero-size files to store some information about
> Lustre FS files. Current MDS disk sizes can hold a large number of such files,
> but ext4 limits this number to ~4 billion.
> 
> Lustre FS has features like DNE to distribute the MDS over many targets
> (disks), but then the disks are not used effectively. It would be great to be
> able to store more than ~4 billion inodes on one ext4 file system.

I guess the major potential problem with more than 4B inodes in a single
filesystem (and also the large 300TB+ filesystems you are using) is that
running e2fsck could take a very long time.  Conversely, using DNE to spread
the metadata across multiple filesystems/servers allows e2fsck to run in
parallel and limits any failures to a smaller subset of the filesystem.

That doesn't mean I'm totally against this feature, since > 8TB disks are
becoming common.

> I know there is a dirdata feature that allows storing the high 32 bits of the
> inode number in the ext4 dirent. As far as I know, dirdata has not been merged
> yet for lack of users. To quote Andreas from "Add largedir feature":
> 
>> Mostly because there hasn't been any interest for it whenever I proposed
>> merging it in the past. If there is some renewed interest in merging it
>> I could look into it …
> 
> It looks like Lustre FS requires this feature now.
> 
> There is another approach to solving this problem. It is obvious, but it
> requires an on-disk format change. To quote Theodore from "Add largedir feature":
> 
>> I can imagine a new feature flag which defines the use of a 64-bit inode
>> number, but that's more for people who are creating a file system that
>> takes advantage of 64-bit block numbers, and they are intending on
>> using all of that space to store small (< 4k or < 8k) files.
> 
> This is exactly the Lustre FS MDS case: many small inodes. If it is possible
> to add a new feature flag, that is probably the best solution: simple,
> obvious, fast.
> 
> Please help with these questions:
> 1. Do we need 64-bit inode numbers now? (My opinion: we need them.)
> 2. Which of the two solutions above should we choose? Or another solution?

It wasn't clear from Ted's comments whether he was proposing a feature
flag to store 64-bit inode numbers directly in a new ext4_dir_entry64,
or to use dirdata to hold the high 32 bits.  My preference would be to
store the high bits of the inode number in dirdata.  The reasons are:
- it won't use more space for 64-bit inode numbers than ext4_dir_entry64 would
- dirents for 32-bit inode numbers stay smaller
- significantly more 32-bit dirents can fit into a leaf block (roughly 10-25%
  more)
- it is backwards compatible with existing directories and can transparently
  store 64-bit inode numbers into 32-bit directories without a full update
- it avoids duplicate code paths for ext4_dir_entry vs ext4_dir_entry64
- it would be possible to store only the high 16 bits (2^48 inodes), which
  may be enough for ext4, since ext4_extent can only address 2^48 blocks
  (2^60 bytes with 4KB blocks) and there isn't much value in having more
  inodes than blocks
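
To put rough numbers on the size argument, here is a minimal sketch in C.
rec_len32() uses the same rounding as the kernel's EXT4_DIR_REC_LEN (8-byte
header plus name, rounded up to 4 bytes).  Everything else is an assumption
made for illustration: ext4_dir_entry64 does not exist, so rec_len64() simply
assumes the same layout with an 8-byte inode field, and ino64_combine(),
ino_fits_48(), dirents_per_block() and max_extent_bytes() are hypothetical
helper names.  Block overhead (dx tail, checksum) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Classic ext4 dirent: 4-byte inode + 2-byte rec_len + 1-byte name_len +
 * 1-byte file_type, then the name, rounded up to a 4-byte boundary. */
static inline uint32_t rec_len32(uint32_t name_len)
{
	return (name_len + 8 + 3) & ~3u;
}

/* ASSUMPTION: a hypothetical ext4_dir_entry64 with an 8-byte inode field
 * (12-byte header) and the same 4-byte alignment. */
static inline uint32_t rec_len64(uint32_t name_len)
{
	return (name_len + 12 + 3) & ~3u;
}

/* How many fixed-size dirents fit in one leaf block (overhead ignored). */
static inline uint32_t dirents_per_block(uint32_t block_size, uint32_t rec_len)
{
	assert(rec_len != 0);
	return block_size / rec_len;
}

/* Rebuild a 64-bit inode number from the low 32 bits kept in the dirent
 * and the high bits kept elsewhere (e.g. in a dirdata field). */
static inline uint64_t ino64_combine(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* True if the inode number needs no more than 48 bits, i.e. only the
 * high 16 bits would have to be stored. */
static inline int ino_fits_48(uint64_t ino)
{
	return (ino >> 48) == 0;
}

/* Bytes addressable with 2^48 blocks of size 2^block_bits. */
static inline uint64_t max_extent_bytes(unsigned int block_bits)
{
	return (1ULL << 48) << block_bits;
}
```

Under these assumptions an 8-byte name takes 16 bytes as a 32-bit dirent and
20 bytes as a 64-bit one, and a 32-byte name takes 40 vs 44 bytes, so the
benefit of keeping 32-bit dirents shrinks as names get longer.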

Cheers, Andreas





