Message-Id: <A32DD67A-F005-42BF-B243-A56208421EBA@dilger.ca>
Date: Tue, 21 Mar 2017 12:13:36 -0400
From: Andreas Dilger <adilger@...ger.ca>
To: Благодаренко Артём
<artem.blagodarenko@...il.com>
Cc: linux-ext4 <linux-ext4@...r.kernel.org>,
Alexey Lyashkov <alexey.lyashkov@...il.com>,
Ts'o Theodore <tytso@....edu>
Subject: Re: 64-bit inode number
On Mar 20, 2017, at 6:22 AM, Благодаренко Артём <artem.blagodarenko@...il.com> wrote:
>
> Hello,
>
> This topic was mentioned in "Add largedir feature", but needs to be discussed
> in a separate thread.
> Increasing the maximum inode count is useful with and without largedir. There
> is at least one user who needs 64-bit inode numbers: Lustre FS.
>
> As mentioned, the MDS has 0-size files to store some information about Lustre FS
> files. Current MDS disk sizes allow storing a large number of such files, but
> ext4 limits this number to ~4 billion.
>
> Lustre FS has features like DNE to distribute the MDS over many targets (disks),
> but the disks are not used effectively. It would be great to be able to
> store more than ~4 billion inodes on one ext4 file system.
I guess the major potential problem with more than 4B inodes in a single
filesystem (and also the large 300TB+ filesystems you are using) is that
running e2fsck could take a very long time. Conversely, using DNE to spread
the metadata across multiple filesystems/servers allows e2fsck to run in
parallel and limits any failures to a smaller subset of the filesystem.
That doesn't mean I'm totally against this feature, since > 8TB disks are
becoming common.
> I know there is the dirdata feature, which allows storing the high 32 bits of
> the inode number in the ext4 dirent. As far as I know, dirdata has not been
> merged yet for lack of users. Andreas's quote from "Add largedir feature":
>
>> Mostly because there hasn't been any interest for it whenever I proposed
>> merging it in the past. If there is some renewed interest in merging it
>> I could look into it …
>
> It looks like Lustre FS requires this feature now.
>
> There is another approach to solving this problem. It is obvious, but
> requires a change to the on-disk format. Theodore's quote from "Add largedir feature":
>
>> I can imagine a new feature flag which defines the use of a 64-bit inode
>> number, but that's more for people who are creating a file system that
>> takes advantage of 64-bit block numbers, and they are intending on
>> using all of that space to store small (< 4k or < 8k) files.
>
> This is exactly the Lustre FS MDS case: many small inodes. If it is possible
> to add a new feature flag, this is probably the best solution: simple,
> obvious, fast.
>
> Please help with these questions:
> 1. Do we need 64-bit inode numbers now? (My opinion: we need them.)
> 2. Which of the two solutions above should we choose? Or another solution?
It wasn't clear from Ted's comments whether he was proposing the feature
flag to store 64-bit inode numbers directly in a new ext4_dir_entry64,
or to use dir_data to hold the high 32 bits. My preference would be to
store the high bits of the inode number in dir_data. The reasons are:
- this won't use more space for 64-bit inodes than ext4_dir_entry64
- directories with only 32-bit inode numbers will have smaller dirents
- significantly more 32-bit dirents can fit into a leaf block (i.e. 10-25% more)
- it is backwards compatible with existing directories and can transparently
  store 64-bit inode numbers into 32-bit directories without a full update
- it avoids duplicate code paths for ext4_dir_entry vs ext4_dir_entry64
- it would be possible to store only the high 16 bits (2^48 inodes), which
  may be enough for ext4, since ext4_extent can only address 2^48 blocks
  (2^60 bytes with 4KB blocks) and there isn't much value in having more
  inodes than blocks?
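
To make the comparison concrete, here is a rough sketch of the arithmetic.
Everything in it is an assumption for illustration: the helper names are
made up, the ext4_dir_entry64 layout (a 12-byte header, i.e. the existing
8-byte header with the inode field widened to 8 bytes) is only a guess at
what such a format might look like, and block tails/checksums are ignored.
Only the 8-byte header and 4-byte rounding of the existing dirent are taken
from the current on-disk format.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not from any merged patch. */

/* Rebuild a 64-bit inode number from the existing 32-bit dirent inode
 * field (lo) plus high bits assumed to be kept in dir_data (hi). */
static inline uint64_t ino_from_parts(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* High part that would go into dir_data; 0 means the inode number fits
 * in 32 bits and the dirent needs no extra data. */
static inline uint32_t ino_hi(uint64_t ino)
{
	return (uint32_t)(ino >> 32);
}

/* Existing dirent: 8-byte header (inode, rec_len, name_len, file_type)
 * followed by the name, rounded up to a multiple of 4 bytes. */
static inline unsigned rec_len32(unsigned name_len)
{
	return (name_len + 8 + 3) & ~3u;
}

/* ASSUMED ext4_dir_entry64: same layout with an 8-byte inode field,
 * giving a 12-byte header. */
static inline unsigned rec_len64(unsigned name_len)
{
	return (name_len + 12 + 3) & ~3u;
}

/* Upper bound on entries per block, ignoring any block tail/checksum. */
static inline unsigned dirents_per_block(unsigned block_size, unsigned rec_len)
{
	return block_size / rec_len;
}
```

Under those assumptions, a 16-byte name needs rec_len32(16) = 24 bytes versus
rec_len64(16) = 28 bytes, so a 4096-byte block holds at most 170 entries
instead of 146, about 16% more. For 8-byte and 32-byte names the same
calculation gives roughly 25% and 10% respectively.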
Cheers, Andreas