lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <175204AC-DBDD-4894-8944-BBD98388F547@dilger.ca>
Date:   Sat, 18 Mar 2017 23:38:38 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Alexey Lyashkov <alexey.lyashkov@...il.com>,
        Artem Blagodarenko <artem.blagodarenko@...il.com>,
        linux-ext4 <linux-ext4@...r.kernel.org>,
        Yang Sheng <yang.sheng@...el.com>,
        Zhen Liang <liang.zhen@...el.com>,
        Artem Blagodarenko <artem.blagodarenko@...gate.com>
Subject: Re: [PATCH] Add largedir feature

On Mar 18, 2017, at 10:29 AM, Theodore Ts'o <tytso@....edu> wrote:
> 
> On Sat, Mar 18, 2017 at 11:16:26AM +0300, Alexey Lyashkov wrote:
>> Andreas,
>> 
>> it not about a feature flag. It’s about a situation in whole.
>> Yes, we may increase a directory size, but it open a number a large problems.
> 
>> 1) readdir. It tries to read all entries in memory before send to
>> the user. currently it may eats 20*10^6 * 256 so several gigs, so
>> increasing it size may produce a problems for a system.
> 
> That's not true.  We normally only read in one block a time.  If there
> is a hash collision, then we may need to insert into the rbtree in a
> subsequent block's worth of dentries to make sure we have all of the
> directory entries corresponding to a particular hash value.  I think
> you misunderstood the code.
> 
>> 2) inode allocation. Current code tries to allocate an inode as near as possible to the directory inode, but one GD may hold 32k entries only, so increase a directory size will use up 1k GD for now and more than it, after it. It increase a seek time with file allocation. It was i mean when say - «dramatically decrease a file creation rate».
> 
> But there are also only 32k blocks in a group descriptor, and we try
> to keep the blocks allocated close to the inode.  So if you are using
> a huge directory, and you are using a storage device with a
> significant seek penalty, then yes, no matter what as the directory
> grows, the time to iterate over all of the files does grow.  But there
> is more to life than microbenchmarks which creates huge numbers of zero
> length files!  If we assume that the files are going to need to
> contain _data_, and the data blocks should be close to the inodes,
> then there are going to be some performance impacts no matter what.

Actually, on a Lustre MDT there _are_ only zero-length files, since all
of the data is stored in another filesystem.  Fortunately, the parent
directory stores the last group successfully used for allocation
(i_alloc_group) so that new inode allocation doesn't have to scan the
whole filesystem each time from the parent's group.

>> 3) current limit with 4G inodes - currently 32-128 directories may eats a full inode number space. From it perspective large dir don’t need to be used.
> 
> I can imagine a new feature flag which defines the use a 64-bit inode
> number, but that's more for people who are creating a file system that
> takes advantage of 64-bit block numbers, and they are intending on
> using all of that space to store small (< 4k or < 8k) files.

The 4-billion inode limit is somewhat independent of large directories.
That said, the DIRDATA feature that is used for Lustre is also designed
to allow storing the high 32 bits of the inode number in the directory.
This would allow compatible upgrade of a directory to storing both
32-bit and 64-bit inode numbers without the need for wholescale conversion
of directories, or having space for 64-bit inode numbers even if most
of the inodes are only 32-bit values.


Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (196 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ