Message-ID: <20160122141140.GD2948@linux.intel.com>
Date:	Fri, 22 Jan 2016 09:11:40 -0500
From:	Matthew Wilcox <willy@...ux.intel.com>
To:	mingming cao <mingming.cao@...cle.com>
Cc:	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-nvdimm@...1.01.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH v3 0/8] Support for transparent PUD pages for DAX files

On Thu, Jan 21, 2016 at 02:48:58PM -0800, mingming cao wrote:
> On 01/08/2016 11:49 AM, Matthew Wilcox wrote:
> > Filesystems still need work to allocate 1GB pages.  With ext4, I can
> > only get 16MB of contiguous space, although it is aligned.  With XFS,
> > I can get 80MB less than 1GB, and it's not aligned.  The XFS problem
> > may be due to the small amount of RAM in my test machine.
> 
> I don't think ext4 can do 1GB at this time, due to the extent length bits
> (15 for unwritten) and the block group size boundary (well, with flex_bg we
> may be able to relax this).  I have seen about 125MB of contiguous space
> allocated on a freshly created ext4 filesystem.  I remember mballoc in ext4
> used to normalize allocation requests up to 8 or 16MB, but it appears to be
> no longer limited to something that small.

I agree that the on-disk ext4 format can't represent a single 1GB
extent (ext4_extent's ee_len is 16 bits), but the in-memory extent tree
(extent_status's es_len) uses a 32-bit block count field, which can
represent an 8TB length extent with 4kB blocks.
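For reference, the two structures look roughly like this (paraphrased from
fs/ext4/ext4_extents.h and fs/ext4/extents_status.h; other members omitted,
so take the details with a grain of salt):

	struct ext4_extent {
		__le32	ee_block;	/* first logical block covered */
		__le16	ee_len;		/* block count, 16 bits (top bit flags unwritten) */
		__le16	ee_start_hi;	/* high 16 bits of physical block */
		__le32	ee_start_lo;	/* low 32 bits of physical block */
	};

	struct extent_status {
		ext4_lblk_t	es_lblk;	/* first logical block covered */
		ext4_lblk_t	es_len;		/* block count, full 32 bits */
		ext4_fsblk_t	es_pblk;	/* physical block (+ status flags in high bits) */
	};

With 4kB blocks that caps a single on-disk extent at 128MB (32768 blocks),
while an in-memory extent can describe far more than 1GB.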

It seems that at the moment, something is constraining allocations to be
at most 16MB, so that we can convert one extent_status to one ext4_extent.
What I'd like to see is code to convert one extent_status into multiple
ext4_extents on disk, and recombine multiple ext4_extents into a single
extent_status when the inode is read back in later.
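
The writeout side of that is basically just a loop chopping es_len into
EXT_INIT_MAX_LEN-sized pieces.  A rough sketch (emit_extent() is made up
here; it stands for whatever actually inserts an ext4_extent into the
on-disk tree):

	/* Hypothetical: split one in-memory extent into on-disk extents. */
	static int es_to_extents(struct inode *inode, struct extent_status *es)
	{
		ext4_lblk_t lblk = es->es_lblk;
		ext4_lblk_t left = es->es_len;
		ext4_fsblk_t pblk = ext4_es_pblock(es);

		while (left) {
			ext4_lblk_t len = min_t(ext4_lblk_t, left, EXT_INIT_MAX_LEN);
			int err = emit_extent(inode, lblk, pblk, len);

			if (err)
				return err;
			lblk += len;
			pblk += len;
			left -= len;
		}
		return 0;
	}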

Then we can start looking at places where ext4 puts metadata in the
middle of 1GB regions, preventing them from being used ... that'll be
a separate bag of issues, no doubt.
