Message-ID: <joisx5udw4tebjykvcs2s75qxzkugr2rlyvngzmml5xhm7jnvu@o4nvt7g735oj>
Date: Mon, 22 Jul 2024 09:35:11 +0000
From: Daniel Gomez <da.gomez@...sung.com>
To: Ryan Roberts <ryan.roberts@....com>
CC: David Hildenbrand <david@...hat.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Hugh Dickins <hughd@...gle.com>, Jonathan
	Corbet <corbet@....net>, "Matthew Wilcox (Oracle)" <willy@...radead.org>,
	Barry Song <baohua@...nel.org>, Lance Yang <ioworker0@...il.com>, Baolin
	Wang <baolin.wang@...ux.alibaba.com>, Gavin Shan <gshan@...hat.com>, Pankaj
	Raghav <kernel@...kajraghav.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [RFC PATCH v1 0/4] Control folio sizes used for page cache
 memory

On Wed, Jul 17, 2024 at 11:45:48AM GMT, Ryan Roberts wrote:
> On 17/07/2024 11:31, David Hildenbrand wrote:
> > On 17.07.24 09:12, Ryan Roberts wrote:
> >> Hi All,
> >>
> >> This series is an RFC that adds sysfs and kernel cmdline controls to configure
> >> the set of allowed large folio sizes that can be used when allocating
> >> file-memory for the page cache. As part of the control mechanism, it provides
> >> for a special-case "preferred folio size for executable mappings" marker.
> >>
> >> I'm trying to solve 2 separate problems with this series:
> >>
> >> 1. Reduce pressure in iTLB and improve performance on arm64: This is a modified
> >> approach for the change at [1]. Instead of hardcoding the preferred executable
> >> folio size into the arch, user space can now select it. This decouples the arch
> >> code and also makes the mechanism more generic; it can be bypassed (the default)
> >> or any folio size can be set. For my use case, 64K is preferred, but I've also
> >> heard from Willy of a use case where putting all text into 2M PMD-sized folios
> >> is preferred. This approach avoids the need for synchronous MADV_COLLAPSE (and
> >> therefore faulting in all text ahead of time) to achieve that.
> >>
> >> 2. Reduce memory fragmentation in systems under high memory pressure (e.g.
> >> Android): The theory goes that if all folios are 64K, then failure to allocate a
> >> 64K folio should become unlikely. But if the page cache is allocating lots of
> >> different orders, with most allocations having an order below 64K (as is the
> >> case today) then ability to allocate 64K folios diminishes. By providing control
> >> over the allowed set of folio sizes, we can tune to avoid crucial 64K folio
> >> allocation failure. Additionally I've heard (second hand) of the need to disable
> >> large folios in the page cache entirely due to latency concerns in some
> >> settings. These controls allow all of this without kernel changes.
> >>
> >> The value of (1) is clear and the performance improvements are documented in
> >> patch 2. I don't yet have any data demonstrating the theory for (2) since I
> >> can't reproduce the setup that Barry had at [2]. But my view is that by adding
> >> these controls we will enable the community to explore further, in the same way
> >> that the anon mTHP controls helped harden the understanding for anonymous
> >> memory.
> >>
> >> ---
> > 
> > How would this interact with other requirements we get from the filesystem (for
> > example, because of the device) [1].
> > 
> > Assuming a device has a filesystem with a min order of X, but we disable
> > anything >= X, how would we combine that configuration/information?
> 
> Currently order-0 is implicitly the "always-on" fallback order. My thinking was
> that with [1], the specified min order just becomes that "always-on" fallback order.
> 
> Today:
> 
>   orders = file_orders_always() | BIT(0);
> 
> Tomorrow:
> 
>   orders = (file_orders_always() & ~(BIT(min_order) - 1)) | BIT(min_order);
> 
> That does mean that in this case, a user-disabled order could still be used. So
> the controls are really hints rather than definitive commands.

In the scenario where a min order is not enabled in hugepages-<size>kB/
file_enabled, will the user still be allowed to automatically mkfs/mount with
blocksize=min_order, and will sysfs reflect this? Or, since it's a hint, will it
remain hidden but still allow mkfs/mount to proceed?
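
To make the case concrete, the "Tomorrow" expression quoted above can be tried
in user space. This is only a sketch under assumptions: effective_orders() is a
hypothetical helper, its user_orders argument stands in for whatever
file_orders_always() would return, and BIT() is redefined locally rather than
taken from the kernel headers.

```c
#include <assert.h>

#define BIT(n) (1UL << (n))

/*
 * Hypothetical helper, not part of the series: user_orders stands in for
 * the mask returned by file_orders_always(), min_order for the minimum
 * folio order required by the filesystem.
 */
static unsigned long effective_orders(unsigned long user_orders,
				      unsigned int min_order)
{
	/* Keep the shift well defined. */
	assert(min_order < 8 * sizeof(unsigned long));

	/* Drop every order below min_order, then force min_order on. */
	return (user_orders & ~(BIT(min_order) - 1)) | BIT(min_order);
}
```

With min_order = 0 this reduces to the "Today" form, user_orders | BIT(0). With
a user mask of orders {0, 4} (0x11) and min_order = 2, the result is orders
{2, 4} (0x14): order 2 gets used even though it was never enabled, which is the
case I am asking about.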

> 
> 
> > 
> > 
> > [1]
> > https://lore.kernel.org/all/20240715094457.452836-2-kernel@pankajraghav.com/T/#u
> > 
> 
