lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Thu, 15 Feb 2024 12:17:52 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>,
	Mark Rutland <mark.rutland@....com>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Muchun Song <muchun.song@...ux.dev>
Cc: Ryan Roberts <ryan.roberts@....com>,
	linux-arm-kernel@...ts.infradead.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

This is an RFC for a series that aims to reduce the cost and complexity of
ptep_get_lockless() for arm64 when supporting transparent contpte mappings [1].
The approach came from discussion with Mark and David [2].

It introduces a new helper, ptep_get_lockless_norecency(), which allows the
access and dirty bits in the returned pte to be incorrect. This relaxation
permits arm64's implementation to just read the single target pte, and avoids
having to iterate over the full contpte block to gather the access and dirty
bits, for the contpte case.

It turns out that none of the call sites using ptep_get_lockless() require
accurate access and dirty bit information, so we can also convert those sites.
Although a couple of places need care (see patches 2 and 3).

Arguably patch 3 is a bit fragile, given the wide accessibility of
vmf->orig_pte. So it might make sense to drop this patch and stick to using
ptep_get_lockless() in the page fault path. I'm keen to hear opinions.

I've chosen the name "recency" because it's shortish and somewhat descriptive,
and is alredy used in a couple of places to mean similar things (see mglru and
damon). I'm open to other names if anyone has better ideas.

If concensus is that this approach is generally acceptable, I intend to create a
series in future to do a similar thing with ptep_get() -> ptep_get_norecency().

---
This series applies on top of [1].

[1] https://lore.kernel.org/linux-mm/20240215103205.2607016-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-mm/a91cfe1c-289e-4828-8cfc-be34eb69a71b@redhat.com/

Thanks,
Ryan

Ryan Roberts (4):
  mm: Introduce ptep_get_lockless_norecency()
  mm/gup: Use ptep_get_lockless_norecency()
  mm/memory: Use ptep_get_lockless_norecency() for orig_pte
  arm64/mm: Override ptep_get_lockless_norecency()

 arch/arm64/include/asm/pgtable.h |  6 ++++
 include/linux/pgtable.h          | 55 ++++++++++++++++++++++++++++--
 kernel/events/core.c             |  2 +-
 mm/gup.c                         |  7 ++--
 mm/hugetlb.c                     |  2 +-
 mm/khugepaged.c                  |  2 +-
 mm/memory.c                      | 57 ++++++++++++++++++++------------
 mm/swap_state.c                  |  2 +-
 mm/swapfile.c                    |  2 +-
 9 files changed, 102 insertions(+), 33 deletions(-)

--
2.25.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ