Date: Tue,  9 Apr 2024 20:26:26 +1200
From: Barry Song <21cnbao@...il.com>
To: akpm@...ux-foundation.org,
	linux-mm@...ck.org
Cc: baolin.wang@...ux.alibaba.com,
	chrisl@...nel.org,
	david@...hat.com,
	hanchuanhua@...o.com,
	hannes@...xchg.org,
	hughd@...gle.com,
	kasong@...cent.com,
	ryan.roberts@....com,
	surenb@...gle.com,
	v-songbaohua@...o.com,
	willy@...radead.org,
	xiang@...nel.org,
	ying.huang@...el.com,
	yosryahmed@...gle.com,
	yuzhao@...gle.com,
	ziy@...dia.com,
	linux-kernel@...r.kernel.org
Subject: [PATCH v2 0/5] large folios swap-in: handle refault cases first

From: Barry Song <v-songbaohua@...o.com>

This patchset is extracted from the large folio swap-in series[1] and primarily
addresses the handling of large folios found in the swap cache. Currently, it
focuses on refaults of mTHP folios that are still undergoing reclamation.
Splitting it out aims to streamline code review and expedite the integration
of this part into the MM tree.

It relies on Ryan's swap-out series v7[2], leveraging the helper function
swap_pte_batch() introduced by that series.
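To make the batching idea concrete, here is a userspace model of what a helper like swap_pte_batch() does conceptually: starting from one swap PTE, count how many following PTEs hold swap entries on the same swap device with consecutive offsets, and report whether all of them are exclusive (the output argument this series adds). The struct fake_swp_pte and the function model_swap_pte_batch() are hypothetical; the real helper works on pte_t values in mm/internal.h, and its exact signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a swap PTE; not a kernel type. */
struct fake_swp_pte {
	unsigned int type;     /* swap device index */
	unsigned long offset;  /* slot offset within that device */
	bool exclusive;        /* models the per-PTE exclusive marker */
};

/*
 * Count how many entries, starting at ptes[0] and examining at most
 * max_nr of them, refer to the same swap device with consecutive
 * offsets. If all_exclusive is non-NULL, it is set to true only when
 * every entry in the returned batch is exclusive.
 */
static int model_swap_pte_batch(const struct fake_swp_pte *ptes, int max_nr,
				bool *all_exclusive)
{
	int nr = 1;
	bool excl;

	assert(max_nr >= 1);
	excl = ptes[0].exclusive;
	while (nr < max_nr &&
	       ptes[nr].type == ptes[0].type &&
	       ptes[nr].offset == ptes[0].offset + (unsigned long)nr) {
		excl = excl && ptes[nr].exclusive;
		nr++;
	}
	if (all_exclusive)
		*all_exclusive = excl;
	return nr;
}
```

In this model the exclusive bit does not end the batch; it only feeds the summary flag, so a caller can map the whole batch and then decide separately whether the pages may be mapped writable.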

Presently, do_swap_page() only encounters a large folio in the swap
cache before vmscan has released that folio. However, the code should
remain equally useful once we support large folio swap-in via
swapin_readahead(). Because one fault can map the whole folio instead
of a single page, this approach can effectively reduce page faults and
eliminate most of the redundant checks and early exits for MTE
restoration in the recent MTE patchset[3].
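Mapping a large folio found in the swap cache in one go is only possible when every page of it lands inside the faulting VMA. Since the folio's virtual address might not be naturally aligned, the start address is better derived from the faulting page's index within the folio than by rounding the fault address down to the folio size. The sketch below models that range check in plain C; the function name, its parameters and the 4 KiB page size are assumptions for illustration, not the actual code in mm/memory.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB base pages */

/*
 * Hypothetical helper: can a folio of nr_pages pages, in which the
 * faulting page sits at index page_idx, be mapped entirely inside the
 * VMA [vm_start, vm_end)? The folio start is computed from page_idx
 * rather than by aligning fault_addr down, because the folio may not
 * be naturally aligned in the virtual address space.
 */
static bool model_can_map_whole_folio(unsigned long fault_addr,
				      unsigned long page_idx,
				      unsigned long nr_pages,
				      unsigned long vm_start,
				      unsigned long vm_end)
{
	unsigned long page_addr = fault_addr & ~(MODEL_PAGE_SIZE - 1);
	unsigned long offset = page_idx * MODEL_PAGE_SIZE;
	unsigned long start, end;

	assert(page_idx < nr_pages);
	assert(fault_addr >= vm_start && fault_addr < vm_end);
	if (page_addr - vm_start < offset)
		return false;  /* the folio would begin below the VMA */
	start = page_addr - offset;
	end = start + nr_pages * MODEL_PAGE_SIZE;
	return end <= vm_end;  /* and must not extend past it */
}
```

For example, with a VMA of [0x10000, 0x30000) and a 16-page folio whose page 1 faults at 0x13000, the folio spans [0x12000, 0x22000): it fits, even though 0x12000 is not 64 KiB aligned.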

The large folio swap-in for SWP_SYNCHRONOUS_IO and swapin_readahead()
will be split into separate patch sets and sent at a later time.

-v2:
 - rebase on top of mm-unstable, in which Ryan's swap_pte_batch() has
   changed a lot.
 - remove folio_add_new_anon_rmap() for !folio_test_anon(), as currently
   large folios are always anon (refault).
 - add mTHP swpin refault counters
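A per-order counter can be pictured as one slot per folio order, where the order is log2 of the number of base pages: with 4 KiB pages, a 64 KiB mTHP is 16 pages, i.e. order 4. The array-based sketch below is a hypothetical userspace model of that bookkeeping only; the names, the MODEL_MAX_ORDER bound and the plain global array are assumptions, not how the kernel stores these statistics.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 10  /* assumption: enough slots for this demo */

/* Hypothetical per-order event counters, indexed by folio order. */
static unsigned long model_swpin_refault[MODEL_MAX_ORDER + 1];

/* Order of a folio made of nr_pages base pages (must be a power of 2). */
static unsigned int model_folio_order(unsigned long nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
	while (nr_pages > 1) {
		nr_pages >>= 1;
		order++;
	}
	return order;
}

/* Record one refault of a large folio found in the swap cache. */
static void model_count_swpin_refault(unsigned long nr_pages)
{
	unsigned int order = model_folio_order(nr_pages);

	assert(order <= MODEL_MAX_ORDER);
	model_swpin_refault[order]++;
}
```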

-v1:
  Link: https://lore.kernel.org/linux-mm/20240402073237.240995-1-21cnbao@gmail.com/

Differences from the original large folios swap-in series:
 - collect Reviewed-by and Acked-by tags;
 - rename swap_nr_free to swap_free_nr, as suggested by Ryan;
 - limit the maximum kernel stack usage of swap_free_nr, as suggested
   by Ryan;
 - add an output argument to swap_pte_batch to expose whether all
   entries are exclusive;
 - many cleanups and refinements; handle the corner case where a
   folio's virtual address might not be naturally aligned.
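One way to batch a swap_free_nr()-style release while bounding stack usage is to walk the range in fixed-size chunks, so that any per-chunk scratch state (here a small on-stack bitmap) stays the same size no matter how large nr is. The sketch below is a hypothetical userspace model of that idea: the swap_map reference-count array, MODEL_BATCH and the two-pass structure are assumptions, not the implementation in mm/swapfile.c.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MODEL_BATCH 64  /* assumption: upper bound on slots per chunk */
#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(n) (((n) + BITS_PER_UL - 1) / BITS_PER_UL)

/*
 * Hypothetical model: swap_map[i] is the reference count of swap slot i.
 * Drop one reference from each of nr slots starting at offset and return
 * how many slots became free. Work proceeds in chunks of at most
 * MODEL_BATCH slots, so the on-stack bitmap has a fixed size.
 */
static unsigned long model_swap_free_nr(unsigned char *swap_map,
					unsigned long offset, unsigned long nr)
{
	unsigned long freed = 0;

	while (nr) {
		unsigned long chunk = nr < MODEL_BATCH ? nr : MODEL_BATCH;
		unsigned long to_free[BITMAP_LONGS(MODEL_BATCH)];
		unsigned long i;

		memset(to_free, 0, sizeof(to_free));
		/* Pass 1: drop references, remember slots that reach zero. */
		for (i = 0; i < chunk; i++) {
			assert(swap_map[offset + i] > 0);
			if (--swap_map[offset + i] == 0)
				to_free[i / BITS_PER_UL] |= 1UL << (i % BITS_PER_UL);
		}
		/* Pass 2: release the recorded slots (here: just count them). */
		for (i = 0; i < chunk; i++)
			if (to_free[i / BITS_PER_UL] & (1UL << (i % BITS_PER_UL)))
				freed++;
		offset += chunk;
		nr -= chunk;
	}
	return freed;
}
```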

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/
[2] https://lore.kernel.org/linux-mm/20240408183946.2991168-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-mm/20240322114136.61386-1-21cnbao@gmail.com/

Barry Song (2):
  mm: swap_pte_batch: add an output argument to return if all swap
    entries are exclusive
  mm: add per-order mTHP swpin_refault counter

Chuanhua Han (3):
  mm: swap: introduce swap_free_nr() for batched swap_free()
  mm: swap: make should_try_to_free_swap() support large-folio
  mm: swap: entirely map large folios found in swapcache

 include/linux/huge_mm.h |  1 +
 include/linux/swap.h    |  5 +++
 mm/huge_memory.c        |  2 ++
 mm/internal.h           |  9 +++++-
 mm/madvise.c            |  2 +-
 mm/memory.c             | 69 ++++++++++++++++++++++++++++++++---------
 mm/swapfile.c           | 51 ++++++++++++++++++++++++++++++
 7 files changed, 123 insertions(+), 16 deletions(-)

Appendix:

The following program can generate numerous instances where large folios
are hit in the swap cache if we enable 64KiB mTHP:

# echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* reclaim the given pages (Linux >= 5.4) */
#endif

#define DATA_SIZE (128UL * 1024)
#define PAGE_SIZE (4UL * 1024)
#define LARGE_FOLIO_SIZE (64UL * 1024)

/* Fill each 4KiB page with a pattern derived from its offset. */
static void *write_data(void *addr)
{
	unsigned long i;

	for (i = 0; i < DATA_SIZE; i += PAGE_SIZE)
		memset((char *)addr + i, (char)i, PAGE_SIZE);
	return NULL;
}

/* Read every page back; pages being reclaimed concurrently will refault. */
static void *read_data(void *addr)
{
	unsigned long i;

	for (i = 0; i < DATA_SIZE; i += PAGE_SIZE) {
		if (*((char *)addr + i) != (char)i) {
			fprintf(stderr, "mismatched data\n");
			_exit(-1);
		}
	}
	return NULL;
}

/* Ask the kernel to page out (swap out) the whole range. */
static void *pgout_data(void *addr)
{
	madvise(addr, DATA_SIZE, MADV_PAGEOUT);
	return NULL;
}

int main(int argc, char **argv)
{
	for (int i = 0; i < 10000; i++) {
		pthread_t tid1, tid2;
		void *addr = mmap(NULL, DATA_SIZE * 2, PROT_READ | PROT_WRITE,
				MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
		void *aligned_addr;

		if (addr == MAP_FAILED) {
			perror("fail to mmap");
			return -1;
		}

		/* Move to a 64KiB boundary inside the double-sized mapping. */
		aligned_addr = (void *)(((unsigned long)addr + LARGE_FOLIO_SIZE) &
				~(LARGE_FOLIO_SIZE - 1));

		write_data(aligned_addr);

		if (pthread_create(&tid1, NULL, pgout_data, aligned_addr)) {
			fprintf(stderr, "fail to pthread_create\n");
			return -1;
		}

		if (pthread_create(&tid2, NULL, read_data, aligned_addr)) {
			fprintf(stderr, "fail to pthread_create\n");
			return -1;
		}

		pthread_join(tid1, NULL);
		pthread_join(tid2, NULL);
		munmap(addr, DATA_SIZE * 2);
	}

	return 0;
}

# cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_swpout
932
# cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_swpin_refault 
1488
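Both counters are plain decimal values in sysfs, so a test harness can snapshot them before and after a run and compare the deltas. The helper below is a minimal sketch that parses one such file with standard C I/O; the function name read_counter() is hypothetical, and the 64kB path is just the one used in this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single unsigned decimal counter, for example
 * /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_swpin_refault,
 * into *val. Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_counter(const char *path, unsigned long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling it on anon_swpout and anon_swpin_refault before and after running the program above gives per-run counts instead of totals since boot.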

-- 
2.34.1

