Message-ID: <20231215124423.88878-1-henry.hj@antgroup.com>
Date: Fri, 15 Dec 2023 20:44:17 +0800
From: "Henry Huang" <henry.hj@...group.com>
To: yuzhao@...gle.com
Cc:  <akpm@...ux-foundation.org>,
  "Henry Huang" <henry.hj@...group.com>,
  "谈鉴锋" <henry.tjf@...group.com>,
   <linux-kernel@...r.kernel.org>,
   <linux-mm@...ck.org>,
  "朱辉(茶水)" <teawater@...group.com>
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap

On Fri, Dec 15, 2023 at 15:23 Yu Zhao <yuzhao@...gle.com> wrote:
> Regarding the change itself, it'd cause a slight regression to other
> use cases (details below).
>
> > @@ -3355,6 +3359,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >                 unsigned long pfn;
> >                 struct folio *folio;
> >                 pte_t ptent = ptep_get(pte + i);
> > +               bool is_pte_young;
> >
> >                 total++;
> >                 walk->mm_stats[MM_LEAF_TOTAL]++;
> > @@ -3363,16 +3368,20 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >                 if (pfn == -1)
> >                         continue;
> >
> > -               if (!pte_young(ptent)) {
> > -                       walk->mm_stats[MM_LEAF_OLD]++;
>
> Most overhead from page table scanning normally comes from
> get_pfn_folio() because it almost always causes a cache miss. This is
> like a pointer dereference, whereas scanning PTEs is like streaming an
> array (bad vs good cache performance).
>
> pte_young() is here to avoid an unnecessary cache miss from
> get_pfn_folio(). Also see the first comment in get_pfn_folio(). It
> should be easy to verify the regression -- FlameGraph from the
> memcached benchmark in the original commit message should do it.
>
> Would a tracepoint here work for you?
>
> > +               is_pte_young = !!pte_young(ptent);
> > +               folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, is_pte_young);
> > +               if (!folio) {
> > +                       if (!is_pte_young)
> > +                               walk->mm_stats[MM_LEAF_OLD]++;
> >                         continue;
> >                 }
> >
> > -               folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > -               if (!folio)
> > +               if (!folio_test_clear_young(folio) && !is_pte_young) {
> > +                       walk->mm_stats[MM_LEAF_OLD]++;
> >                         continue;
> > +               }
> >
> > -               if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> > +               if (is_pte_young && !ptep_test_and_clear_young(args->vma, addr, pte + i))
> >                         VM_WARN_ON_ONCE(true);
> >
> >                 young++;

Thanks for replying.

We want to avoid two problems:
1. the conflict between page_idle/bitmap and the MGLRU scan;
2. a performance regression in the MGLRU page-table scan if get_pfn_folio() is called for every PTE.

We have a new idea:
1. Add a new API under /sys/kernel/mm/page_idle that only marks the idle flag, without
walking the rmap or clearing the PTE young bit (a rough sketch follows below).
2. Use the MGLRU proactive scan to clear the page idle flag.
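
A rough sketch of the mark-only write handler, modeled on the existing
page_idle_bitmap_write() loop in mm/page_idle.c (the handler name and the
new sysfs file name are illustrative, not final):

static ssize_t page_idle_mark_only_write(struct file *file, struct kobject *kobj,
					 struct bin_attribute *attr, char *buf,
					 loff_t pos, size_t count)
{
	const u64 *in = (u64 *)buf;
	unsigned long pfn, end_pfn;
	int bit;

	if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
		return -EINVAL;

	pfn = pos * BITS_PER_BYTE;
	if (pfn >= max_pfn)
		return -ENXIO;

	end_pfn = pfn + count * BITS_PER_BYTE;
	if (end_pfn > max_pfn)
		end_pfn = max_pfn;

	for (; pfn < end_pfn; pfn++) {
		bit = pfn % BITMAP_CHUNK_BITS;
		if ((*in >> bit) & 1) {
			struct folio *folio = page_idle_get_folio(pfn);

			if (folio) {
				/*
				 * Unlike the existing bitmap write, skip
				 * page_idle_clear_pte_refs(): no rmap walk
				 * and no PTE young clearing, so this is cheap.
				 */
				folio_set_idle(folio);
				folio_put(folio);
			}
		}
		if (bit == BITMAP_CHUNK_BITS - 1)
			in++;
		cond_resched();
	}

	return (char *)in - buf;
}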

The workflow:

      t1                              t2
 mark pages idle        MGLRU scans and checks the idle flags

This makes it easy to tell whether a page was accessed during the t1~t2 window.
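
From userspace the workflow could look like this (a sketch only; error
handling is omitted, "bitmap_no_rmap" is the hypothetical mark-only file
from the sketch above, and the memcg/node/generation numbers passed to
the lru_gen debugfs interface are illustrative):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	uint64_t chunk = ~0ULL;	/* one bitmap chunk covers 64 pfns */
	const char cmd[] = "+ 1 0 3 1 1\n";
	int fd;

	/*
	 * t1: mark pfns 0-63 idle without touching the rmap or PTEs;
	 * a real tool would seek to the pfns it actually cares about.
	 */
	fd = open("/sys/kernel/mm/page_idle/bitmap_no_rmap", O_WRONLY);
	pwrite(fd, &chunk, sizeof(chunk), 0);
	close(fd);

	sleep(60);	/* let the workload run between t1 and t2 */

	/*
	 * t2: trigger a proactive MGLRU scan; the command format is
	 * "+ memcg_id node_id max_gen [can_swap [force_scan]]" per
	 * Documentation/admin-guide/mm/multigen_lru.rst.
	 */
	fd = open("/sys/kernel/debug/lru_gen", O_WRONLY);
	write(fd, cmd, strlen(cmd));
	close(fd);

	/* pages whose bit is still set were not accessed during t1~t2 */
	fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDONLY);
	pread(fd, &chunk, sizeof(chunk), 0);
	close(fd);
	printf("idle mask for pfns 0-63: %#llx\n",
	       (unsigned long long)chunk);
	return 0;
}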

On the kernel side, the changes would look like this:

We clear the idle flag in get_pfn_folio(), while walk_pte_range() keeps the
original design: pte_young() still filters out old PTEs before the folio
lookup, so the cache-miss concern above does not apply.

static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
					struct pglist_data *pgdat, bool can_swap, bool clear_idle)
{
	struct folio *folio;

	/* try to avoid unnecessary memory loads */
	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
		return NULL;

	folio = pfn_folio(pfn);
+
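+	/* page_idle marked this folio idle; clear the flag to record the access */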
+	if (clear_idle && folio_test_idle(folio))
+		folio_clear_idle(folio);
+
	if (folio_nid(folio) != pgdat->node_id)
		return NULL;

	if (folio_memcg_rcu(folio) != memcg)
		return NULL;

	/* file VMAs can contain anon pages from COW */
	if (!folio_is_file_lru(folio) && !can_swap)
		return NULL;

	return folio;
}
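
(In this sketch the idle flag is cleared before the node/memcg checks, so a
young mapping clears it even when the walk then skips the folio; that seems
consistent with page_idle's "any access clears idle" semantics.)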

static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
			struct mm_walk *args)
{
...
	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
		unsigned long pfn;
		struct folio *folio;
		pte_t ptent = ptep_get(pte + i);

		total++;
		walk->mm_stats[MM_LEAF_TOTAL]++;

		pfn = get_pte_pfn(ptent, args->vma, addr);
		if (pfn == -1)
			continue;

		if (!pte_young(ptent)) {
			walk->mm_stats[MM_LEAF_OLD]++;
			continue;
		}

-		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
+		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, true);
		if (!folio)
			continue;
...
}

Is this a good idea or not?

-- 
2.43.0

