lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4x-p+8SzyHQq_EJpbq+hSEu5MCtwpGWvafpk4xfpB1gKg@mail.gmail.com>
Date: Sun, 25 Feb 2024 03:50:48 +0800
From: Barry Song <21cnbao@...il.com>
To: SeongJae Park <sj@...nel.org>
Cc: akpm@...ux-foundation.org, damon@...ts.linux.dev, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, minchan@...nel.org, mhocko@...e.com, 
	hannes@...xchg.org, Barry Song <v-songbaohua@...o.com>
Subject: Re: [PATCH RFC] mm: madvise: pageout: ignore references rather than
 clearing young

On Sun, Feb 25, 2024 at 3:02 AM SeongJae Park <sj@...nel.org> wrote:
>
> On Fri, 23 Feb 2024 17:15:50 +1300 Barry Song <21cnbao@...il.com> wrote:
>
> > From: Barry Song <v-songbaohua@...o.com>
> >
> > While doing MADV_PAGEOUT, the current code will clear PTE young
> > so that vmscan won't read young flags to allow the reclamation
> > of madvised folios to go ahead.
> > It seems we can do it by directly ignoring references, thus we
> > can remove tlb flush in madvise and rmap overhead in vmscan.
> >
> > Regarding the side effect, in the original code, if a parallel
> > thread runs side by side to access the madvised memory with the
> > thread doing madvise, folios will get a chance to be re-activated
> > by vmscan. But with the patch, they will still be reclaimed. But
> > this behaviour doing PAGEOUT and doing access at the same time is
> > quite silly like DoS. So probably, we don't need to care.
>
> I think we might need to take care of the case, since users may use just a
> best-effort estimation like DAMON for the target pages.  In such cases, the
> page granularity re-check of the access could be helpful.  So I concern if this
> could be a visible behavioral change for some valid use cases.

Hi SeongJae,

If you read the code of MADV_PAGEOUT,  you will find it is not the best-effort.
It does clearing pte  young and immediately after the ptes are cleared, it reads
pte and checks if the ptes are young. If not, reclaim it. So the
purpose of clearing
PTE young is helping the check of young in folio_references to return false.
The gap between clearing ptes and re-checking ptes is quite small at
microseconds
level.

>
> >
> > A microbench as below has shown 6% decrement on the latency of
> > MADV_PAGEOUT,
>
> I assume some of the users may use MADV_PAGEOUT for proactive reclamation of
> the memory.  In the use case, I think latency of MADV_PAGEOUT might be not that
> important.
>
> Hence I think the cons of the behavioral change might outweigh the pros of the
> latench improvement, for such best-effort proactive reclamation use case.  Hope
> to hear and learn from others' opinions.

I don't see  the behavioral change for MADV_PAGEOUT as just the ping-pong
is removed. The only chance is in that very small time gap, somebody accesses
the cleared ptes and makes it young again, considering this time gap
is so small,
i don't think it is worth caring.  thus, i don't see pros for MADV_PAGEOUT case,
but we improve the efficiency of MADV_PAGEOUT and save the power of
Android phones.

>
> >
> >  #define PGSIZE 4096
> >  main()
> >  {
> >       int i;
> >  #define SIZE 512*1024*1024
> >       volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >
> >       for (i = 0; i < SIZE/sizeof(long); i += PGSIZE / sizeof(long))
> >               p[i] =  0x11;
> >
> >       madvise(p, SIZE, MADV_PAGEOUT);
> >  }
> >
> > w/o patch                    w/ patch
> > root@10:~# time ./a.out      root@10:~# time ./a.out
> > real  0m49.634s            real   0m46.334s
> > user  0m0.637s             user   0m0.648s
> > sys   0m47.434s            sys    0m44.265s
> >
> > Signed-off-by: Barry Song <v-songbaohua@...o.com>
>
>

Thanks
Barry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ