lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufZeADNp_y=Ng+acmMMgnTR=ZGFZ7z-m6O47O=CmJauWjw@mail.gmail.com>
Date:   Wed, 6 Dec 2023 18:30:20 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     Henry Huang <henry.hj@...group.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        谈鉴锋 <henry.tjf@...group.com>,
        朱辉(茶水) <teawater@...group.com>,
        akpm@...ux-foundation.org
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap

On Wed, Dec 6, 2023 at 5:51 AM Henry Huang <henry.hj@...group.com> wrote:
>
> Multi-Gen LRU page-table walker clears pte young flag, but it doesn't
> clear page idle flag. When we use /sys/kernel/mm/page_idle/bitmap to check
> whether one page is accessed, it would tell us this page is idle,
> but actually this page has been accessed.
>
> For those unmapped filecache pages, page idle flag would not been
> cleared in folio_mark_accessed if Multi-Gen LRU is enabled.
> So we couln't use /sys/kernel/mm/page_idle/bitmap to check whether
> a filecache page is read or written.
>
> What's more, /sys/kernel/mm/page_idle/bitmap also clears pte young flag.
> If one page is accessed, it would set page young flag. Multi-Gen LRU
> page-table walker should check both page&pte young flags.
>
> how-to-reproduce-problem
>
> idle_page_track
>    a tools to track process accessed memory during a specific time
> usage
>    idle_page_track $pid $time
> how-it-works
>    1. scan process vma from /proc/$pid/maps
>    2. vfn --> pfn from /proc/$pid/pagemap
>    3. write /sys/kernel/mm/page_idle/bitmap to
>       mark phy page idle flag and clear pte young flag
>    4. sleep $time
>    5. read /sys/kernel/mm/page_idle/bitmap to
>       test_and_clear pte young flag and
>       return whether phy page is accessed
>
> test ---- test program
>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <unistd.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
>  #include <fcntl.h>
>
>  int main(int argc, const char *argv[])
>  {
>      char *buf = NULL;
>      char pipe_info[4096];
>      int n;
>      int fd = -1;
>
>      buf = malloc(1024*1024*1024UL);
>      memset(buf, 0, 1024*1024*1024UL);
>      fd = open("access.pipe", O_RDONLY);
>      if (fd < 0)
>          goto out;
>      while (1) {
>          n = read(fd, pipe_info, sizeof(pipe_info));
>          if (!n) {
>              sleep(1);
>              continue;
>          } else if (n < 0) {
>              break;
>          }
>          memset(buf, 0, 1024*1024*1024UL);
>          puts("finish access");
>       }
>  out:
>      if (fd >=0)
>          close(fd);
>      if (buf)
>          free(buf);
>
>      return 0;
>  }
>
> prepare:
> mkfifo access.pipe
> ./test
> ps -ef | grep test
> root       4106   3148  8 06:47 pts/0    00:00:01 ./test
>
> We use /sys/kernel/debug/lru_gen to simulate mglru page-table scan.
>
> case 1: mglru walker break page_idle
> ./idle_page_track 4106 60 &
> sleep 5; echo 1 > access.pipe
> sleep 5;  echo '+ 8 0 6 1 1' > /sys/kernel/debug/lru_gen
>
> the output of idle_page_track is:
> Est(s)     Ref(MB)
> 64.822        1.00
> only found 1MB were accessed during 64.822s, but actually 1024MB were
> accessed.
>
> case 2: page_idle break mglru walker
> echo 1 > access.pipe
> ./idle_page_track 4106 10
> echo '+ 8 0 7 1 1' > /sys/kernel/debug/lru_gen
> lru gen status:
> memcg     8     /user.slice
>  node     0
>           5     772458       1065        9735
>           6     737435     262244          72
>           7     538053       1184         632
>           8      59404       6422           0
> almost pages should be in max_seq-1 queue, but actually not.
>
> Signed-off-by: Henry Huang <henry.hj@...group.com>

It's never intended for MGLRU to support page_idle/bitmap or
PG_idle/young because:
1. page_idle/bitmap isn't a capable interface at all -- yes, Google
proposed the idea [1], but we don't really use it anymore because of
its poor scalability.
2. PG_idle/young, being a boolean value, has poor granularity. If
anyone must use page_idle/bitmap for some specific reason, I'd
recommend exporting generation numbers instead.

[1] https://lore.kernel.org/cover.1426706637.git.vdavydov@parallels.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ