linux-kernel - Re: [PATCH 1/3] mm, lru_gen: batch update counters on againg

Message-ID: <CAMgjq7Bi19ou-c7rDZH+RMRMcV7Z49-xJh5KmFCfGy8XqCyREA@mail.gmail.com>
Date: Tue, 26 Dec 2023 02:05:53 +0800
From: Kairui Song <ryncsn@...il.com>
To: Yu Zhao <yuzhao@...gle.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] mm, lru_gen: batch update counters on againg

On Mon, Dec 25, 2023 at 15:29, Yu Zhao <yuzhao@...gle.com> wrote:
>
> On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@...il.com> wrote:
> >
> > From: Kairui Song <kasong@...cent.com>
> >
> > When lru_gen is aging, it updates mm counters page by page,
> > which causes higher overhead if aging happens frequently or
> > many pages in one generation are being moved.
> > Optimize this by updating the counters in batches.
> >
> > Although most __mod_*_state helpers have their own caches, the
> > overhead is still observable.
> >
> > Tested in a 4G memcg on an EPYC 7K62 with:
> >
> >   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
> >     -a 0766 -t 16 -B binary &
> >
> >   memtier_benchmark -S /tmp/memcached.socket \
> >     -P memcache_binary -n allkeys \
> >     --key-minimum=1 --key-maximum=16000000 -d 1024 \
> >     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
> >
> > Average result of 18 test runs:
> >
> > Before: 44017.78 Ops/sec
> > After:  44687.08 Ops/sec (+1.5%)
> >
> > Signed-off-by: Kairui Song <kasong@...cent.com>
> > ---
> >  mm/vmscan.c | 64 +++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 55 insertions(+), 9 deletions(-)
>
> Usually most reclaim activity happens in kswapd, e.g., from the
> MongoDB benchmark (--duration=900):
> pgscan_kswapd 11294317
> pgscan_direct 128
> And kswapd always has current->reclaim_state->mm_walk. So the
> following should bring the vast majority of the improvement (assuming
> it's not noise) with far less code change:

Hi Yu,

This won't work for the fault path (e.g. the memtier test):
Samples: 30K of event 'cycles', Event count (approx.): 69411674954
  Children      Self  Command          Shared Object               Symbol
-   85.95%     0.69%  memcached        [kernel.vmlinux]            [k] asm_exc_page_fault
   - 85.25% asm_exc_page_fault
      - 85.00% exc_page_fault
         - 84.81% do_user_addr_fault
            - 84.01% handle_mm_fault
               - 83.70% __handle_mm_fault
                  - 82.57% do_swap_page
                     - 61.66% mem_cgroup_swapin_charge_folio
                        - 61.11% charge_memcg
                           - 60.76% try_charge_memcg
                              - 60.68% try_to_free_mem_cgroup_pages
                                   do_try_to_free_pages
                                 - shrink_node
                                    - 60.51% shrink_lruvec
                                       - 60.45% try_to_shrink_lruvec
                                          + 60.42% evict_folios
                     + 10.00% __swap_entry_free
                     + 3.81% swap_read_folio_bdev_sync
                     + 1.49% __pte_offset_map_lock
                     + 0.92% swap_cache_get_folio
                     + 0.80% folio_add_lru
                     + 0.75% vma_alloc_folio
                     + 0.60% swap_read_folio
                  + 0.73% do_anonymous_page
              0.54% lock_vma_under_rcu

And:
sudo cat /sys/kernel/debug/lru_gen_full | grep -A 25 benchmark
memcg    72 /benchmark
 node     0
        218       3283          1x          0x
                     0          0           0           0           0           0           0
                     1          0           0           0           0           0           0
                     2          0           0           0           0           0           0
                     3          0           0           0           0           0           0
                                0           0           0           0           0           0
        219       2472       2756           0
                     0      14775r     303395e          0p          2r          2e          0p
                     1          0r          0e          0p          0r          0e          0p
                     2          0r          0e          0p          0r          0e          0p
                     3          0r          0e      15262p          0r          0e          0p
                                0           0           0           0           0           0
        220       1652     456032          22
                     0          0           0           0           0           0           0
                     1          0           0           0           0           0           0
                     2          0           0           0           0           0           0
                     3          0           0           0           0           0           0
                                0           0           0           0           0           0
        221        808     578570          13
                     0      15665R     309071T          0           0R          1T          0
                     1          0R          0T          0           0R          0T          0
                     2          0R          0T          0           0R          0T          0
                     3          0R      15364T          0           0R          0T          0
                          9191594L    3532525O    2425411Y      94393N      18515F      10578A

It ages fast.

It's hard to share the code with mm_walk because the next patch
moves the pages in bulk, and mm_walk has no such logic.

Indeed, the gain is not very large with this benchmark. I'll follow
up with some other tests.
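
For readers following the thread, the batching idea from the patch
description can be sketched in plain userspace C. This is only an
illustration, not the code from the patch: struct gen_counters,
struct gen_batch, NR_GENS and all function names below are hypothetical
stand-ins, and nr_updates exists only to count how often the shared
counters are written.

```c
#include <assert.h>
#include <stddef.h>

#define NR_GENS 4 /* hypothetical number of generations */

/* Hypothetical stand-in for shared per-generation page counters. */
struct gen_counters {
	long nr_pages[NR_GENS];
	unsigned long nr_updates; /* writes to the shared counters */
};

/* Locally accumulated delta for pages moving old_gen -> new_gen. */
struct gen_batch {
	int old_gen;
	int new_gen;
	long delta;
};

/* Unbatched: the shared counters are written once per page moved. */
void move_pages_unbatched(struct gen_counters *c, int old_gen, int new_gen,
			  const int *nr, size_t n)
{
	assert(old_gen >= 0 && old_gen < NR_GENS);
	assert(new_gen >= 0 && new_gen < NR_GENS);

	for (size_t i = 0; i < n; i++) {
		c->nr_pages[old_gen] -= nr[i];
		c->nr_pages[new_gen] += nr[i];
		c->nr_updates += 2;
	}
}

/* Apply the accumulated delta to the shared counters in one step. */
void batch_flush(struct gen_counters *c, struct gen_batch *b)
{
	if (!b->delta)
		return;

	c->nr_pages[b->old_gen] -= b->delta;
	c->nr_pages[b->new_gen] += b->delta;
	c->nr_updates += 2;
	b->delta = 0;
}

/* Batched: accumulate locally, then flush once after the loop. */
void move_pages_batched(struct gen_counters *c, int old_gen, int new_gen,
			const int *nr, size_t n)
{
	struct gen_batch b = { .old_gen = old_gen, .new_gen = new_gen };

	assert(old_gen >= 0 && old_gen < NR_GENS);
	assert(new_gen >= 0 && new_gen < NR_GENS);

	for (size_t i = 0; i < n; i++)
		b.delta += nr[i]; /* no shared counter is touched here */

	batch_flush(c, &b);
}
```

Both variants leave the counters in the same final state. The batched
one writes the shared counters once per run of moved pages rather than
once per page, which is where any saving would come from in this sketch.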