linux-kernel - Re: [memcg] a8c49af3be: hackbench.throughput -13.7% regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkaDWtgy3Ckhgo+vcbM7oYA8saPCVRvXLDKYHvRGYkKvoQ@mail.gmail.com>
Date:   Tue, 26 Apr 2022 21:34:12 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Yin Fengwei <fengwei.yin@...el.com>
Cc:     kernel test robot <oliver.sang@...el.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, Ying <ying.huang@...el.com>, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com
Subject: Re: [memcg] a8c49af3be: hackbench.throughput -13.7% regression

Hi Yin,

On Tue, Apr 26, 2022 at 7:53 PM Yin Fengwei <fengwei.yin@...el.com> wrote:
>
> Hi Yosry,
>
> On 4/20/2022 1:58 PM, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a -13.7% regression of hackbench.throughput due to commit:
> >
> >
> > commit: a8c49af3be5f0b4e105ef678bcf14ef102c270be ("memcg: add per-memcg total kernel memory stat")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: hackbench
> > on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
> > with following parameters:
> >
> >       nr_threads: 100%
> >       iterations: 4
> >       mode: process
> >       ipc: socket
> >       cpufreq_governor: performance
> >       ucode: 0x7002402
> >
> > test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> > test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > To reproduce:
> >
> >         git clone https://github.com/intel/lkp-tests.git
> >         cd lkp-tests
> >         sudo bin/lkp install job.yaml           # job file is attached in this email
> >         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> >         sudo bin/lkp run generated-yaml-file
> >
> >         # if come across any failure that blocks the test,
> >         # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
> >   gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/hackbench/0x7002402
> >
> > commit:
> >   086f694a75 ("memcg: replace in_interrupt() with !in_task()")
> >   a8c49af3be ("memcg: add per-memcg total kernel memory stat")
> >
> > 086f694a75e1a283 a8c49af3be5f0b4e105ef678bcf
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >     146519           -13.7%     126397        hackbench.throughput
> >     465.89           +16.0%     540.43        hackbench.time.elapsed_time
> >     465.89           +16.0%     540.43        hackbench.time.elapsed_time.max
> >  1.365e+08          +134.1%  3.195e+08 ±  4%  hackbench.time.involuntary_context_switches
> >    1081515            -1.2%    1068489        hackbench.time.minor_page_faults
> >      64911           +16.3%      75465        hackbench.time.system_time
>
> Just FYI.
>
> If I comment out one line added by the commit <a8c49af3be> :
> static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
> {
>         /* mod_memcg_state(memcg, MEMCG_KMEM, nr_pages); */  <--- comment out this line.

Thanks so much for looking into this. Unfortunately this line is
essentially the commit core, all the other changes are more or less
refactoring or adding interfaces to read this stat through.

I am not sure why this specific callsite causes regression, there are
many callsites that modify stats similarly (whether through
mod_memcg_state() directly or through other variants like
mod_lruvec_state()). Maybe the kmem call path is exercised more
heavily in this benchmark than other call paths that update stats.

The only seemingly expensive operation in the mod_memcg_state() path
is the call to cgroup_rstat_updated() (through memcg_rstat_updated()).
One idea off the top of my head is to batch calls to
cgroup_rstat_updated(), similar to what 11192d9c124d ("memcg: flush
stats only if updated") did on the flush side. I am interested to see
what memcg maintainers think about this problem (and the proposed
solution).

>         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
>                 if (nr_pages > 0)
>                         page_counter_charge(&memcg->kmem, nr_pages);
>                 else
>                         page_counter_uncharge(&memcg->kmem, -nr_pages);
>         }
> }
>
> The regression is almost gone:
>
> 086f694a75e1a283 9ff9ec89a6dcf39f901ff0a84fd
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>        7632:13      -44932%        1791:3     dmesg.timestamp:last
>           1:13          -8%            :3     kmsg.common_interrupt:#No_irq_handler_for_vector
>           2:13         -20%            :3     kmsg.timestamp:common_interrupt:#No_irq_handler_for_vector
>        4072:13      -24827%         844:3     kmsg.timestamp:last
>          %stddev     %change         %stddev
>              \          |                \
>     144327 ±  3%      -1.9%     141594 ±  5%  hackbench.throughput
>                       regression dropped to -1.9% from -13.7%
>
>     473.44 ±  3%      +1.9%     482.23 ±  5%  hackbench.time.elapsed_time
>
>
> Regards
> Yin, Fengwei