Message-ID: <CAJD7tkaBnSwarz8yHu9RL_3DtaLRfjrcZ7m0YZZgHJsJdtHaZw@mail.gmail.com>
Date:   Tue, 24 Oct 2023 23:22:30 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Oliver Sang <oliver.sang@...el.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>
Cc:     Feng Tang <feng.tang@...el.com>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        lkp <lkp@...el.com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Yin, Fengwei" <fengwei.yin@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
        Michal Koutný <mkoutny@...e.com>,
        Waiman Long <longman@...hat.com>,
        "kernel-team@...udflare.com" <kernel-team@...udflare.com>,
        Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Domenico Cerasuolo <cerasuolodomenico@...il.com>
Subject: Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

On Tue, Oct 24, 2023 at 11:09 PM Oliver Sang <oliver.sang@...el.com> wrote:
>
> hi, Yosry Ahmed,
>
> On Tue, Oct 24, 2023 at 12:14:42AM -0700, Yosry Ahmed wrote:
> > On Mon, Oct 23, 2023 at 11:56 PM Oliver Sang <oliver.sang@...el.com> wrote:
> > >
> > > hi, Yosry Ahmed,
> > >
> > > On Mon, Oct 23, 2023 at 07:13:50PM -0700, Yosry Ahmed wrote:
> > >
> > > ...
> > >
> > > >
> > > > I still could not run the benchmark, but I used a version of
> > > > fallocate1.c that does 1 million iterations. I ran 100 in parallel.
> > > > This showed ~13% regression with the patch, so not the same as the
> > > > will-it-scale version, but it could be an indicator.
> > > >
> > > > With that, I did not see any improvement with the fixlet above or
> > > > ___cacheline_aligned_in_smp. So you can scratch that.
> > > >
> > > > I did, however, see some improvement with reducing the indirection
> > > > layers by moving stats_updates directly into struct mem_cgroup. The
> > > > regression in my manual testing went down to 9%. Still not great, but
> > > > I am wondering how this reflects on the benchmark. If you're able to
> > > > test it that would be great, the diff is below. Meanwhile I am still
> > > > looking for other improvements that can be made.
> > >
> > > we applied previous patch-set as below:
> > >
> > > c5f50d8b23c79 (linux-review/Yosry-Ahmed/mm-memcg-change-flush_next_time-to-flush_last_time/20231010-112257) mm: memcg: restore subtree stats flushing
> > > ac8a48ba9e1ca mm: workingset: move the stats flush into workingset_test_recent()
> > > 51d74c18a9c61 mm: memcg: make stats flushing threshold per-memcg
> > > 130617edc1cd1 mm: memcg: move vmstats structs definition above flushing code
> > > 26d0ee342efc6 mm: memcg: change flush_next_time to flush_last_time
> > > 25478183883e6 Merge branch 'mm-nonmm-unstable' into mm-everything   <---- the base our tool picked for the patch set
> > >
> > > I tried to apply below patch to either 51d74c18a9c61 or c5f50d8b23c79,
> > > but failed. could you guide how to apply this patch?
> > > Thanks
> > >
> >
> > Thanks for looking into this. I rebased the diff on top of
> > c5f50d8b23c79. Please find it attached.
>
> from our tests, this patch has little impact.
>
> it was applied as below ac6a9444dec85:
>
> ac6a9444dec85 (linux-devel/fixup-c5f50d8b23c79) memcg: move stats_updates to struct mem_cgroup
> c5f50d8b23c79 (linux-review/Yosry-Ahmed/mm-memcg-change-flush_next_time-to-flush_last_time/20231010-112257) mm: memcg: restore subtree stats flushing
> ac8a48ba9e1ca mm: workingset: move the stats flush into workingset_test_recent()
> 51d74c18a9c61 mm: memcg: make stats flushing threshold per-memcg
> 130617edc1cd1 mm: memcg: move vmstats structs definition above flushing code
> 26d0ee342efc6 mm: memcg: change flush_next_time to flush_last_time
> 25478183883e6 Merge branch 'mm-nonmm-unstable' into mm-everything
>
> for the first regression reported in original report, data are very close
> for 51d74c18a9c61, c5f50d8b23c79 (patch-set tip, parent of ac6a9444dec85),
> and ac6a9444dec85.
> full comparison is as [1]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/fallocate1/will-it-scale
>
> 130617edc1cd1ba1 51d74c18a9c61e7ee33bc90b522 c5f50d8b23c7982ac875791755b ac6a9444dec85dc50c6bfbc4ee7
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>      36509           -25.8%      27079           -25.2%      27305           -25.0%      27383        will-it-scale.per_thread_ops
>
> for the second regression reported in original report, seems a small impact
> from ac6a9444dec85.
> full comparison is as [2]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/fallocate1/will-it-scale
>
> 130617edc1cd1ba1 51d74c18a9c61e7ee33bc90b522 c5f50d8b23c7982ac875791755b ac6a9444dec85dc50c6bfbc4ee7
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>      76580           -30.0%      53575           -28.9%      54415           -26.7%      56152        will-it-scale.per_thread_ops
>
> [1]
>

Thanks, Oliver, for running the numbers. If I understand correctly, the
will-it-scale.fallocate1 microbenchmark is the only one showing a
significant regression here. Is that correct?
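
As a sanity check on the quoted tables, the %change columns can be
reproduced from the will-it-scale.per_thread_ops values, taking
130617edc1cd1 as the baseline. A minimal sketch in C follows; the helper
names pct_change() and print_changes() are made up for illustration and
are not part of the lkp tooling:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: change of val relative to base, in percent. */
double pct_change(double base, double val)
{
	assert(base != 0.0);
	return (val - base) / base * 100.0;
}

/* Print the %change of each value in vals[] against the baseline. */
void print_changes(const char *label, double base,
		   const double *vals, int n)
{
	printf("%s:", label);
	for (int i = 0; i < n; i++)
		printf(" %+.1f%%", pct_change(base, vals[i]));
	printf("\n");
}
```

Calling print_changes() with base 36509 and the values 27079, 27305, 27383
covers the 100% nr_task row; base 76580 with 53575, 54415, 56152 covers
the 50% row.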

In my runs, other, more representative microbenchmarks like
netperf and will-it-scale.page_fault* show minimal regression. I would
expect practical workloads to have high concurrency of page faults or
networking, but maybe not fallocate/ftruncate.
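
For readers unfamiliar with the fallocate/ftruncate pattern discussed
here, the loop below is a rough sketch of that kind of workload:
repeatedly allocate a range in a temporary file and truncate it back to
zero. It is not the will-it-scale fallocate1.c source. The function name
fallocate_loop(), the /tmp path and the 1 MiB FILE_SIZE are assumptions
for illustration, and posix_fallocate() is used as a portable stand-in
for a direct fallocate() call:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define FILE_SIZE (1024 * 1024)	/* assumed size, not taken from the thread */

/*
 * Run up to `iterations` allocate/truncate pairs on an unlinked temp file.
 * Returns the number of completed pairs, or -1 if the file cannot be
 * created. Stops early if either call fails.
 */
long fallocate_loop(long iterations)
{
	char path[] = "/tmp/fallocate-sketch.XXXXXX";
	long done = 0;
	int fd;

	assert(iterations >= 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* file goes away once fd is closed */

	for (long i = 0; i < iterations; i++) {
		/* posix_fallocate() returns 0 on success, an errno value otherwise */
		if (posix_fallocate(fd, 0, FILE_SIZE) != 0)
			break;
		if (ftruncate(fd, 0) != 0)
			break;
		done++;
	}

	close(fd);
	return done;
}
```

Running many copies of such a loop in parallel, each on its own file,
approximates the concurrency pattern in question; absolute numbers will
depend on the filesystem backing /tmp.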

Oliver, in your experience, how often does such a regression in such a
microbenchmark translate to a real regression that people care about?
(or how often do people dismiss it?)

I tried optimizing this further for the fallocate/ftruncate case but
without luck. I even tried moving stats_updates into cgroup core
(struct cgroup_rstat_cpu) to reuse the existing loop in
cgroup_rstat_updated() -- but it somehow made it worse.

On the other hand, we do have some machines in production running this
series together with a previous optimization for non-hierarchical
stats [1] on an older kernel, and we do see significant reduction in
CPU time spent reading the stats. Domenico did a similar experiment
with only this series and reported similar results [2].

Shakeel, Johannes, (and other memcg folks), I personally think the
benefits here outweigh a regression in this particular benchmark, but
I am obviously biased. What do you think?

[1] https://lore.kernel.org/lkml/20230726153223.821757-2-yosryahmed@google.com/
[2] https://lore.kernel.org/lkml/CAFYChMv_kv_KXOMRkrmTN-7MrfgBHMcK3YXv0dPYEL7nK77e2A@mail.gmail.com/
