Message-ID: <CAHp75VfoQ-rFEEFu2FnaPuPDwyiTHpA_dCwqfA1SYSkFPM2uMA@mail.gmail.com>
Date: Thu, 7 Oct 2021 14:51:15 +0300
From: Andy Shevchenko <andy.shevchenko@...il.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Thorsten Leemhuis <regressions@...mhuis.info>
Cc: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Randy Dunlap <rdunlap@...radead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@...r.kernel.org>,
KUnit Development <kunit-dev@...glegroups.com>,
Linux Media Mailing List <linux-media@...r.kernel.org>,
netdev <netdev@...r.kernel.org>,
Brendan Higgins <brendanhiggins@...gle.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
Sakari Ailus <sakari.ailus@...ux.intel.com>,
Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Thomas Graf <tgraf@...g.ch>,
Herbert Xu <herbert@...dor.apana.org.au>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2 0/4] kernel.h further split

On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
> On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> > kernel.h is a collection of unrelated definitions that are often used
> > in disjoint compilation units; drivers frequently need only one or two
> > macro definitions from it.
> >
> > Here is the split of container_of(). The goals are the following:
> > - untwist the dependency hell a bit
> > - drop kernel.h inclusion where it's only used for container_of()
> > - speed up C preprocessing.
> >
> > Greg KH and Miguel Ojeda, among others, asked about the last point.
> > The methodology, test setup, and resulting numbers follow.
> >
> > The methodology
> > ===============
> > The question is how to measure, in a reasonably clean way, the
> > C preprocessing time when building a project like the Linux kernel.
> > Looking at the available tools, ccache seems well suited to this.
> > Its core idea is to preprocess the C file, compute a hash (MD4),
> > and compare it with the ones in the cache. If a match is found,
> > it returns the cached object file, skipping the compilation stage.
> >
> > Given this property of ccache, configure it and use it in the
> > steps below:
> >
> > 1. Configure kernel with allyesconfig
> >
> > 2. Build it with `make` so that the cache is filled with
> > the latest data, i.e. warm up the cache.
> >
> > 3. Run `make -s` (silent mode to reduce the influence of
> > the unrelated things, like console output) 10 times and
> > measure 'real' time spent.
> >
> > 4. Repeat 1-3 for each patch or patch set to get data sets before
> > and after.
> >
> > Once we have the raw data, the median of each data set gives us a
> > representative number. Comparing the medians before and after shows
> > the difference.
> >
> > The setup
> > =========
> > I have used the Intel x86_64 server platform (see partial output of
> > `lscpu` below):
> >
> > $ lscpu
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Address sizes: 46 bits physical, 48 bits virtual
> > Byte Order: Little Endian
> > CPU(s): 88
> > On-line CPU(s) list: 0-87
> > Vendor ID: GenuineIntel
> > Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > CPU family: 6
> > Model: 79
> > Thread(s) per core: 2
> > Core(s) per socket: 22
> > Socket(s): 2
> > Stepping: 1
> > CPU max MHz: 3600.0000
> > CPU min MHz: 1200.0000
> > ...
> > Caches (sum of all):
> > L1d: 1.4 MiB (44 instances)
> > L1i: 1.4 MiB (44 instances)
> > L2: 11 MiB (44 instances)
> > L3: 110 MiB (2 instances)
> > NUMA:
> > NUMA node(s): 2
> > NUMA node0 CPU(s): 0-21,44-65
> > NUMA node1 CPU(s): 22-43,66-87
> > Vulnerabilities:
> > Itlb multihit: KVM: Mitigation: Split huge pages
> > L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
> > Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> > Meltdown: Mitigation; PTI
> > Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
> > Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> > Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
> > Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> >
> > With the following GCC:
> >
> > $ gcc --version
> > gcc (Debian 10.3.0-11) 10.3.0
> >
> > The commands I have run during the measurement were:
> >
> > rm -rf $O
> > make O=$O allyesconfig
> > time make O=$O -s -j64 # this step has been measured
> >
> > The raw data and median
> > =======================
> > Before patch 2 in the series (yes, I have measured the effect of
> > patch 2 only; the data is sorted by time):
> >
> > real 2m8.794s
> > real 2m11.183s
> > real 2m11.235s
> > real 2m11.639s
> > real 2m11.960s
> > real 2m12.014s
> > real 2m12.609s
> > real 2m13.177s
> > real 2m13.462s
> > real 2m19.132s
> >
> > After patch 2 has been applied:
> >
> > real 2m8.536s
> > real 2m8.776s
> > real 2m9.071s
> > real 2m9.459s
> > real 2m9.531s
> > real 2m9.610s
> > real 2m10.356s
> > real 2m10.430s
> > real 2m11.117s
> > real 2m11.885s
> >
> > Median values are:
> > 131.987s before
> > 129.571s after
> >
> > We see a steady speedup of 1.83%.
>
> You do know about kcbench:
> https://gitlab.com/knurd42/kcbench.git
>
> Try running that to make it such that we know how it was tested :)

I'll try it.

Meanwhile, Thorsten, can you have a look at my approach and tell me
whether it makes sense?
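
For anyone who wants to double-check the arithmetic, here is a minimal
sketch that recomputes the medians and the relative speedup from the
quoted 'real' times. It is not part of the patch series or of the
original measurement; the helper names (to_seconds, median_seconds,
speedup_percent) are made up for illustration, and it assumes the
"XmY.ZZZs" format printed by the shell's `time`.

```python
import re
import statistics

# 'real' times copied from the quoted measurements (before/after patch 2).
BEFORE = ("2m8.794s 2m11.183s 2m11.235s 2m11.639s 2m11.960s "
          "2m12.014s 2m12.609s 2m13.177s 2m13.462s 2m19.132s").split()
AFTER = ("2m8.536s 2m8.776s 2m9.071s 2m9.459s 2m9.531s "
         "2m9.610s 2m10.356s 2m10.430s 2m11.117s 2m11.885s").split()


def to_seconds(real):
    """Convert a value such as '2m8.794s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError(f"unexpected time format: {real!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def median_seconds(samples):
    """Median of a list of 'XmY.ZZZs' strings, in seconds."""
    return statistics.median(to_seconds(s) for s in samples)


def speedup_percent(before, after):
    """Relative reduction of 'after' versus 'before', in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    before = median_seconds(BEFORE)
    after = median_seconds(AFTER)
    print(f"median before: {before:.3f}s")
    print(f"median after:  {after:.3f}s")
    print(f"speedup:       {speedup_percent(before, after):.2f}%")
```

With ten samples per data set the median is the mean of the 5th and 6th
sorted values, which is where 131.987s and 129.571s (rounded) come from.
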
--
With Best Regards,
Andy Shevchenko