Message-ID: <20230912032350.GA17008@ranerica-svr.sc.intel.com>
Date: Mon, 11 Sep 2023 20:23:50 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Andreas Herrmann <aherrmann@...e.de>
Cc: x86@...nel.org, Andreas Herrmann <aherrmann@...e.com>,
Catalin Marinas <catalin.marinas@....com>,
Chen Yu <yu.c.chen@...el.com>, Len Brown <len.brown@...el.com>,
Radu Rendec <rrendec@...hat.com>,
Pierre Gondois <Pierre.Gondois@....com>,
Pu Wen <puwen@...on.cn>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Sudeep Holla <sudeep.holla@....com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Will Deacon <will@...nel.org>, Zhang Rui <rui.zhang@...el.com>,
stable@...r.kernel.org, Ricardo Neri <ricardo.neri@...el.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/3] x86/cacheinfo: Set the number of leaves per CPU

On Fri, Sep 01, 2023 at 09:52:54AM +0200, Andreas Herrmann wrote:
> On Fri, Sep 01, 2023 at 08:50:31AM +0200, Andreas Herrmann wrote:
> > On Fri, Aug 04, 2023 at 06:24:18PM -0700, Ricardo Neri wrote:
> > > Hi,
> >
> > Hi Ricardo,
> >
> > > This is v3 of a patchset to set the number of cache leaves independently
> > > for each CPU. v1 and v2 can be found here [1] and here [2].
> >
> > I am on CC of your patch set and have glanced through it.
> > Long ago I touched related code, but now I am not really up to date
> > enough to do a qualified review in this area. First, I would have to
> > look into the documentation to refresh my memory, etc.
> >
> > I've not seen (or it escaped me) any information that this was tested
> > on a variety of machines that might be affected by this change. And
> > there are no Tested-by tags.
> >
> > Even if changes look simple and reasonable they can cause issues.
> >
> > Thus, from my POV it would be good to have some information about what
> > tests were done. I am not asking for testing on all possible systems,
> > but just knowing which system(s) was (were) used for functional testing
> > is of value.
>
> Doing a good review -- trying to catch every flaw -- is really hard,
> especially when you are not actively doing development work in an
> area.
>
> For example see
>
> commit e33b93650fc5 ("blk-iocost: Pass gendisk to ioc_refresh_params")
> [Breno Leitao <leitao@...ian.org>, Tue Feb 28 03:16:54 2023 -0800]
>
> This fixes commit
>
> ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more
> useful") [Christoph Hellwig <hch@....de>, Fri Feb 3 16:03:54 2023
> +0100]
>
> I had reviewed the latter (see
> https://marc.info/?i=Y8plg6OAa4lrnyZZ@suselix) and the entire patch
> series. I compared the original code with the patch, walked through
> every single hunk of the diff, and tried to think it through. The
> changes made sense to me. Then came the bug report(s), and I felt that
> I had failed tremendously. To err is human.
>
> What this shows (and it is already known): with every patch, new errors
> are potentially introduced into the kernel. Functional and higher-level
> testing can help to spot them before a kernel is deployed in the field.
>
> At a higher level, this demonstrates another thing: Linux kernel
> development is a transparent example of a peer-review process.
>
> In our scientific age it is often postulated that peer review is the
> way to go[1] and that it kind of guarantees that what's published, or
> rolled out, is reasonable and "works".
>
> The above example shows that this process will not catch all flaws and
> that proper, transparent, and reliable tests are required before
> anything is deployed in the field.
>
> This is true for every branch of science.
>
> If it is purposely not done, something is fishy.
>
>
> [1] Also, some state that it is "the only way to go" and that everything
> figured out without a peer-review process is false and can't be trusted.
> Of course, this is a false statement.

Hi Andreas,

Agreed. Testing is important. For the specific case of these patches, I
booted CONFIG_PREEMPT_RT and !CONFIG_PREEMPT_RT kernels. Then I
a) Ensured that the splat reported in commit 5944ce092b97
   ("arch_topology: Build cacheinfo from primary CPU") was not observed.
b) Ensured that /sys/devices/system/cpu/cpuX/cache is present.
c) Ensured that the contents of /sys/devices/system/cpu/cpuX/cache are
   the same before and after my patches (a sketch of how such a
   comparison can be scripted follows below).
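
For reference, below is a minimal sketch of how the cacheinfo sysfs
attributes can be dumped for such a before/after comparison. It is
purely illustrative (a standalone Python script, not the exact procedure
used for the testing above):

#!/usr/bin/env python3
# Dump every readable cacheinfo sysfs attribute of every CPU so that the
# output of a patched and an unpatched kernel can be saved and diffed.
import glob
import os

for index in sorted(glob.glob("/sys/devices/system/cpu/cpu*/cache/index*")):
    for attr in sorted(os.listdir(index)):
        path = os.path.join(index, attr)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            value = "<unreadable>"
        print(f"{path}: {value}")

Saving the output once on an unpatched kernel and once on a patched
kernel, then diffing the two files, covers check c) above.
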
I tested on the following systems: Intel Alder Lake, Intel Meteor
Lake, 2-socket Intel Icelake server, 2-socket Intel Cascade Lake server,
2-socket Intel Skylake server, 4-socket Intel Broadwell server, 2-socket
Intel Haswell server, 2-socket AMD Rome server, and 2-socket AMD Milan
server.

Thanks and BR,
Ricardo