lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z2SV2A9C6V65YL3x@google.com>
Date: Thu, 19 Dec 2024 13:53:28 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: James Clark <james.clark@...aro.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	Sun Haiyong <sunhaiyong@...ngson.cn>,
	Yanteng Si <siyanteng@...ngson.cn>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/5] perf cpumap: If the die_id is missing use
 socket/physical_package_id

On Thu, Dec 19, 2024 at 09:20:39AM -0800, Ian Rogers wrote:
> On Wed, Dec 18, 2024 at 9:32 PM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Wed, Dec 18, 2024 at 09:42:27AM -0800, Ian Rogers wrote:
> > > On Wed, Dec 18, 2024 at 4:04 AM James Clark <james.clark@...aro.org> wrote:
> > > >
> > > >
> > > >
> > > > On 16/12/2024 11:24 pm, Ian Rogers wrote:
> > > > > An error value for a missing die_id may be written into things like
> > > > > the cpu_topology_map. As the topology needs to be fully written out,
> > > > > including the die_id, to allow perf.data file features to be aligned
> > > > > we can't allow error values to be written out. Instead base the
> > > > > missing die_id value off of the socket/physical_package_id assuming
> > > > > they correlate 1:1.
> > > > >
> > > > > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > > > > ---
> > > > >   tools/perf/util/cpumap.c | 3 ++-
> > > > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
> > > > > index 27094211edd8..d362272f8466 100644
> > > > > --- a/tools/perf/util/cpumap.c
> > > > > +++ b/tools/perf/util/cpumap.c
> > > > > @@ -283,7 +283,8 @@ int cpu__get_die_id(struct perf_cpu cpu)
> > > > >   {
> > > > >       int value, ret = cpu__get_topology_int(cpu.cpu, "die_id", &value);
> > > > >
> > > > > -     return ret ?: value;
> > > > > +        /* If die_id is missing fallback on using the socket/physical_package_id. */
> > > > > +     return ret || value < 0 ? cpu__get_socket_id(cpu) : value;
> > > > >   }
> > > > >
> > > > >   struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data)
> > > >
> > > > Hi Ian,
> > > >
> > > > I sent a fix for the same or a similar problem here [1]. For this one
> > > > I'm not sure why we'd want to use the socket ID for die when it's always
> > > > been 0 for not present. I wonder if this change is mingling two things:
> > > > fixing the negative error value appearing and replacing die with socket ID.
> > > >
> > > > Personally I would prefer to keep the 0 to fix the error value, that way
> > > > nobody gets surprised by the change.
> > > >
> > > > Also it looks like cpu__get_cluster_id() can suffer from the same issue,
> > > > and if we do it this way we should drop these as they aren't valid anymore:
> > > >
> > > >         /* There is no die_id on legacy system. */
> > > >         if (die < 0)
> > > >                 die = 0;
> > >
> > > I think this breaks the assumption here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/expr.c?h=perf-tools-next#n244
> >
> > Hmm.. I'm not sure how it worked before.  The code is already there and
> > it just changed the condition from == -1 to < 0, right?
> 
> You'd need to be testing on a multi-socket machine to see the issue.
> If you had say a dual socket Ampere chip and the die_id was missing,
> does it make sense for there to be two sockets/packages but only 1
> die? I think it is best we assume 1 die per socket when the die_id is
> missing, and to some crippled extent (because of the s390 workaround)
> the expr test is doing the sanity check.

AFAICS die ID is always used with socket ID already so I guess it means
an ID inside a socket.

Thanks,
Namhyung


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ