Message-ID: <Zk6EC12hC0wzPiIu@agluck-desk3.sc.intel.com>
Date: Wed, 22 May 2024 16:47:23 -0700
From: Tony Luck <tony.luck@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
	Peter Newman <peternewman@...gle.com>,
	James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>,
	Drew Fustini <dfustini@...libre.com>,
	Dave Martin <Dave.Martin@....com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v18 15/17] x86/resctrl: Fix RMID reading sanity check for
 Sub-NUMA (SNC) mode

On Wed, May 22, 2024 at 02:25:23PM -0700, Reinette Chatre wrote:
> > +		/*
> > +		 * SNC: OK to read events on any CPU sharing same L3
> > +		 * cache instance.
> > +		 */
> > +		if (d->display_id != get_cpu_cacheinfo_id(smp_processor_id(),
> > +							  r->mon_display_scope))
> 
> By hardcoding that mon_display_scope is a cache instead of using get_domain_id_from_scope()
> it seems that all pretending about being generic has just been abandoned at this point.

Yes. It now seems like a futile quest to make this look
like something generic.  All this code is operating on the
rdt_resources_all[RDT_RESOURCE_L3] resource (which by its very name is
"L3" scoped). In the SNC case the L3 has been divided (in some senses,
but not all) into nodes.

Given that pretending isn't working ... just be explicit?

Some "thinking aloud" follows ...

struct rdt_resource:
    In order to track monitor events, resctrl must build a domain list based
    on the smallest measurement scope. So with SNC enabled, that is the
    node. With it disabled it is L3 cache scope (which on existing systems
    is the same as node scope).

    Maybe keep .mon_scope with the existing name, but define it to be the
    minimum measurement scope and use it to build domains. So it
    defaults to RESCTRL_L3_CACHE but SNC detection will rewrite it to
    RESCTRL_L3_NODE.

    Drop the .mon_display_scope field. By definition it must always have
    the value RESCTRL_L3_CACHE. So replace checks that compare the values
    of .mon_scope & .mon_display_scope in rdt_resources_all[RDT_RESOURCE_L3]
    with:

    	if (r->mon_scope != RESCTRL_L3_CACHE)
    		// SNC stuff
    	else
    		// regular stuff
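
    A rough user-space sketch of that idea, just to make it concrete.
    This is not kernel code: struct rdt_resource_sketch and both helper
    names are made up for illustration, and only the two scope names
    come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scope names are from the discussion; the numeric values are arbitrary. */
enum resctrl_scope {
	RESCTRL_L3_CACHE,
	RESCTRL_L3_NODE,
};

/* Hypothetical cut-down stand-in for struct rdt_resource. */
struct rdt_resource_sketch {
	enum resctrl_scope mon_scope;	/* minimum measurement scope */
};

/* Hypothetical: what SNC detection would do to the L3 resource. */
static void sketch_snc_detected(struct rdt_resource_sketch *r)
{
	assert(r != NULL);
	r->mon_scope = RESCTRL_L3_NODE;
}

/* Single check that would replace comparing .mon_scope with .mon_display_scope. */
static bool sketch_snc_enabled(const struct rdt_resource_sketch *r)
{
	return r->mon_scope != RESCTRL_L3_CACHE;
}
```

    The resource starts out at RESCTRL_L3_CACHE, so the check only
    fires after detection has rewritten the scope.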

struct rdt_mon_domain:
    In struct rdt_mon_domain, rename the display_id field to the more
    honest name "l3_cache_id".  In addition save a pointer to the
    .shared_cpu_map of the L3 cache. When SNC is off, this will be the
    same as the d->hdr.cpu_mask for the domain. For SNC on it will be
    a superset (encompassing all the bits from cpu_masks in all domains
    that share an L3 instance).

    Where SNC specific code is required, the check becomes:

    	if (d->hdr.id != d->l3_cache_id)
    		// SNC stuff
    	else
    		// regular stuff

    The l3_cache_id can be used in mkdir code to make the mon_L3_XX
    directories. The L3 .shared_cpu_map can be used to pick a CPU to
    read the counters for the "sum" files. l3_cache_id also indicates
    which domains should be summed.
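
    Again as a user-space sketch only (struct mon_domain_sketch, its
    fields other than l3_cache_id, and both helpers are invented here;
    the CPU mask is a plain 64-bit bitmap rather than a struct cpumask),
    summing and picking a CPU set by l3_cache_id could look like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rdt_mon_domain. */
struct mon_domain_sketch {
	int id;			/* stands in for d->hdr.id */
	int l3_cache_id;	/* L3 instance this domain belongs to */
	uint64_t cpu_mask;	/* stands in for d->hdr.cpu_mask */
	uint64_t count;		/* made-up event count for this domain */
};

/* Sum the counts of every domain that shares the given L3 instance. */
static uint64_t sketch_sum_l3(const struct mon_domain_sketch *doms,
			      size_t n, int l3_cache_id)
{
	uint64_t sum = 0;

	assert(doms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (doms[i].l3_cache_id == l3_cache_id)
			sum += doms[i].count;
	return sum;
}

/*
 * Union of the per-domain CPU masks for one L3 instance. This models
 * the "superset" property of the L3 .shared_cpu_map described above.
 */
static uint64_t sketch_l3_cpu_mask(const struct mon_domain_sketch *doms,
				   size_t n, int l3_cache_id)
{
	uint64_t mask = 0;

	assert(doms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (doms[i].l3_cache_id == l3_cache_id)
			mask |= doms[i].cpu_mask;
	return mask;
}
```

    Any CPU in the resulting mask would be a candidate for reading the
    counters behind a "sum" file.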


Does this look like a useful direction to pursue?

-Tony
