lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bf7e9acd-01f0-4011-80e3-c11c0298fd73@intel.com>
Date: Wed, 19 Feb 2025 21:40:28 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: James Morse <james.morse@....com>, <x86@...nel.org>,
	<linux-kernel@...r.kernel.org>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>, H Peter Anvin <hpa@...or.com>, Babu Moger
	<Babu.Moger@....com>, <shameerali.kolothum.thodi@...wei.com>, "D Scott
 Phillips OS" <scott@...amperecomputing.com>, <carl@...amperecomputing.com>,
	<lcherian@...vell.com>, <bobo.shaobowang@...wei.com>,
	<tan.shaopeng@...itsu.com>, <baolin.wang@...ux.alibaba.com>, Jamie Iles
	<quic_jiles@...cinc.com>, Xin Hao <xhao@...ux.alibaba.com>,
	<peternewman@...gle.com>, <dfustini@...libre.com>, <amitsinght@...vell.com>,
	David Hildenbrand <david@...hat.com>, Rex Nie <rex.nie@...uarmicro.com>,
	"Dave Martin" <dave.martin@....com>, Koba Ko <kobak@...dia.com>, Shanker
 Donthineni <sdonthineni@...dia.com>
Subject: Re: [PATCH v6 37/42] x86/restrl: Expand the width of dom_id by
 replacing mon_data_bits

Hi James,

On 2/7/25 10:18 AM, James Morse wrote:
> MPAM platforms retrieve the cache-id property from the ACPI PPTT table.
> The cache-id field is 32 bits wide. Under resctrl, the cache-id becomes
> the domain-id, and is packed into the mon_data_bits union bitfield.
> The width of cache-id in this field is 14 bits.
> 
> Expanding the union would break 32bit x86 platforms as this union is
> stored as the kernfs kn->priv pointer. This saved allocating memory
> for the priv data storage.
> 
> The firmware on MPAM platforms have used the PPTT cache-id field to
> expose the interconnect's id for the cache, which is sparse and uses
> more than 14 bits. Use of this id is to enable PCIe direct cache
> injection hints. Using this feature with VFIO means the value provided
> by the ACPI table should be exposed to user-space.
> 
> To support cache-id values greater than 14 bits, convert the
> mon_data_bits union to a structure. This is allocated when the kernfs
> file is created, and free'd when the monitor directory is rmdir'd.
> Readers and writers must hold the rdtgroup_mutex, and readers should
> check for a NULL pointer to protect against an open file preventing
> the kernfs file from being free'd immediately after the rmdir call.

The last sentence is difficult to parse and took me many reads. I see
two major parts to this statement and if I understand correctly the current
implementation combined with this patch does not support either.
(a) "checking for a NULL pointer from readers"
    The reader is rdtgroup_mondata_show() and it starts by calling:
		rdtgrp = rdtgroup_kn_lock_live(of->kn);
    As I understand, on return of rdtgroup_kn_lock_live() the kernfs node
    "of->kn" may no longer exist. This seems to be an issue with current code
    also.
    Considering this, it seems to me that checking if of->kn->priv is NULL
    may be futile if of->kn may no longer exist.
    I think this also needs a reference to the data needed by the file or
    the data needs to be stashed away before the call to
    kernfs_break_active_protection().
(b) "...being free'd immediately after the rmdir call"
    I believe this refers to expectation that one task may have the file open
    while another removes the resource group directory ("rmdir") with the
    assumption that the associated struct mon_data is removed during handling
    of rmdir. In this implementation the monitoring data file's struct mon_data
    is only removed when a monitoring domain goes offline. That is, when the
    resource group remains intact while the monitoring data files associated with
    one domain is removed (for example when all CPUs associated with that domain
    goes offline). The "rmdir" to remove a resource group does not call this code
    (mon_rmdir_one_subdir()), nor does the cleanup of the default resource group's
    "kn_mondata".  

I am trying to get a handle on the different lifetimes and if I understand
correctly this implementation does not attempt to keep the struct mon_data
accessible as long as the file is open. I do not think I've discovered all the
implications of this yet.

> 
> Signed-off-by: James Morse <james.morse@....com>
> ---
> Previously the MPAM tree repainted the cache-id to compact them,
> argue-ing there was no other user. With VFIO use of this PCIe feature,
> this is no longer an option:
> http://inbox.dpdk.org/dev/PH7PR12MB8596BF09963460CEAE17582E82522@PH7PR12MB8596.namprd12.prod.outlook.com/
> ---
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 19 +++++++-----
>  arch/x86/kernel/cpu/resctrl/internal.h    | 37 +++++++++++------------
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 28 ++++++++++++-----
>  3 files changed, 50 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 032a585293af..0b475e274483 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -645,7 +645,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  	u32 resid, evtid, domid;
>  	struct rdtgroup *rdtgrp;
>  	struct rdt_resource *r;
> -	union mon_data_bits md;
> +	struct mon_data *md;
>  	int ret = 0;
>  
>  	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> @@ -654,17 +654,22 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		goto out;
>  	}
>  
> -	md.priv = of->kn->priv;
> -	resid = md.u.rid;
> -	domid = md.u.domid;
> -	evtid = md.u.evtid;
> +	md = of->kn->priv;
> +	if (!md) {
> +		ret = -EIO;
> +		goto out;
> +	}
> +
> +	resid = md->rid;
> +	domid = md->domid;
> +	evtid = md->evtid;
>  	r = resctrl_arch_get_resource(resid);
>  
> -	if (md.u.sum) {
> +	if (md->sum) {
>  		/*
>  		 * This file requires summing across all domains that share
>  		 * the L3 cache id that was provided in the "domid" field of the
> -		 * mon_data_bits union. Search all domains in the resource for
> +		 * struct mon_data. Search all domains in the resource for
>  		 * one that matches this cache id.
>  		 */
>  		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 32ed9aeffb90..16c1a391d012 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -103,27 +103,24 @@ struct mon_evt {
>  };
>  
>  /**
> - * union mon_data_bits - Monitoring details for each event file.
> - * @priv:              Used to store monitoring event data in @u
> - *                     as kernfs private data.
> - * @u.rid:             Resource id associated with the event file.
> - * @u.evtid:           Event id associated with the event file.
> - * @u.sum:             Set when event must be summed across multiple
> - *                     domains.
> - * @u.domid:           When @u.sum is zero this is the domain to which
> - *                     the event file belongs. When @sum is one this
> - *                     is the id of the L3 cache that all domains to be
> - *                     summed share.
> - * @u:                 Name of the bit fields struct.
> + * struct mon_data - Monitoring details for each event file.
> + * @rid:             Resource id associated with the event file.
> + * @evtid:           Event id associated with the event file.
> + * @sum:             Set when event must be summed across multiple
> + *                   domains.
> + * @domid:           When @sum is zero this is the domain to which
> + *                   the event file belongs. When @sum is one this
> + *                   is the id of the L3 cache that all domains to be
> + *                   summed share.
> + *
> + * Stored in the kernfs kn->priv field, readers and writers must hold
> + * rdtgroup_mutex.
>   */
> -union mon_data_bits {
> -	void *priv;
> -	struct {
> -		unsigned int rid		: 10;
> -		enum resctrl_event_id evtid	: 7;
> -		unsigned int sum		: 1;
> -		unsigned int domid		: 14;
> -	} u;
> +struct mon_data {
> +	unsigned int rid;
> +	enum resctrl_event_id evtid;
> +	unsigned int sum;
> +	unsigned int domid;
>  };
>  
>  /**
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6832ae603db3..abebe01447ba 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3113,11 +3113,19 @@ static struct file_system_type rdt_fs_type = {
>  };
>  
>  static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
> -		       void *priv)
> +		       struct mon_data *_priv)
>  {
>  	struct kernfs_node *kn;
> +	struct mon_data *priv;
>  	int ret = 0;
>  
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +	memcpy(priv, _priv, sizeof(*priv));
> +
>  	kn = __kernfs_create_file(parent_kn, name, 0444,
>  				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
>  				  &kf_mondata_ops, priv, NULL, NULL);
> @@ -3137,9 +3145,15 @@ static void mon_rmdir_one_subdir(struct kernfs_node *pkn, char *name, char *subn
>  {
>  	struct kernfs_node *kn;
>  
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
>  	kn = kernfs_find_and_get(pkn, name);
>  	if (!kn)
>  		return;
> +
> +	kfree(kn->priv);
> +	kn->priv = NULL;
> +
>  	kernfs_put(kn);
>  
>  	if (kn->dir.subdirs <= 1)
> @@ -3180,19 +3194,19 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
>  			     bool do_sum)
>  {
>  	struct rmid_read rr = {0};
> -	union mon_data_bits priv;
> +	struct mon_data priv;
>  	struct mon_evt *mevt;
>  	int ret;
>  
>  	if (WARN_ON(list_empty(&r->evt_list)))
>  		return -EPERM;
>  
> -	priv.u.rid = r->rid;
> -	priv.u.domid = do_sum ? d->ci->id : d->hdr.id;
> -	priv.u.sum = do_sum;
> +	priv.rid = r->rid;
> +	priv.domid = do_sum ? d->ci->id : d->hdr.id;
> +	priv.sum = do_sum;
>  	list_for_each_entry(mevt, &r->evt_list, list) {
> -		priv.u.evtid = mevt->evtid;
> -		ret = mon_addfile(kn, mevt->name, priv.priv);
> +		priv.evtid = mevt->evtid;
> +		ret = mon_addfile(kn, mevt->name, &priv);
>  		if (ret)
>  			return ret;
>  

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ