lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z-7enNN7bxHmf8T6@agluck-desk3>
Date: Thu, 3 Apr 2025 12:16:44 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: James Morse <james.morse@....com>
Cc: Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	H Peter Anvin <hpa@...or.com>, Babu Moger <Babu.Moger@....com>,
	shameerali.kolothum.thodi@...wei.com,
	D Scott Phillips OS <scott@...amperecomputing.com>,
	carl@...amperecomputing.com, lcherian@...vell.com,
	bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
	baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
	Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
	dfustini@...libre.com, amitsinght@...vell.com,
	David Hildenbrand <david@...hat.com>,
	Rex Nie <rex.nie@...uarmicro.com>,
	Dave Martin <dave.martin@....com>, Koba Ko <kobak@...dia.com>,
	Shanker Donthineni <sdonthineni@...dia.com>, fenghuay@...dia.com
Subject: Re: [PATCH v7 37/49] x86/resctrl: Expand the width of dom_id by
 replacing mon_data_bits

On Mon, Mar 24, 2025 at 05:52:37PM -0700, Luck, Tony wrote:
> On Thu, Mar 13, 2025 at 08:25:08AM -0700, Reinette Chatre wrote:
> > Hi James,
> > 
> > On 3/12/25 11:04 AM, James Morse wrote:
> > > On 07/03/2025 05:03, Reinette Chatre wrote:
> > >> On 2/28/25 11:59 AM, James Morse wrote:
> > 
> > ...
> > 
> > >> With all of the above I do not think this will work on an SNC enabled
> > >> system ... to confirm this I tried it out and it is not possible to mount
> > >> resctrl on an SNC enabled system and the WARN_ON_ONCE() this patch adds to
> > >> mon_add_all_files() is hit.
> > > 
> > > I hadn't realised the mon_sub directories for SNC weren't all directly under mon_data.
> > > Searching from mon_data will need the parent name too. What I've come up with is:
> > > -------%<-------
> > > 	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
> > > 	if (!snc_mode) {
> > > 		sprintf(name, "mon_%s_%02d", r->name, d->hdr.id);
> > > 		kn_target_dir = kernfs_find_and_get(kn_mondata, name);
> > > 	} else {
> > > 		sprintf(name, "mon_%s_%02d", r->name, d->ci->id);
> > > 		kn_target_dir = kernfs_find_and_get(kn_mondata, name);
> > > 
> > > 		if (snc_mode && !do_sum) {
> > 
> > snc_mode should always be true here?
> > 
> > > 			sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id);
> > > 			kernfs_put(kn_target_dir);
> > 
> > I think this needs some extra guardrails. If kn_target_dir is NULL here
> > it looks like that the kernfs_put() above will be fine, but from what I can tell
> > the kernfs_find_and_get() below will not be.
> > 
> > > 			kn_target_dir = kernfs_find_and_get(kn_target_dir, name);
> > > 		}
> > > 	}
> > > 	kernfs_put(kn_target_dir);
> > > 	if (!kn_target_dir)
> > > 		return NULL;
> > > -------%<-------
> > > 
> > 
> > This looks good to me. In original patch a NULL kn within mon_get_default_kn_priv()
> > was used as prompt to create the private data. It is thus not obvious to me from this
> > snippet what is being returned "to", but I do not think that was your point of sharing
> > this snippet. 
> 
> Is this all overly complex trying to re-use the "priv" fields from
> the default resctrl group?  Would it be easier to just keep a list
> of each combinations of region id, domain id, sum, and event id that have
> already been allocated and re-use existing ones, or add to the list
> for new ones. Scanning this list may be less overhead that all the
> sprintf() and kernfs_find_and_get() searches.

James,

I played around with the simplification some more and tested on both
normal and SNC systems. Below is a patch against:
 
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/move_to_fs/v7

Note that this is *after* the move to fs/resctrl as I was too chicken
to try applying this before the move and then re-run the script to
move things. But if you take this suggestion, just mash it into your
"Expand the width of dom_id by replacing mon_data_bits" patch.
Maybe give me Co-developed-by credit, but that's not important.

Note that your expansion of mon_data is going to be very useful going
forward.  I want to add extra information to struct mon_data:

1) Flag to note that an event counter can be read from any CPU, not
just the ones in the domain specified by the mon_data/mon_L3_XX/*
file.

2) Type field to specify how to display the value of each counter
(since I want floating point instead of integer for the energy
counters).

-Tony


>From e2689c7439572608ce03a525c71c3fb88379057c Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@...el.com>
Date: Thu, 3 Apr 2025 10:53:55 -0700
Subject: [PATCH] fs/resctrl: Simplify allocation of mon_data structures

Instead of making a special case to allocate and attach these structures
to kernfs files in the default control group, simply allocate a structure
when a new combination of <rid, domain, mevt, do_sum> is needed and
re-use existing structures when possible.

Free all structures when resctrl filesystem is unmounted.

Partial revert of commit fa563b5171e9 ("x86/resctrl: Expand the width
of dom_id by replacing mon_data_bits")

Signed-off-by: Tony Luck <tony.luck@...el.com>
---
 fs/resctrl/internal.h |   2 +
 fs/resctrl/rdtgroup.c | 138 ++++++++++++------------------------------
 2 files changed, 40 insertions(+), 100 deletions(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index ec3863d18f68..e5976bd52a35 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -83,6 +83,7 @@ struct mon_evt {
 
 /**
  * struct mon_data - Monitoring details for each event file.
+ * @list:            List of all allocated structures.
  * @rid:             Resource id associated with the event file.
  * @evtid:           Event id associated with the event file.
  * @sum:             Set when event must be summed across multiple
@@ -96,6 +97,7 @@ struct mon_evt {
  * rdtgroup_mutex.
  */
 struct mon_data {
+	struct list_head list;
 	unsigned int rid;
 	enum resctrl_event_id evtid;
 	unsigned int sum;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 234ec9dbe5b3..4ec40850752a 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -69,6 +69,8 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
 
 static void rdtgroup_destroy_root(void);
 
+static void mon_put_kn_priv(void);
+
 struct dentry *debugfs_resctrl;
 
 /*
@@ -2873,6 +2875,7 @@ static void rdt_kill_sb(struct super_block *sb)
 		resctrl_arch_reset_all_ctrls(r);
 
 	rmdir_all_sub();
+	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
 	schemata_list_destroy();
@@ -2895,107 +2898,54 @@ static struct file_system_type rdt_fs_type = {
 	.kill_sb		= rdt_kill_sb,
 };
 
+static LIST_HEAD(kn_priv_list);
+
 /**
- * mon_get_default_kn_priv() - Get the mon_data priv data for this event from
- *                             the default control group.
+ * mon_get_kn_priv() - Get the mon_data priv data for this event
  * Called when monitor event files are created for a domain.
- * When called with the default control group, the structure will be allocated.
- * This happens at mount time, before other control or monitor groups are
- * created.
- * This simplifies the lifetime management for rmdir() versus domain-offline
- * as the default control group lives forever, and only one group needs to be
- * special cased.
+ * The same values are used in multiple directories. Keep a list
+ * of allocated structures and re-use an existing one with the same
+ * list of values for rid, domain, etc.
  *
- * @r:      The resource for the event type being created.
- * @d:	    The domain for the event type being created.
- * @mevt:   The event type being created.
- * @rdtgrp: The rdtgroup for which the monitor file is being created,
- *          used to determine if this is the default control group.
- * @do_sum: Whether the SNC sub-numa node monitors are being created.
+ * @rid:	The resource for the event type being created.
+ * @domid:	The domain for the event type being created.
+ * @mevt:	The event type being created.
+ * @do_sum:	Whether the SNC sub-numa node monitors are being created.
  */
-static struct mon_data *mon_get_default_kn_priv(struct rdt_resource *r,
-						struct rdt_mon_domain *d,
-						struct mon_evt *mevt,
-						struct rdtgroup *rdtgrp,
-						bool do_sum)
+static struct mon_data *mon_get_kn_priv(int rid, int domid, struct mon_evt *mevt, bool do_sum)
 {
-	struct kernfs_node *kn_dom, *kn_evt;
 	struct mon_data *priv;
-	bool snc_mode;
-	char name[32];
 
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
-	if (!do_sum)
-		sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);
-	else
-		sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id);
-
-	kn_dom = kernfs_find_and_get(kn_mondata, name);
-	if (!kn_dom)
-		return NULL;
-
-	kn_evt = kernfs_find_and_get(kn_dom, mevt->name);
-
-	/* Is this the creation of the default groups monitor files? */
-	if (!kn_evt && rdtgrp == &rdtgroup_default) {
-		priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-		if (!priv)
-			return NULL;
-		priv->rid = r->rid;
-		priv->domid = do_sum ? d->ci->id : d->hdr.id;
-		priv->sum = do_sum;
-		priv->evtid = mevt->evtid;
-		return priv;
+	list_for_each_entry(priv, &kn_priv_list, list) {
+		if (priv->rid == rid && priv->domid == domid &&
+		    priv->sum == do_sum && priv->evtid == mevt->evtid)
+			return priv;
 	}
 
-	if (!kn_evt)
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
 		return NULL;
 
-	return kn_evt->priv;
+	priv->rid = rid;
+	priv->domid = domid;
+	priv->sum = do_sum;
+	priv->evtid = mevt->evtid;
+	list_add_tail(&priv->list, &kn_priv_list);
+
+	return priv;
 }
 
 /**
- * mon_put_default_kn_priv_all() - Potentially free the mon_data priv data for
- *                                 all events from the default control group.
- * Put the mon_data priv data for all events for a particular domain.
- * When called with the default control group, the priv structure previously
- * allocated will be kfree()d. This should only be done as part of taking a
- * domain offline.
- * Only a domain offline will 'rmdir' monitor files in the default control
- * group. After domain offline releases rdtgrp_mutex, all references will
- * have been removed.
- *
- * @rdtgrp:  The rdtgroup for which the monitor files are being removed,
- *           used to determine if this is the default control group.
- * @name:    The name of the domain or SNC sub-numa domain which is being
- *           taken offline.
+ * mon_put_kn_priv() - Free all allocated mon_data structures
+ * Called when resctrl file system is unmounted.
  */
-static void mon_put_default_kn_priv_all(struct rdtgroup *rdtgrp, char *name)
+static void mon_put_kn_priv(void)
 {
-	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct kernfs_node *kn_dom, *kn_evt;
-	struct mon_evt *mevt;
+	struct mon_data *priv, *tmp;
 
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	if (rdtgrp != &rdtgroup_default)
-		return;
-
-	kn_dom = kernfs_find_and_get(kn_mondata, name);
-	if (!kn_dom)
-		return;
-
-	list_for_each_entry(mevt, &r->evt_list, list) {
-		kn_evt = kernfs_find_and_get(kn_dom, mevt->name);
-		if (!kn_evt)
-			continue;
-		if (!kn_evt->priv)
-			continue;
-
-		kfree(kn_evt->priv);
-		kn_evt->priv = NULL;
+	list_for_each_entry_safe(priv, tmp, &kn_priv_list, list) {
+		kfree(priv);
+		list_del(&priv->list);
 	}
 }
 
@@ -3029,16 +2979,12 @@ static void mon_rmdir_one_subdir(struct rdtgroup *rdtgrp, char *name, char *subn
 	if (!kn)
 		return;
 
-	mon_put_default_kn_priv_all(rdtgrp, name);
-
 	kernfs_put(kn);
 
-	if (kn->dir.subdirs <= 1) {
+	if (kn->dir.subdirs <= 1)
 		kernfs_remove(kn);
-	} else {
-		mon_put_default_kn_priv_all(rdtgrp, subname);
+	else
 		kernfs_remove_by_name(kn, subname);
-	}
 }
 
 /*
@@ -3081,7 +3027,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 		return -EPERM;
 
 	list_for_each_entry(mevt, &r->evt_list, list) {
-		priv = mon_get_default_kn_priv(r, d, mevt, prgrp, do_sum);
+		priv = mon_get_kn_priv(r->rid, do_sum ? d->ci->id : d->hdr.id, mevt, do_sum);
 		if (WARN_ON_ONCE(!priv))
 			return -EINVAL;
 
@@ -3165,17 +3111,9 @@ static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	struct rdtgroup *prgrp, *crgrp;
 	struct list_head *head;
 
-	/*
-	 * During domain-online create the default control group first
-	 * so that mon_get_default_kn_priv() can find the allocated structure
-	 * on subsequent calls.
-	 */
-	mkdir_mondata_subdir(kn_mondata, d, r, &rdtgroup_default);
-
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		parent_kn = prgrp->mon.mon_data_kn;
-		if (prgrp != &rdtgroup_default)
-			mkdir_mondata_subdir(parent_kn, d, r, prgrp);
+		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
 
 		head = &prgrp->mon.crdtgrp_list;
 		list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
-- 
2.48.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ