linux-kernel - Re: [PATCH v5 24/24] x86/resctrl: Separate arch and fs resctrl locks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4085f4f3-7a3e-8af8-1ae3-1040ca78f59f@intel.com>
Date:   Wed, 9 Aug 2023 15:41:01 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     James Morse <james.morse@....com>, <x86@...nel.org>,
        <linux-kernel@...r.kernel.org>
CC:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Babu Moger <Babu.Moger@....com>,
        <shameerali.kolothum.thodi@...wei.com>,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        <carl@...amperecomputing.com>, <lcherian@...vell.com>,
        <bobo.shaobowang@...wei.com>, <tan.shaopeng@...itsu.com>,
        <xingxin.hx@...nanolis.org>, <baolin.wang@...ux.alibaba.com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, <peternewman@...gle.com>,
        <dfustini@...libre.com>
Subject: Re: [PATCH v5 24/24] x86/resctrl: Separate arch and fs resctrl locks

Hi James,

On 7/28/2023 9:42 AM, James Morse wrote:
> resctrl has one mutex that is taken by the architecture specific code,
> and the filesystem parts. The two interact via cpuhp, where the
> architecture code updates the domain list. Filesystem handlers that
> walk the domains list should not run concurrently with the cpuhp
> callback modifying the list.
> 
> Exposing a lock from the filesystem code means the interface is not
> cleanly defined, and creates the possibility of cross-architecture
> lock ordering headaches. The interaction only exists so that certain
> filesystem paths are serialised against cpu hotplug. The cpu hotplug

cpu hotplug -> CPU hotplug

> code already has a mechanism to do this using cpus_read_lock().
> 
> MPAM's monitors have an overflow interrupt, so it needs to be possible
> to walk the domains list in irq context. RCU is ideal for this,
> but some paths need to be able to sleep to allocate memory.
> 
> Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part
> of a cpuhp callback, cpus_read_lock() must always be taken first.
> rdtgroup_schemata_write() already does this.
> 
> Most of the filesystem code's domain list walkers are currently
> protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live().
> The exceptions are rdt_bit_usage_show() and the mon_config helpers
> which take the lock directly.
> 
> Make the domain list protected by RCU. An architecture-specific
> lock prevents concurrent writers. rdt_bit_usage_show() can
> walk the domain list under rcu_read_lock(). The mon_config helpers
> send multiple IPIs, take the cpus_read_lock() in these cases.
> 
> The other filesystem list walkers need to be able to sleep.
> Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the
> cpuhp callbacks can't be invoked when file system operations are
> occurring.
> 
> Add lockdep_assert_cpus_held() in the cases where the
> rdtgroup_kn_lock_live() call isn't obvious.
> 
> Resctrl's domain online/offline calls now need to take the
> rdtgroup_mutex themselves.
> 
> Tested-by: Shaopeng Tan <tan.shaopeng@...itsu.com>
> Signed-off-by: James Morse <james.morse@....com>

...

> @@ -464,6 +467,9 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
>  	bool sep = false;
>  	u32 ctrl_val;
>  
> +	/* Walking r->domains, ensure it can't race with cpuhp */
> +	lockdep_assert_cpus_held();
> +
>  	seq_printf(s, "%*s:", max_name_width, schema->name);
>  	list_for_each_entry(dom, &r->domains, list) {
>  		if (sep)
> @@ -534,8 +540,8 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  {
>  	int cpu;
>  
> -	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
> -	lockdep_assert_held(&rdtgroup_mutex);
> +	/* When picking a cpu from cpu_mask, ensure it can't race with cpuhp */

cpu -> CPU

> +	lockdep_assert_cpus_held();
>  
>  	/*
>  	 * Setup the parameters to pass to mon_event_count() to read the data.

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index a256a96df487..47dcf2cb76ca 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -35,6 +35,10 @@
>  DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
>  DEFINE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>  DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
> +
> +/* Mutex to protect rdtgroup access. */
> +DEFINE_MUTEX(rdtgroup_mutex);
> +
>  static struct kernfs_root *rdt_root;
>  struct rdtgroup rdtgroup_default;
>  LIST_HEAD(rdt_all_groups);
> @@ -954,7 +958,8 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>  
>  	mutex_lock(&rdtgroup_mutex);
>  	hw_shareable = r->cache.shareable_bits;
> -	list_for_each_entry(dom, &r->domains, list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(dom, &r->domains, list) {
>  		if (sep)
>  			seq_putc(seq, ';');
>  		sw_shareable = 0;

Does rdt_bit_usage_show() really need RCU? It is another filesystem callback and I
do not see a reason why it should access the domain list in a different way. It
can follow the same pattern as all the other resctrl filesystem ops and use
cpus_read_lock().

> @@ -1010,8 +1015,10 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>  		}
>  		sep = true;
>  	}
> +	rcu_read_unlock();
>  	seq_putc(seq, '\n');
>  	mutex_unlock(&rdtgroup_mutex);
> +

Unnecessary empty line.


>  	return 0;
>  }


Reinette