lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c9b0656-fb09-1adf-664b-e2cbcc04b1db@arm.com>
Date:   Thu, 14 Dec 2023 11:37:27 +0000
From:   James Morse <james.morse@....com>
To:     babu.moger@....com, x86@...nel.org, linux-kernel@...r.kernel.org
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        shameerali.kolothum.thodi@...wei.com,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        carl@...amperecomputing.com, lcherian@...vell.com,
        bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
        baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
        dfustini@...libre.com, amitsinght@...vell.com
Subject: Re: [PATCH v7 13/24] x86/resctrl: Queue mon_event_read() instead of
 sending an IPI

Hi Babu,

On 09/11/2023 20:40, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> Intel is blessed with an abundance of monitors, one per RMID, that can be
>> read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
>> the number implemented is up to the manufacturer. This means when there are
>> fewer monitors than needed, they need to be allocated and freed.
>>
>> MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
>> CSU counter is allowed to return 'not ready' for a small number of
>> micro-seconds after programming. To allow one CSU hardware monitor to be
>> used for multiple control or monitor groups, the CPU accessing the
>> monitor needs to be able to block when configuring and reading the
>> counter.
>>
>> Worse, the domain may be broken up into slices, and the MMIO accesses
>> for each slice may need performing from different CPUs.
>>
>> These two details mean MPAMs monitor code needs to be able to sleep, and
>> IPI another CPU in the domain to read from a resource that has been sliced.
>>
>> mon_event_read() already invokes mon_event_count() via IPI, which means
>> this isn't possible. On systems using nohz-full, some CPUs need to be
>> interrupted to run kernel work as they otherwise stay in user-space
>> running realtime workloads. Interrupting these CPUs should be avoided,
>> and scheduling work on them may never complete.
>>
>> Change mon_event_read() to pick a housekeeping CPU, (one that is not using
>> nohz_full) and schedule mon_event_count() and wait. If all the CPUs
>> in a domain are using nohz-full, then an IPI is used as the fallback.
>>
>> This function is only used in response to a user-space filesystem request
>> (not the timing sensitive overflow code).
>>
>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>> the monitor-allocation in monitor.c. When the IPI fallback is used on
>> machines where MPAM needs to make an access on multiple CPUs, the counter
>> read will always fail.

>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index beccb0e87ba7..d07f99245851 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c

>> @@ -522,12 +524,21 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
>> +static int smp_mon_event_count(void *arg)
>> +{
>> +	mon_event_count(arg);
>> +
>> +	return 0;
>> +}
> 
> Shouldn't this function defined as "void" similar to mon_event_count?
> Return code is not used anywhere.

smp_call_on_cpu() requires it to return an int, even if the value is not used.

This wrapper only exists because smp_call_on_cpu() takes a different prototype to
smp_call_function_any().


Thanks,

James

>> @@ -536,7 +547,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>  	rr->val = 0;
>>  	rr->first = first;
>>  
>> -	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> +	cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> +
>> +	/*
>> +	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
>> +	 * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
>> +	 * MPAM's resctrl_arch_rmid_read() is unable to read the
>> +	 * counters on some platforms if its called in irq context.
>> +	 */
>> +	if (tick_nohz_full_cpu(cpu))
>> +		smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> +	else
>> +		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
>>  }
>>  

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ