lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9282d346-80eb-88bd-b9d0-0c3dd16d51db@arm.com>
Date:   Thu, 27 Apr 2023 15:20:31 +0100
From:   James Morse <james.morse@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Babu Moger <Babu.Moger@....com>,
        shameerali.kolothum.thodi@...wei.com,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        carl@...amperecomputing.com, lcherian@...vell.com,
        bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
        xingxin.hx@...nanolis.org, baolin.wang@...ux.alibaba.com,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com
Subject: Re: [PATCH v3 18/19] x86/resctrl: Add cpu offline callback for
 resctrl work

Hi Reinette,

On 06/04/2023 00:48, Reinette Chatre wrote:
> On 3/20/2023 10:26 AM, James Morse wrote:
> 
>> -static int resctrl_offline_cpu(unsigned int cpu)
>> -{
>> -	struct rdtgroup *rdtgrp;
>>  	struct rdt_resource *r;
>>  
>>  	mutex_lock(&rdtgroup_mutex);
>> +	resctrl_offline_cpu(cpu);
>> +
>>  	for_each_capable_rdt_resource(r)
>>  		domain_remove_cpu(cpu, r);
>> -	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
>> -		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
>> -			clear_childcpus(rdtgrp, cpu);
>> -			break;
>> -		}
>> -	}
>>  	clear_closid_rmid(cpu);
>>  	mutex_unlock(&rdtgroup_mutex);
>>  
> 
> I find this and the previous patch to be very complicated.

It consolidates the parts of this that have nothing to do with the architecture specific code.
The extra work is because the semantics are: "this CPU is going away", the callee needs to
to not pick 'this CPU' again when updating any structures.

Ensuring the structures have not yet been modified by the architecture code is the
cleanest interface as there is nothing special about what the arch code provides to the
filesystem here.

I agree it looks like a special case, but only because the existing code is being called
halfway through the tear down, and depends on what the arch code has already done.

Having a single call, where nothing has been changed yet is the most maintainable option
as it avoids extra hooks, or an incomplete list of what has been torn down, and what
hasn't - some of which may be architecture specific.

It also avoids any interaction with how the architecture code chooses to prevent multiple
writers to the domain list - I don't want any of the filesystem code to depend on a lock
held by the architecture specific code.


> It is not clear
> to me why resctrl_offline_cpu(cpu) is required to be before offline of domain.
> Previous patch would not be needed if the existing order of operations
> is maintained.

The existing order is a bit of a soup.

You'd need a resctrl_domain_rebalance_helpers() to move the limbo and mbm workers, but
this would run after the CPU had been removed from the domain. Hopefully the name conveys
that it doesn't always run when a CPU is going offline.
resctrl_offline_cpu() would potentially run after the CPUs domains have been free()d,
depending on what gets added in the future this might be a problem, leading to a
resctrl_pre_offline_cpu() hook.

I worry this strange state leads to extra special-case'd filesystem code, and extra hooks.


I can split the consolidation of the filesystem code up in this patch, the
clear_childcpus() and limbo/mbm stuff can be done in separate patches, which might make it
easier on the eye.


Thanks,

James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ