linux-kernel - Re: [PATCH v7 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6aed34de-99ab-4601-ba4b-0870992c5da1@arm.com>
Date: Fri, 15 Dec 2023 17:40:55 +0000
From: James Morse <james.morse@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
 linux-kernel@...r.kernel.org
Cc: Fenghua Yu <fenghua.yu@...el.com>, Thomas Gleixner <tglx@...utronix.de>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 H Peter Anvin <hpa@...or.com>, Babu Moger <Babu.Moger@....com>,
 shameerali.kolothum.thodi@...wei.com,
 D Scott Phillips OS <scott@...amperecomputing.com>,
 carl@...amperecomputing.com, lcherian@...vell.com,
 bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
 baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
 Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
 dfustini@...libre.com, amitsinght@...vell.com
Subject: Re: [PATCH v7 02/24] x86/resctrl: kfree() rmid_ptrs from
 rdtgroup_exit()

Hi Reinette,

On 12/14/23 19:06, Reinette Chatre wrote:
> On 12/14/2023 10:28 AM, James Morse wrote:
>> On 13/12/2023 23:27, Reinette Chatre wrote:
>>> On 12/13/2023 10:03 AM, James Morse wrote:
>>>> On 09/11/2023 17:39, Reinette Chatre wrote:
>>>>> I expect cleanup to do the inverse of init. I do not know what was the
>>>>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state()
>>>>
>>>> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are
>>>> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug
>>>> notifiers. (if you could run this code...)
>>
>>> hmmm ... if there is a risk of such a race would the init code not also be
>>> vulnerable to that with the notifiers up before rdtgroup_init()?
>>
>> Nope, because this array is allocated behind rdt_get_mon_l3_config(), which ultimately
>> comes from get_rdt_resources() in resctrl_late_init() - which calls cpuhp_setup_state()
>> after all this init work has been done.
>>
>> (cpu hp always gives me a headache1)
> 
> Right. My comment was actually and specifically about rdtgroup_init() and attempting to
> understand your view of its races with the hotplug notifiers in response to your comment about
> its (the hotplug notifiers) races with rdtgroup_exit().
> 
> The current order of state initialization you mention and hotplug notifiers needing the
> state is sane and implies to expect an inverse order of teardown.
> 
>>> The races you mention
>>> are not obvious to me. I see the filesystem and hotplug code protected against races via
>>> the mutex and static keys. Could you please elaborate on the flows of concern?
>>
>> Functions like __check_limbo() (calling __rmid_entry()) are called under the
>> rdtgroup_mutex, but they don't consider that rmid_ptrs[] may be NULL.
>>
>> But this could only happen if the limbo work ran after cpuhp_remove_state() - this can't
>> happen because the hotplug callbacks cancel the limbo work, and won't reschedule it if the
>> domain is going offline.
>>
>>
>> The only other path is via free_rmid(), I've not thought too much about this as
>> resctrl_exit() can't actually be invoked - this code is discarded by the linker.
>>
>> It could be run on MPAM, but only in response to an 'error interrupt' (which is optional)
>> - and all the MPAM error interrupts indicate a software bug.
> 
> This still just considers the resctrl state and hotplug notifiers.
> 
> I clearly am missing something. It is still not clear to me how this connects to your earlier
> comment about races with the rdtgroup_exit() code ... how the hotplug notifiers races with the
> filesystem register/unregister code.

I don't think there is a specific problem there, this was mostly about unexpected surprises because cpuhp/limbo_handler/overflow_handler all run asynchronously. I may also have added confusion because the code added here moves into rdtgroup_exit() which is renamed resctrl_exit() as part of dragging all this out to /fs/. (This is also why I tried to initially add it in its final location)


Thanks,

James