lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9d69d0ca-212d-9b1b-3001-9f56731e48fd@arm.com>
Date:   Fri, 8 Sep 2023 16:58:20 +0100
From:   James Morse <james.morse@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Babu Moger <Babu.Moger@....com>,
        shameerali.kolothum.thodi@...wei.com,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        carl@...amperecomputing.com, lcherian@...vell.com,
        bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
        xingxin.hx@...nanolis.org, baolin.wang@...ux.alibaba.com,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
        dfustini@...libre.com
Subject: Re: [PATCH v5 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to
 sleep

Hi Reinette,

On 8/25/23 00:02, Reinette Chatre wrote:
> On 8/24/2023 9:56 AM, James Morse wrote:
>> On 09/08/2023 23:36, Reinette Chatre wrote:
>>> On 7/28/2023 9:42 AM, James Morse wrote:
>>>> MPAM's cache occupancy counters can take a little while to settle once
>>>> the monitor has been configured. The maximum settling time is described
>>>> to the driver via a firmware table. The value could be large enough
>>>> that it makes sense to sleep. To avoid exposing this to resctrl, it
>>>> should be hidden behind MPAM's resctrl_arch_rmid_read().
>>>>
>>>> resctrl_arch_rmid_read() may be called via IPI meaning it is unable
>>>> to sleep. In this case resctrl_arch_rmid_read() should return an error
>>>> if it needs to sleep. This will only affect MPAM platforms where
>>>> the cache occupancy counter isn't available immediately, nohz_full is
>>>> in use, and there are there are no housekeeping CPUs in the necessary
>>>> domain.
>>>>
>>>> There are three callers of resctrl_arch_rmid_read():
>>>> __mon_event_count() and __check_limbo() are both called from a
>>>> non-migrateable context. mon_event_read() invokes __mon_event_count()
>>>> using smp_call_on_cpu(), which adds work to the target CPUs workqueue.
>>>> rdtgroup_mutex() is held, meaning this cannot race with the resctrl
>>>> cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on()
>>>> also adds work to a per-cpu workqueue.
>>>>
>>>> The remaining call is add_rmid_to_limbo() which is called in response
>>>> to a user-space syscall that frees an RMID. This opportunistically
>>>> reads the LLC occupancy counter on the current domain to see if the
>>>> RMID is over the dirty threshold. This has to disable preemption to
>>>> avoid reading the wrong domain's value. Disabling pre-emption here
>>>> prevents resctrl_arch_rmid_read() from sleeping.
>>>>
>>>> add_rmid_to_limbo() walks each domain, but only reads the counter
>>>> on one domain. If the system has more than one domain, the RMID will
>>>> always be added to the limbo list. If the RMIDs usage was not over the
>>>> threshold, it will be removed from the list when __check_limbo() runs.
>>>> Make this the default behaviour. Free RMIDs are always added to the
>>>> limbo list for each domain.
>>>>
>>>> The user visible effect of this is that a clean RMID is not available
>>>> for re-allocation immediately after 'rmdir()' completes, this behaviour
>>>> was never portable as it never happened on a machine with multiple
>>>> domains.
>>>>
>>>> Removing this path allows resctrl_arch_rmid_read() to sleep if its called
>>>> with interrupts unmasked. Document this is the expected behaviour, and
>>>> add a might_sleep() annotation to catch changes that won't work on arm64.
>>
>>
>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>> index 660752406174..f7311102e94c 100644
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>>>>   			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
>>>>   			   u64 *val);
>>>>   
>>>> +/**
>>>> + * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
>>>> + *
>>>> + * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
>>>> + * resctrl_arch_rmid_read() is called with preemption disabled.
>>>> + */
>>>> +static inline void resctrl_arch_rmid_read_context_check(void)
>>>> +{
>>>> +	if (!irqs_disabled())
>>>> +		might_sleep();
>>>> +}
>>
>>> Apologies but even after rereading the patch as well as your response to
>>> the previous patch version several times I am not able to understand why the
>>> code is looking like above. If, like according to the comment above, a
>>> warning should be generated with preemption disabled, then should it not
>>> just be "might_sleep()" without the "!irqs_disabled()" check?
>>
>> This would be simpler. But for NOHZ_FULL you wanted to keep the IPI, so the contract with
>> resctrl_arch_rmid_read() is that if interrupts are unmasked, it can sleep.
> 
> Thank you. This appears to be the key. Could you please add this
> information to resctrl_arch_rmid_read_context_check()'s description?

That comment now reads:
  * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
  *
  * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
  * resctrl_arch_rmid_read() is called with preemption disabled.
  *
  * The contract with resctrl_arch_rmid_read() is that if interrupts
  * are unmasked, it can sleep. This allows NOHZ_FULL systems to use an
  * IPI, (and fail if the call needed to sleep), while most of the time
  * the work is scheduled, allowing the call to sleep.



Thanks,

James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ