linux-kernel - Re: [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <acd712b8-748b-e9f5-0bf3-9cfadca34c95@arm.com>
Date:   Fri, 15 Sep 2023 18:37:33 +0100
From:   James Morse <james.morse@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Babu Moger <Babu.Moger@....com>,
        shameerali.kolothum.thodi@...wei.com,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        carl@...amperecomputing.com, lcherian@...vell.com,
        bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
        xingxin.hx@...nanolis.org, baolin.wang@...ux.alibaba.com,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
        dfustini@...libre.com
Subject: Re: [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory
 needed in resctrl_arch_rmid_read()

Hi Reinette,

On 25/08/2023 00:04, Reinette Chatre wrote:
> On 8/24/2023 9:56 AM, James Morse wrote:
>> On 09/08/2023 23:37, Reinette Chatre wrote:
>>> On 7/28/2023 9:42 AM, James Morse wrote:
>>>> Depending on the number of monitors available, Arm's MPAM may need to
>>>> allocate a monitor prior to reading the counter value. Allocating a
>>>> contended resource may involve sleeping.
>>>>
>>>> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
>>>> the allocation should be valid for all domains.
>>>>
>>>> __check_limbo() and mon_event_count() each make multiple calls to
>>>> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
>>>> the allocation should be valid for multiple invocations of
>>>> resctrl_arch_rmid_read().
>>>>
>>>> Add arch hooks for this allocation, which need calling before
>>>> resctrl_arch_rmid_read(). The allocated monitor is passed to
>>>> resctrl_arch_rmid_read(), then freed again afterwards. The helper
>>>> can be called on any CPU, and can sleep.
>>
>>> Looking at the error paths all the errors are silent failures.
>>
>> Yeah, I don't really expect this to ever fail. The memory arm64 needs to allocate is
>> smaller than a pointer - if that fails, I think there are bigger problems. The hardware
>> resource is something the call will wait for.
>>
>> As you note, it's hard to propagate an unlikely error back from here.
>>
>>
>>> On the 
>>> failure in mon_event_read() this could potentially be handled by setting
>>> the "err" field in struct rmid_read ... at least then the caller can print
>>> an error instead of displaying a zero count to the user.
>>
>> Sure, that covers the one a human being might see.
> 
> Right.
> 
>>> The other failures are harder to handle though.
>>
>> I don't think the silent failure is such a bad thing. For the limbo handler, no RMID moves
>> between the lists until the handler is able to make progress.
> 
> ok, so it needs to ensure that the handler is still rescheduled
> when such a failure is encountered.

Yup, the silent error occurs in __check_limbo(), and cqm_handle_limbo() will still
reschedule the worker. Similarly, for mbm_update(), mbm_handle_overflow() will still
reschedule the work.


>> For the overflow handler, its possible an overflow will get missed (I still have an
>> overflow interrupt I can use here). But I don't think this will be the biggest problem on
>> a machine that is struggling to allocate 4 bytes.
> 
> As I now (I think) better understand for MPAM it is 4 bytes of memory as well as
> reservation of a hardware resource. Could something go wrong attempting to find an
> available hardware resource that as you state later is indeed scarce? I wonder if
> it would not be helpful to at least have resctrl log an error from the
> places where it is not possible to propagate the error.

If it can't allocate a monitor, it should block until one becomes available. Errors should
never occur during normal use.

I'll add pr_warn_ratelimited() for errors returned on this path.


>>> Considering that these contexts are allocated and
>>> freed so often, why not allocate them once (perhaps in struct rdt_hw_domain?)
>>> on driver load with clear error handling?
>>
>> Because the resource they represent is scarce. You may have 100 control or monitor groups,
>> but only 10 hardware monitors. The hardware monitor has to be allocated and programmed
>> before it can be read.
> 
> I think I misunderstood what "context" is when I wrote the above. I
> was thinking about memory allocation that can be done early and
> neglected to connect the "context" to be an actual hardware resource.

Let me know if there is a better name. Obviously I had to avoid 'resource'!



Thanks,

James