linux-kernel - Re: [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4f7facea-ffc4-63c3-b960-fa92eb03b04c@intel.com>
Date:   Thu, 24 Aug 2023 16:04:46 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     James Morse <james.morse@....com>, <x86@...nel.org>,
        <linux-kernel@...r.kernel.org>
CC:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Babu Moger <Babu.Moger@....com>,
        <shameerali.kolothum.thodi@...wei.com>,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        <carl@...amperecomputing.com>, <lcherian@...vell.com>,
        <bobo.shaobowang@...wei.com>, <tan.shaopeng@...itsu.com>,
        <xingxin.hx@...nanolis.org>, <baolin.wang@...ux.alibaba.com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, <peternewman@...gle.com>,
        <dfustini@...libre.com>
Subject: Re: [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory
 needed in resctrl_arch_rmid_read()

Hi James,

On 8/24/2023 9:56 AM, James Morse wrote:
> Hi Reinette,
> 
> On 09/08/2023 23:37, Reinette Chatre wrote:
>> On 7/28/2023 9:42 AM, James Morse wrote:
>>> Depending on the number of monitors available, Arm's MPAM may need to
>>> allocate a monitor prior to reading the counter value. Allocating a
>>> contended resource may involve sleeping.
>>>
>>> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
>>> the allocation should be valid for all domains.
>>>
>>> __check_limbo() and mon_event_count() each make multiple calls to
>>> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
>>> the allocation should be valid for multiple invocations of
>>> resctrl_arch_rmid_read().
>>>
>>> Add arch hooks for this allocation, which need calling before
>>> resctrl_arch_rmid_read(). The allocated monitor is passed to
>>> resctrl_arch_rmid_read(), then freed again afterwards. The helper
>>> can be called on any CPU, and can sleep.
> 
>> Looking at the error paths all the errors are silent failures.
> 
> Yeah, I don't really expect this to ever fail. The memory arm64 needs to allocate is
> smaller than a pointer - if that fails, I think there are bigger problems. The hardware
> resource is something the call will wait for.
> 
> As you note, it's hard to propagate an unlikely error back from here.
> 
> 
>> On the 
>> failure in mon_event_read() this could potentially be handled by setting
>> the "err" field in struct rmid_read ... at least then the caller can print
>> an error instead of displaying a zero count to the user.
> 
> Sure, that covers the one a human being might see.

Right.

>> The other failures are harder to handle though.
> 
> I don't think the silent failure is such a bad thing. For the limbo handler, no RMID moves
> between the lists until the handler is able to make progress.

ok, so it needs to ensure that the handler is still rescheduled
when such a failure is encountered.

> For the overflow handler, its possible an overflow will get missed (I still have an
> overflow interrupt I can use here). But I don't think this will be the biggest problem on
> a machine that is struggling to allocate 4 bytes.

As I now (I think) better understand for MPAM it is 4 bytes of memory as well as
reservation of a hardware resource. Could something go wrong attempting to find an
available hardware resource that as you state later is indeed scarce? I wonder if
it would not be helpful to at least have resctrl log an error from the
places where it is not possible to propagate the error.

>> Considering that these contexts are allocated and
>> freed so often, why not allocate them once (perhaps in struct rdt_hw_domain?)
>> on driver load with clear error handling?
> 
> Because the resource they represent is scarce. You may have 100 control or monitor groups,
> but only 10 hardware monitors. The hardware monitor has to be allocated and programmed
> before it can be read.

I think I misunderstood what "context" is when I wrote the above. I
was thinking about memory allocation that can be done early and
neglected to connect the "context" to be an actual hardware resource.

> This works well for the llc_occupancy counter, but not for bandwidth counters, which with
> the current 'free running' ABI have to all be allocated and programmed at the beginning of
> time. If there are enough monitors to do that - the MPAM driver will, and these
> allocate/free calls will just be looking up the pre-allocated/pre-programmed monitor.
> Doing the allocation like this keeps that logic in the mpam driver, and allows concurrent
> access to resctrl_arch_rmid_read(), which is something any future PMU support will need.
> 
> I don't have any numbers how many monitors any platform is going to have, but I'm
> confident there are some that won't have enough for each control-group or monitor-group to
> have one.

Right. My question was not relevant to what this change does. Sorry for the noise.

Reinette