[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4f7facea-ffc4-63c3-b960-fa92eb03b04c@intel.com>
Date: Thu, 24 Aug 2023 16:04:46 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: James Morse <james.morse@....com>, <x86@...nel.org>,
<linux-kernel@...r.kernel.org>
CC: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
Babu Moger <Babu.Moger@....com>,
<shameerali.kolothum.thodi@...wei.com>,
D Scott Phillips OS <scott@...amperecomputing.com>,
<carl@...amperecomputing.com>, <lcherian@...vell.com>,
<bobo.shaobowang@...wei.com>, <tan.shaopeng@...itsu.com>,
<xingxin.hx@...nanolis.org>, <baolin.wang@...ux.alibaba.com>,
Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, <peternewman@...gle.com>,
<dfustini@...libre.com>
Subject: Re: [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory
needed in resctrl_arch_rmid_read()
Hi James,
On 8/24/2023 9:56 AM, James Morse wrote:
> Hi Reinette,
>
> On 09/08/2023 23:37, Reinette Chatre wrote:
>> On 7/28/2023 9:42 AM, James Morse wrote:
>>> Depending on the number of monitors available, Arm's MPAM may need to
>>> allocate a monitor prior to reading the counter value. Allocating a
>>> contended resource may involve sleeping.
>>>
>>> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
>>> the allocation should be valid for all domains.
>>>
>>> __check_limbo() and mon_event_count() each make multiple calls to
>>> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
>>> the allocation should be valid for multiple invocations of
>>> resctrl_arch_rmid_read().
>>>
>>> Add arch hooks for this allocation, which need calling before
>>> resctrl_arch_rmid_read(). The allocated monitor is passed to
>>> resctrl_arch_rmid_read(), then freed again afterwards. The helper
>>> can be called on any CPU, and can sleep.
>
>> Looking at the error paths all the errors are silent failures.
>
> Yeah, I don't really expect this to ever fail. The memory arm64 needs to allocate is
> smaller than a pointer - if that fails, I think there are bigger problems. The hardware
> resource is something the call will wait for.
>
> As you note, it's hard to propagate an unlikely error back from here.
>
>
>> On the
>> failure in mon_event_read() this could potentially be handled by setting
>> the "err" field in struct rmid_read ... at least then the caller can print
>> an error instead of displaying a zero count to the user.
>
> Sure, that covers the one a human being might see.
Right.
>> The other failures are harder to handle though.
>
> I don't think the silent failure is such a bad thing. For the limbo handler, no RMID moves
> between the lists until the handler is able to make progress.
ok, so it needs to ensure that the handler is still rescheduled
when such a failure is encountered.
> For the overflow handler, its possible an overflow will get missed (I still have an
> overflow interrupt I can use here). But I don't think this will be the biggest problem on
> a machine that is struggling to allocate 4 bytes.
As I now (I think) better understand for MPAM it is 4 bytes of memory as well as
reservation of a hardware resource. Could something go wrong attempting to find an
available hardware resource that as you state later is indeed scarce? I wonder if
it would not be helpful to at least have resctrl log an error from the
places where it is not possible to propagate the error.
>> Considering that these contexts are allocated and
>> freed so often, why not allocate them once (perhaps in struct rdt_hw_domain?)
>> on driver load with clear error handling?
>
> Because the resource they represent is scarce. You may have 100 control or monitor groups,
> but only 10 hardware monitors. The hardware monitor has to be allocated and programmed
> before it can be read.
I think I misunderstood what "context" is when I wrote the above. I
was thinking about memory allocation that can be done early and
neglected to connect the "context" to be an actual hardware resource.
> This works well for the llc_occupancy counter, but not for bandwidth counters, which with
> the current 'free running' ABI have to all be allocated and programmed at the beginning of
> time. If there are enough monitors to do that - the MPAM driver will, and these
> allocate/free calls will just be looking up the pre-allocated/pre-programmed monitor.
> Doing the allocation like this keeps that logic in the mpam driver, and allows concurrent
> access to resctrl_arch_rmid_read(), which is something any future PMU support will need.
>
> I don't have any numbers how many monitors any platform is going to have, but I'm
> confident there are some that won't have enough for each control-group or monitor-group to
> have one.
Right. My question was not relevant to what this change does. Sorry for the noise.
Reinette
Powered by blists - more mailing lists