Date:   Fri, 11 Nov 2022 18:36:44 +0000
From:   James Morse <james.morse@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>,
        Peter Newman <peternewman@...gle.com>
Cc:     Tony Luck <tony.luck@...el.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Eranian, Stephane" <eranian@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Babu Moger <Babu.Moger@....com>,
        Gaurang Upasani <gupasani@...gle.com>
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

Hi Reinette,

On 09/11/2022 19:12, Reinette Chatre wrote:
> On 11/9/2022 9:59 AM, James Morse wrote:
>> On 08/11/2022 21:28, Reinette Chatre wrote:
>>> On 11/3/2022 10:06 AM, James Morse wrote:
>>>> (I've not got to the last message in this part of the thread yet - I'm out of time this
>>>> week, back Monday!)
>>>>
>>>> On 21/10/2022 21:09, Reinette Chatre wrote:
>>>>> On 10/19/2022 6:57 AM, James Morse wrote:
>>>>>> On 17/10/2022 11:15, Peter Newman wrote:
>>>>>>> On Wed, Oct 12, 2022 at 6:55 PM James Morse <james.morse@....com> wrote:
>>>
>>> ...
>>>
>>>>>>> If there are a lot more PARTIDs than PMGs, then it would fit well with a
>>>>>>> user who never creates child MON groups. In case the number of MON
>>>>>>> groups gets ahead of the number of CTRL_MON groups and you've run out of
>>>>>>> PMGs, perhaps you would just try to allocate another PARTID and program
>>>>>>> the same partitioning configuration before giving up.
>>>>>>
>>>>>> User-space can choose to do this.
>>>>>> If the kernel tries to be clever and do this behind user-space's back, it needs to
>>>>>> allocate two monitors for this secretly-two-control-groups, and always sum the counters
>>>>>> before reporting them to user-space.
>>>>
>>>>> If I understand this scenario correctly, the kernel is already doing this.
>>>>> As implemented in mon_event_count() the monitor data of a CTRL_MON group is
>>>>> the sum of the parent CTRL_MON group and all its child MON groups.
>>>>
>>>> That is true. MPAM has an additional headache here as it needs to allocate a monitor in
>>>> order to read the counters. If there are enough monitors for each CLOSID*RMID to have one,
>>>> then MPAM can export the counter files in the same way RDT does.
>>>>
>>>> While there are systems that have enough monitors, I don't think this is going to be the
>>>> norm. To allow systems that don't have a surfeit of monitors to use the counters, I plan
>>>> to export the values from resctrl_arch_rmid_read() via perf. (but only for bandwidth counters)
>>
>>> This sounds related to the way monitoring was done in earlier kernels. This was
>>> long before I became involved with this work. Unfortunately I am not familiar with
>>> all the history involved that ended in it being removed from the kernel.
>>
>> Yup, I'm aware there is some history to this. It's not appropriate for the llc_occupancy
>> counter as that reports state, instead of events.

> Perf counts events while a process is running

It's hooked up as an uncore PMU driver, and it rejects attempts to attach it to a task.
Some useful background: it has to be told which of the existing resctrl control/monitor
groups to monitor. On x86 it's just returning the increase in events from the mbm files
in resctrl via resctrl_arch_rmid_read().
Unless you're curious [0], the details can come if/when I post it!
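
Purely as an illustration of "returning the increase in events" (this is not
the code in [0]; the struct and function names below are made up, and it
assumes the value read for a group is a free-running 64-bit count), the
bookkeeping between two reads could look roughly like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event state: the last cumulative value seen for one group. */
struct mbm_delta_state {
	uint64_t prev;
};

/*
 * Return the increase since the previous read and remember the new value.
 * 'now' stands in for whatever cumulative count the arch code reported.
 * Unsigned subtraction is modulo 2^64, so a single wrap still gives the
 * right difference.
 */
static uint64_t mbm_delta_update(struct mbm_delta_state *st, uint64_t now)
{
	uint64_t delta;

	assert(st != NULL);
	delta = now - st->prev;
	st->prev = now;
	return delta;
}
```

A caller would seed 'prev' with the current count when the event is created,
then report the return value of each later update as the number of new events.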


> so memory bandwidth monitoring may
> also be impacted by the caveats Peter mentioned for the upcoming AMD changes:
> 
> https://lore.kernel.org/lkml/CALPaoCidd+WwGTyE3D74LhoL13ce+EvdTmOnyPrQN62j+zZ1fg@mail.gmail.com/
> ("This has the caveats that evictions while one task is running could have
> resulted from a previous task on the current CPU, but will be counted
> against the new task's software-RMID, ...")

If the logic to implement that is hidden entirely behind resctrl_arch_rmid_read(), then
there should be no problem. (the values will be noisy, but that is the best that can be
done on that platform)


Thanks,

James

[0] Beware, the changes to x86 to make resctrl_arch_rmid_read() irq safe aren't quite right.
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot/v6.0&id=b8ae575bd17e1d56db0f84dc456b964a23d252d6
