Message-ID: <09e2894e-eb2b-12a0-c4a2-5000b27bbded@amd.com>
Date:   Fri, 8 Sep 2023 16:35:05 -0500
From:   "Moger, Babu" <bmoger@....com>
To:     "Luck, Tony" <tony.luck@...el.com>,
        "babu.moger@....com" <babu.moger@....com>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        "james.morse@....com" <james.morse@....com>
Cc:     Amit Singh Tomar <amitsinght@...vell.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        George Cherian <gcherian@...vell.com>,
        "robh@...nel.org" <robh@...nel.org>,
        "peternewman@...gle.com" <peternewman@...gle.com>,
        Drew Fustini <dfustini@...libre.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: resctrl2 - status

Hi Tony,


On 9/8/2023 1:51 PM, Luck, Tony wrote:
>>> Can you try this out on an AMD system. I think I covered most of the
>>> existing AMD resctrl features, but I have no machine to test the code
>>> on, so very likely there are bugs in these code paths.
>>>
>>> I'd like to make any needed changes now, before I start breaking this
>>> into reviewable bite-sized patches to avoid too much churn.
>> I tried your latest code briefly on my system.  Unfortunately, I could
>> not get it to work on my AMD system.
>>
>> # git branch -a
>>     next
>> * resctrl2_v65
>> # uname -r
>> 6.5.0+
>> # lsmod | grep rdt
>> rdt_show_ids           12288  0
>> rdt_mbm_local_bytes    12288  0
>> rdt_mbm_total_bytes    12288  0
>> rdt_llc_occupancy      12288  0
>> rdt_l3_cat             16384  0
>>
>> # lsmod |grep mbe
>> amd_mbec               16384  0
>>
>> I could not get  rdt_l3_mba
>>
>> # modprobe rdt_l3_mba
>> modprobe: ERROR: could not insert 'rdt_l3_mba': No such device
>>
>> I don't see any data for the default group either.
>>
>> mount  -t resctrl resctrl /sys/fs/resctrl/
>>
>> cd /sys/fs/resctrl/mon_data/mon_L3_00
>>
>> cat mbm_summary
>>        n/a      n/a /
> Babu,
>
> Thanks a bunch for taking this for a quick spin. There are several bits of
> good news there. Several modules automatically loaded as expected.
> Nothing went "OOPS" and crashed the system.
>
> Here’s the code that the rdt_l3_mba module runs that can cause failure
> to load with "No such device"
>
>          if (!boot_cpu_has(X86_FEATURE_RDT_A)) {
>                  pr_debug("No RDT allocation support\n");
>                  return -ENODEV;
>          }

Shouldn't this be something like the following?

if (!rdt_cpu_has(X86_FEATURE_MBA))
        return false;
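
The difference between the two checks can be sketched in plain C. This is a user-space model, not kernel code: struct cpu_snapshot and both probe functions are hypothetical names, and the case where the MBA feature flag is set while CPUID leaf 0x10 EBX bit 3 is clear is an assumption made for illustration, not a confirmed description of how AMD enumerates MBA. Since a module init path returns an int, the sketch returns -ENODEV rather than false.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit tested by the quoted code: BIT(3) of cpuid_ebx(0x10). */
#define MBA_LEAF10_EBX_BIT 3

/* Hypothetical snapshot of what one CPU reports. */
struct cpu_snapshot {
	uint32_t leaf10_ebx;	/* raw value of CPUID leaf 0x10, EBX */
	bool has_rdt_a;		/* stands in for X86_FEATURE_RDT_A */
	bool has_mba;		/* stands in for X86_FEATURE_MBA */
};

/* Models the quoted check: depends on the raw leaf 0x10 bit. */
static int probe_mba_raw_cpuid(const struct cpu_snapshot *cpu)
{
	if (!cpu->has_rdt_a)
		return -ENODEV;
	if (!(cpu->leaf10_ebx & (UINT32_C(1) << MBA_LEAF10_EBX_BIT)))
		return -ENODEV;
	return 0;
}

/* Models the suggested check: depends only on the feature flag. */
static int probe_mba_feature_flag(const struct cpu_snapshot *cpu)
{
	if (!cpu->has_mba)
		return -ENODEV;
	return 0;
}
```

Under that assumption, a CPU with has_mba set but leaf10_ebx bit 3 clear fails the first probe with -ENODEV (the "No such device" seen above) and passes the second.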

>          mba_features = cpuid_ebx(0x10);
>
>          if (!(mba_features & BIT(3))) {
>                  pr_debug("No RDT MBA allocation\n");
>                  return -ENODEV;
>          }
>
> I assume the first test must have succeeded (same code in rdt_l3_cat, and
> that loaded OK). So must be the second. How does AMD enumerate MBA
> support?
>
> Less obvious is the root cause of the mbm_summary file failing to
> show any data. The rdt_mbm_local_bytes and rdt_mbm_total_bytes modules
> loaded OK. So I'm looking for the right CPUID bits to detect memory bandwidth
> monitoring.

I am still not sure whether resctrl2 will address all the current gaps in
resctrl1. We should probably put all the issues on the table before we go
that route.

One of the main issues for AMD is the coupling of LLC domains.

For example, AMD hardware supports 16 CLOSIDs per LLC domain, but the Linux
design assumes there are 16 CLOSIDs in total for the whole system. Today we
can create only 16 CLOSIDs, irrespective of how many domains there are.

In reality, we should be able to create "16 x number of LLC domains"
CLOSIDs in the system. This is more evident on AMD, but the same problem
applies to Intel systems with multiple sockets.
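
The gap between the two models is simple arithmetic, sketched below. The 16-per-domain figure comes from the paragraph above; the function names are hypothetical, and the domain count used in the note after the code is an illustrative value, not a measurement from any particular system.

```c
/* Hardware limit stated above: 16 CLOSIDs per LLC domain. */
#define CLOSIDS_PER_LLC_DOMAIN 16u

/* Current design: one global pool, no matter how many domains exist. */
static unsigned int closids_global_model(unsigned int nr_llc_domains)
{
	(void)nr_llc_domains;	/* the domain count does not change the total */
	return CLOSIDS_PER_LLC_DOMAIN;
}

/* Decoupled design: each LLC domain contributes its own 16 CLOSIDs. */
static unsigned int closids_per_domain_model(unsigned int nr_llc_domains)
{
	return CLOSIDS_PER_LLC_DOMAIN * nr_llc_domains;
}
```

For a hypothetical system with 8 LLC domains, the global model yields 16 usable CLOSIDs while the per-domain model yields 128.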

My 2 cents. Hope to discuss more in our upcoming meeting.

thanks

