Date:   Mon, 10 Jul 2023 16:35:52 -0700
From:   Tony Luck <tony.luck@...el.com>
To:     Peter Newman <peternewman@...gle.com>
Cc:     James Morse <james.morse@....com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        Drew Fustini <dfustini@...libre.com>,
        Babu Moger <Babu.Moger@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        "shameerali.kolothum.thodi@...wei.com" 
        <shameerali.kolothum.thodi@...wei.com>,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        "carl@...amperecomputing.com" <carl@...amperecomputing.com>,
        "lcherian@...vell.com" <lcherian@...vell.com>,
        "bobo.shaobowang@...wei.com" <bobo.shaobowang@...wei.com>,
        "tan.shaopeng@...itsu.com" <tan.shaopeng@...itsu.com>,
        "xingxin.hx@...nanolis.org" <xingxin.hx@...nanolis.org>,
        "baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>,
        "Pitre, Nicolas" <npitre@...libre.com>,
        Kevin Hilman <khilman@...libre.com>,
        "aricciardi@...libre.com" <aricciardi@...libre.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "patches@...ts.linux.dev" <patches@...ts.linux.dev>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [RFC PATCH 2/2] resctrl2: Arch x86 modules for most of the
 legacy control/monitor functions

On Thu, Jul 06, 2023 at 12:22:03PM +0200, Peter Newman wrote:
> Hi Tony,
> 
> On Wed, Jul 5, 2023 at 6:46 AM Luck, Tony <tony.luck@...el.com> wrote:
> > The mbm_poll() code that makes sure that counters don't wrap is
> > doing all the expensive wrmsr(QM_EVTSEL);rdmsr(QM_COUNT)
> > once per second to give you the data you want.
> 
> I was doing that in the soft RMID series I posted earlier because it
> simplified things, but then I had some realizations about how much
> error +/- 1 second on the sampling point could result in[1]. We
> usually measure the bandwidth rate with a 5-second window, so a
> reading that's up to one second old would mean a 20% error in the
> bandwidth calculation.

I just pushed the latest version of the resctrl2 patches to the
resctrl2_v65rc1 branch of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

As well as locking, bug fixes, and general updates, it includes an
experimental feature to provide summary MBM information for each
node, with both "total" and "local" rates reported (see the example
output below). Note that you have to load the modules
rdt_mbm_local_bytes and rdt_mbm_total_bytes so that the MBM overflow
threads are running. I should fix the code to print "n/a" instead of
"0" if they are not.

$ cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_summary
    3638     3638 /g2
    3087     3087 /g2/m2
    3267     3267 /g2/m1
    3443     3443 /g1
    3629     3629 /g1/m2
    3588     3587 /g1/m1
    3999     3993 /
    3370     3369 /m2
    3432     3432 /m1
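
A minimal sketch of how a user-space tool might parse one line of this
output. It assumes each line holds two unsigned rates followed by the
group path, exactly as in the sample above. The struct and function
names are hypothetical, and the message does not state which column is
"total" and which is "local", so the fields are named by position only.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of mbm_summary (hypothetical layout, named by position). */
struct mbm_summary_row {
	unsigned long first;   /* first numeric column */
	unsigned long second;  /* second numeric column */
	char group[256];       /* group path, e.g. "/g1/m1" */
};

/* Parse "<rate> <rate> <group>"; returns 0 on success, -1 on a malformed line. */
int parse_mbm_summary_line(const char *line, struct mbm_summary_row *row)
{
	int n = sscanf(line, "%lu %lu %255s",
		       &row->first, &row->second, row->group);

	return n == 3 ? 0 : -1;
}
```

The leading whitespace in each line is skipped by sscanf(), so lines can
be passed in as read from the file.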

The rates are produced once per second by the MBM overflow
code. It computes MBytes/sec as "chunks since last poll"
divided by (now - then). I'm using jiffies for the times,
which may be good enough. "now - then" is one second (maybe
more if the kernel thread doing the MBM polling is delayed
from running).
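
A minimal sketch of that calculation, written as plain user-space C
rather than the actual patch code. The function name, the
bytes_per_chunk scaling factor, the hz parameter (jiffies per second),
and the choice of 2^20 bytes per MByte are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rate in MBytes/sec from two counter samples and their timestamps.
 * All parameter names are hypothetical; hz is the number of jiffies
 * per second. Returns 0 if no time has elapsed between the samples.
 */
uint64_t mbm_rate_mbps(uint64_t chunks_now, uint64_t chunks_then,
		       uint64_t bytes_per_chunk,
		       uint64_t jiffies_now, uint64_t jiffies_then,
		       uint64_t hz)
{
	uint64_t delta_jiffies = jiffies_now - jiffies_then;
	uint64_t bytes;

	assert(hz > 0);
	if (delta_jiffies == 0)
		return 0;

	/* "chunks since last poll", converted to bytes */
	bytes = (chunks_now - chunks_then) * bytes_per_chunk;

	/*
	 * bytes / (delta_jiffies / hz), multiplying first to keep
	 * precision. A real implementation would need to guard
	 * against overflow of bytes * hz.
	 */
	return bytes * hz / delta_jiffies / (1024 * 1024);
}
```

Dividing by the measured interval rather than a fixed one second means
a delayed poll spreads the same byte count over the longer interval
instead of inflating the reported rate.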

I should fix the summarization code to work the same as the
regular MBM files (i.e. make the parent control directory
report the sum of all its children).

The code also attempts (but fails) to make these mbm_summary
files poll(2)-able, with the wakeup dependent on the aggregate
measured bandwidth compared against a configurable threshold:

$ cat /sys/fs/resctrl/info/L3_MON/mbm_poll_threshold
10000000

There's something wrong though. poll(2) always says there is
data to be read. I only see one other piece of kernel code
implementing poll on kernfs (in the cgroup code). Perhaps
my problem is an inability to write an application that uses
poll(2) correctly.
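
For reference, a sketch of how a user-space waiter is usually written
for sysfs/kernfs attribute files. It assumes mbm_summary follows the
generic kernfs convention, where the file always reports as readable
and a change is signalled through POLLPRI and POLLERR after the current
contents have been read. Whether the resctrl2 branch behaves this way
is not confirmed here, and the function names are hypothetical.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* On kernfs files POLLIN is always set; only POLLPRI/POLLERR mean "changed". */
int kernfs_change_signalled(short revents)
{
	return (revents & (POLLPRI | POLLERR)) != 0;
}

/*
 * Wait for the file behind fd to report a change.
 * Returns 1 on change, 0 on timeout, -1 on error.
 */
int wait_for_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	char buf[4096];
	ssize_t n;
	int ret;

	/* Re-read from the start so the current state counts as "seen". */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;
	if (n < 0)
		return -1;

	/* Do not request POLLIN: it would make poll() return immediately. */
	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret;

	return kernfs_change_signalled(pfd.revents);
}
```

An application that waits on POLLIN would see exactly the symptom
described above, since readability is reported unconditionally under
that convention.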

Let me know if this all seems like a useful direction. Maybe
the polling part is overkill and it is sufficient to just
have a cheap way to get all the bandwidths even if the values
seen might be up to one second old.

-Tony
