lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 8 Nov 2021 02:17:43 +0000
From:   "tarumizu.kohei@...itsu.com" <tarumizu.kohei@...itsu.com>
To:     'Borislav Petkov' <bp@...en8.de>
CC:     "catalin.marinas@....com" <catalin.marinas@....com>,
        "will@...nel.org" <will@...nel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and
 Intel processors

Hi,

Thanks for your comment.

> This is all fine and dandy but what I'm missing in this pile of text - at least I couldn't
> find it - is why do we need this in the upstream kernel?
> 
> Is there some real-life use case that would benefit from software fiddling with
> prefetchers or is this one of those, well, we have those controls, lets expose them
> in the OS?
> 
> IOW, you need to sell this stuff properly first - then talk design.

A64FX and some Intel processors has implementation-dependent register
for controlling hardware prefetch. Intel has MSR_MISC_FEATURE_CONTROL,
and A64FX has IMP_PF_STREAM_DETECT_CTRL_EL0. These register cannot be
accessed from userspace, so we provide a proper kernel interface.

The advantage of using this interface from userspace is that we can
expect performance improvements.

The following performance improvements have been reported for some
Intel processors.
https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

A64FX also has several applications that have actually been improved
performance. In most of these cases, we are tuning the parameter of
hardware prefetch distance. One of them is the Stream benchmark.

For reference, here is the result of STREAM Triad when tuning with
the dist attribute file in L1 and L2 cache on A64FX.

| dist combination  | Pattern A   | Pattern B   |
|-------------------|-------------|-------------|
| L1:256,  L2:1024  | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024  | 279172.8742 | 118979.4542 |
| L1:256,  L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.

For these reasons, we would like to add this interface to the
upstream kernel.

> I'm not sure about a wholly separate drivers/hwpf/ - it's not like there are
> gazillion different hw prefetch drivers.

We created a new directory to lump multiple separate files into one
place. We don't think this is a good way. If there is any other
suitable way, we would like to change it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ