Message-ID: <CAPcyv4j9shdJFrvADa=qW4L-jPJJ4S_TJc_c=aRoW3EmSCCChQ@mail.gmail.com>
Date:   Fri, 22 Dec 2017 14:53:42 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Brice Goglin <brice.goglin@...il.com>
Cc:     Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Michal Hocko <mhocko@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Anaczkowski, Lukasz" <lukasz.anaczkowski@...el.com>,
        "Box, David E" <david.e.box@...el.com>,
        "Kogut, Jaroslaw" <Jaroslaw.Kogut@...el.com>,
        "Koss, Marcin" <marcin.koss@...el.com>,
        "Koziej, Artur" <artur.koziej@...el.com>,
        "Lahtinen, Joonas" <joonas.lahtinen@...el.com>,
        "Moore, Robert" <robert.moore@...el.com>,
        "Nachimuthu, Murugasamy" <murugasamy.nachimuthu@...el.com>,
        "Odzioba, Lukasz" <lukasz.odzioba@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        "Schmauss, Erik" <erik.schmauss@...el.com>,
        "Verma, Vishal L" <vishal.l.verma@...el.com>,
        "Zheng, Lv" <lv.zheng@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Balbir Singh <bsingharora@...il.com>,
        Jerome Glisse <jglisse@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        Len Brown <lenb@...nel.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>, devel@...ica.org,
        Linux ACPI <linux-acpi@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        Linux API <linux-api@...r.kernel.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin <brice.goglin@...il.com> wrote:
> On 20/12/2017 at 23:41, Ross Zwisler wrote:
[..]
> Hello
>
> I can confirm that HPC runtimes are going to use these patches (at least
> all runtimes that use hwloc for topology discovery, but that's the vast
> majority of HPC anyway).
>
> We really didn't like KNL exposing a hacky SLIT table [1]. We had to
> explicitly detect that specific crazy table to find out which NUMA nodes
> were local to which cores, and to find out which NUMA nodes were
> HBM/MCDRAM or DDR. And then we had to hide the SLIT values from the
> application because the reported latencies didn't match reality. Quite
> annoying.
>
> With Ross' patches, we can easily get what we need:
> * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
> can only report a single local node per CPU (doesn't work for KNL and
> upcoming architectures with HBM+DDR+...)
> * which NUMA nodes are slow/fast (for both bandwidth and latency)
> And we can still look at SLIT under /sys/devices/system/node if really
> needed.
>
> And of course having this in sysfs is much better than parsing ACPI
> tables that are only accessible to root :)

On this point, it's not clear to me that we should allow these sysfs
entries to be world readable. Given that /proc/iomem now hides
physical address information from non-root users, we at least need to
be careful not to undo that with new sysfs HMAT attributes. Once you
need to be root for this info, is parsing binary HMAT instead of
sysfs a blocker for the HPC use case?
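
The /proc/iomem behavior referred to here can be observed from
userspace. Below is a minimal Python sketch, assuming the usual
"start-end : name" line format with two spaces of indentation per
nesting level. The helper names parse_iomem and addresses_hidden are
made up for illustration and are not part of any kernel or library
interface.

```python
import re

# Each /proc/iomem line looks like "  00100000-7fffffff : System RAM".
# Leading indentation (two spaces per level) encodes resource nesting.
_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # skip anything that is not a resource line
        indent, start, end, name = m.groups()
        entries.append(
            (len(indent) // 2, int(start, 16), int(end, 16), name.strip())
        )
    return entries


def addresses_hidden(entries):
    """True when every range reads as 0-0, as an unprivileged reader sees it."""
    return bool(entries) and all(s == 0 and e == 0 for _, s, e, _ in entries)
```

Calling parse_iomem(open("/proc/iomem").read()) as a regular user and
then addresses_hidden() on the result should report that the ranges
are masked, while the resource names and nesting remain visible.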

Perhaps we can enlist /proc/iomem or a similar enumeration interface
to tell userspace the NUMA node and whether the kernel thinks it has
better or worse performance characteristics relative to base
system RAM, i.e. new IORES_DESC_* values. I'm worried that if we start
publishing absolute numbers in sysfs, userspace will default to
looking for specific magic numbers there instead of asking the kernel
for memory that has performance characteristics relative to base
"System RAM". In other words, the absolute performance information
that the HMAT publishes is useful to the kernel, but it's not clear
that userspace needs that rather than a relative indicator for making
NUMA node preference decisions.
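
To make the absolute-versus-relative distinction concrete, here is a
rough Python sketch of how absolute per-node figures could be reduced
to a relative indicator against a baseline node. Everything in it is
an assumption for illustration: the Relative enum, the field names
latency_ns and bandwidth_mbps, and the relative_view helper do not
correspond to any existing or proposed kernel interface.

```python
from enum import Enum


class Relative(Enum):
    """Hypothetical relative indicator versus the baseline node."""
    WORSE = -1
    SAME = 0
    BETTER = 1


def compare(value, base, higher_is_better):
    """Classify one metric against the baseline value."""
    if value == base:
        return Relative.SAME
    better = value > base if higher_is_better else value < base
    return Relative.BETTER if better else Relative.WORSE


def relative_view(nodes, base_node):
    """Map absolute per-node numbers to indicators relative to base_node.

    `nodes` maps a node id to a dict with the assumed keys
    "latency_ns" (lower is better) and "bandwidth_mbps" (higher is better).
    """
    base = nodes[base_node]
    return {
        node: {
            "latency": compare(
                perf["latency_ns"], base["latency_ns"], higher_is_better=False
            ),
            "bandwidth": compare(
                perf["bandwidth_mbps"], base["bandwidth_mbps"], higher_is_better=True
            ),
        }
        for node, perf in nodes.items()
    }
```

With a view like this, a consumer would only ever test for BETTER or
WORSE relative to base system RAM, so it has no absolute value to
hard-code.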
