linux-kernel - Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c583be1b-17db-1ed3-0f5a-bd119edc8bfe@deltatee.com>
Date:   Thu, 6 Dec 2018 13:11:28 -0700
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Dave Hansen <dave.hansen@...el.com>,
        Jerome Glisse <jglisse@...hat.com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Keith Busch <keith.busch@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Haggai Eran <haggaie@...lanox.com>,
        Balbir Singh <bsingharora@...il.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Felix Kuehling <felix.kuehling@....com>,
        Philip Yang <Philip.Yang@....com>,
        Christian König <christian.koenig@....com>,
        Paul Blinzer <Paul.Blinzer@....com>,
        John Hubbard <jhubbard@...dia.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        Michal Hocko <mhocko@...nel.org>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        Mark Hairgrove <mhairgrove@...dia.com>,
        Vivek Kini <vkini@...dia.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Dave Airlie <airlied@...hat.com>,
        Ben Skeggs <bskeggs@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Rik van Riel <riel@...riel.com>,
        Ben Woodard <woodard@...hat.com>, linux-acpi@...r.kernel.org
Subject: Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

On 2018-12-06 12:31 p.m., Dave Hansen wrote:
> On 12/6/18 11:20 AM, Jerome Glisse wrote:
>>>> For case 1 you can pre-parse stuff but this can be done by helper library
>>> How would that work?  Would each user/container/whatever do this once?
>>> Where would they keep the pre-parsed stuff?  How do they manage their
>>> cache if the topology changes?
>> Short answer i don't expect a cache, i expect that each program will have
>> a init function that query the topology and update the application codes
>> accordingly.
> 
> My concern with having folks do per-program parsing, *and* having a huge
> amount of data to parse makes it unusable.  The largest systems will
> literally have hundreds of thousands of objects in /sysfs, even in a
> single directory.  That makes readdir() basically impossible, and makes
> even open() (if you already know the path you want somehow) hard to do fast.

Is this actually realistic? I find it hard to imagine an actual hardware
bus that can have even thousands of devices under a single node, let
alone hundreds of thousands. At some point the laws of physics apply.
For example, in present hardware, the most ports a single PCI switch can
have these days is under one hundred. I'd imagine any such large systems
would have a hierarchy of devices (ie. layers of switch-like devices)
which implies the existing sysfs bus/devices  should have a path through
it without navigating a directory with that unreasonable a number of
objects in it. HMS, on the other hand, has all possible initiators
(,etc) under a single directory.

The caveat to this is, that to find an initial starting point in the bus
hierarchy you might have to go through /sys/dev/{block|char} or
/sys/class which may have directories with a large number of objects.
Though, such a system would necessarily have a similarly large number of
objects in /dev which means means you will probably never get around the
readdir/open bottleneck you mention... and, thus, this doesn't seem
overly realistic to me.

Logan