Message-ID: <CAEZ6=UOA6=ikSdxN662xyhT3wauGuqZReKLOb=_9EmSRckNr=Q@mail.gmail.com>
Date:   Wed, 12 Oct 2022 07:34:06 -0500
From:   Vinicius Petrucci <vpetrucci@...il.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Frank van der Linden <fvdl@...gle.com>,
        Zhongkun He <hezhongkun.hzk@...edance.com>, corbet@....net,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        linux-doc@...r.kernel.org, wuyun.abel@...edance.com
Subject: Re: [RFC] mm: add new syscall pidfd_set_mempolicy()

> Well, per address range operation is a completely different beast I
> would say. External tool would need to a) understand what that range is
> used for (e.g. stack/heap ranges, mmaped shared files like libraries or
> private mappings) and b) be in sync with memory layout modifications
> done by applications (e.g. that an mmap has been issued to back malloc
> request). Quite a lot of understanding about the specific process. I
> would say that with that intimate knowledge it is much better to be
> part of the process and do those changes from within the process
> itself.

Sorry, this may be a digression, but I just wanted to mention a
particular use case from a project I recently collaborated on (to
appear next month at IISWC 2022:
http://www.iiswc.org/iiswc2022/index.html).

We carried out a performance analysis of the latest Linux AutoNUMA
memory tiering on graph processing applications. We noticed that hot
pages cannot be properly identified by the reactive approach used by
AutoNUMA due to irregular/random memory access patterns. Thus, as a
proof of concept, we implemented and evaluated a simple idea: an external
user-level process/agent that, based on prior profiling results of
memory regions, places memory at chunk/object granularity (instead of
page-level allocation/migration) in advance on either DRAM or CXL/PMEM
(via mbind calls). This kind of tiering solution could deliver up to 2x
higher performance for graph analytics
workloads. We plan to evaluate other workloads as well.
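
For readers unfamiliar with mbind, here is a minimal sketch of the kind of
call involved. It is not the code from the project: the helper names
(build_nodemask, bind_region_to_node), the 16-word mask size, and the use
of MPOL_BIND are assumptions made for illustration. It invokes the mbind
system call directly so that it does not depend on libnuma's <numaif.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy value from the kernel's <linux/mempolicy.h>; defined here so the
 * sketch does not need libnuma headers. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/* Illustrative mask size (an assumption): room for 16 words of node bits. */
#define MASK_WORDS    16
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAX_NODES     (MASK_WORDS * BITS_PER_WORD)

/* Hypothetical helper: clear `mask` and set the single bit for `node`.
 * Returns 0 on success, -1 if `node` does not fit in the mask. */
int build_nodemask(unsigned long mask[MASK_WORDS], unsigned node)
{
    assert(mask != NULL);
    if (node >= MAX_NODES)
        return -1;
    memset(mask, 0, MASK_WORDS * sizeof(unsigned long));
    mask[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
    return 0;
}

/* Hypothetical helper: restrict [addr, addr + len) of the CALLING process
 * to NUMA node `node`. `addr` must be page-aligned. Returns 0 on success,
 * -1 with errno set on failure. */
int bind_region_to_node(void *addr, size_t len, unsigned node)
{
    unsigned long mask[MASK_WORDS];

    if (build_nodemask(mask, node) != 0) {
        errno = EINVAL;
        return -1;
    }
    /* The kernel decrements maxnode before use, hence the + 1. */
    return (int)syscall(SYS_mbind, addr, len,
                        (unsigned long)MPOL_BIND, mask,
                        (unsigned long)MAX_NODES + 1, 0UL);
}
```

A caller would mmap a page-aligned region and then call, for example,
bind_region_to_node(buf, len, 1), where node 1 standing for the CXL/PMEM
tier is an assumption about the machine. With flags set to 0 the policy
applies to pages faulted in afterwards; moving already-resident pages needs
MPOL_MF_MOVE. Note that mbind acts only on the address space of the caller,
which is the limitation discussed in this thread.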

Having a feature like "pidfd/process_mbind" would really simplify our
user-level agent implementation moving forward, as right now we are
adding an LD_PRELOAD wrapper (installing a signal handler) to listen for
and execute "mbind" requests from another process. If there is already
an alternative solution to this (via ptrace?), please let me
know.

Thank you!

Vinicius Petrucci
Principal Performance Engineer
Micron Technology
