Message-ID: <c85541c4-6ae5-8898-8044-99e403335f48@torproject.org>
Date:   Wed, 17 Mar 2021 12:57:58 -0500
From:   Jim Newsome <jnewsome@...project.org>
To:     Peter Oskolkov <posk@...gle.com>
Cc:     linux-kernel@...r.kernel.org,
        Rob Jansen <rob.g.jansen@....navy.mil>,
        Ryan Wails <ryan.wails@....navy.mil>
Subject: Re: [RFC PATCH 0/3 v3] futex/sched: introduce FUTEX_SWAP operation

I'm not well versed in this part of the kernel (ok, any part, really),
but I wanted to chime in from a user perspective to say that I'm very
interested in this functionality.

We (Rob + Ryan + I, cc'd) are currently developing the second generation
of the Shadow simulator <https://shadow.github.io/>, which is used by
various researchers and the Tor Project. In this new architecture,
simulated network-application processes (such as tor, browsers, and web
servers) are each run as a native OS process, started by forking and
exec'ing its unmodified binary. We are interested in supporting large
simulations (e.g. 50k+ processes), and expect them to take on the order
of hours or even days to execute, so scalability and performance matter.

We've prototyped two mechanisms for controlling these simulated
processes, and a third hybrid mechanism that combines the two. I've
mentioned one of these (ptrace) in another thread ("do_wait: make
PIDTYPE_PID case O(1) instead of O(n)"). The other mechanism is to use
an LD_PRELOAD'd shim that implements the libc interface, and
communicates with Shadow via a syscall-like API over IPC.
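
To make that concrete, the shim side boils down to interposing on libc
symbols; a minimal sketch (shadow_ipc_call() here is a hypothetical
stand-in for our actual transport, which the next paragraph describes):

    /* shim.c -- built as a shared object and loaded via LD_PRELOAD. */
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical: marshals the "syscall" to the Shadow worker over
     * shared memory and blocks until the result is posted back. */
    extern long shadow_ipc_call(long nr, long a0, long a1, long a2);

    /* Override libc's write(); the emulated process never enters the
     * kernel for this call -- Shadow services it instead. */
    ssize_t write(int fd, const void *buf, size_t count)
    {
            return (ssize_t)shadow_ipc_call(SYS_write, fd,
                                            (long)buf, (long)count);
    }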

So far, the most performant version of this IPC we've tried uses a bit
of shared memory and a pair of semaphores. It looks much like the
example in Peter's proposal:

> a. T1: futex-wake T2, futex-wait
> b. T2: wakes, does what it has been woken to do
> c. T2: futex-wake T1, futex-wait
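
In our shim, that hand-off is essentially a sem_post() followed by a
sem_wait() on process-shared semaphores living in the shared mapping
(a simplified sketch with illustrative names; the real code also
carries the request/response payload in the same mapping):

    #include <semaphore.h>

    /* Shared between the Shadow worker and the emulated process; both
     * semaphores are created process-shared with an initial value of 0. */
    struct shim_channel {
            sem_t to_shadow;   /* emulated process -> Shadow worker */
            sem_t to_plugin;   /* Shadow worker -> emulated process */
            /* ... request/response payload ... */
    };

    /* T1 (emulated process): hand control to the Shadow worker and block
     * until it hands control back -- steps a-c from the example above. */
    static void plugin_yield(struct shim_channel *ch)
    {
            sem_post(&ch->to_shadow);   /* "futex-wake T2" */
            sem_wait(&ch->to_plugin);   /* "futex-wait"    */
    }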

We've been able to get the switching costs down using CPU pinning and
SCHED_FIFO. Each physical CPU spends most of its time swapping back and
forth between a Shadow worker thread and an emulated process. Even so,
the new architecture has so far been slower than the first generation
of Shadow, which multiplexes the simulated processes into its own
handful of OS processes (but is complex and fragile).
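
For reference, the pinning setup is nothing exotic -- roughly the
following, run once per worker/plugin pair with a per-pair CPU number
(error handling omitted; SCHED_FIFO needs CAP_SYS_NICE or a suitable
RLIMIT_RTPRIO):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to one CPU and switch it to SCHED_FIFO, so
     * a Shadow worker and its emulated process simply alternate on that
     * CPU instead of migrating or being preempted by regular tasks. */
    static void pin_and_fifo(int cpu)
    {
            cpu_set_t set;
            struct sched_param sp = { .sched_priority = 1 };

            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
            pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    }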

> With FUTEX_SWAP, steps a and c above can be reduced to one futex
> operation that runs 5-10 times faster.

IIUC the proposed primitives could let us further improve performance,
and perhaps drop some of the complexity of attempting to control the
scheduler via pinning and SCHED_FIFO.
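
Concretely, if I'm reading the RFC right, the post+wait pair in our
shim would collapse into a single call along these lines (a sketch
against the proposed, not-yet-merged ABI; the opcode value and
argument order follow my reading of the patches and may well be
wrong):

    #include <linux/futex.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Not in mainline headers; value taken from the RFC patches. */
    #ifndef FUTEX_SWAP
    #define FUTEX_SWAP 13
    #endif

    /* Atomically wait on our own futex word and wake one waiter on the
     * peer's, replacing the separate wake + wait we do today. */
    static long futex_swap(unsigned int *wait_on, unsigned int val,
                           unsigned int *wake)
    {
            return syscall(SYS_futex, wait_on, FUTEX_SWAP, val,
                           NULL /* timeout */, wake, 0);
    }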
