Message-Id: <20221003222133.20948-11-aliraza@bu.edu>
Date: Mon, 3 Oct 2022 18:21:33 -0400
From: Ali Raza <aliraza@...edu>
To: linux-kernel@...r.kernel.org
Cc: corbet@....net, masahiroy@...nel.org, michal.lkml@...kovi.net,
ndesaulniers@...gle.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
luto@...nel.org, ebiederm@...ssion.com, keescook@...omium.org,
peterz@...radead.org, viro@...iv.linux.org.uk, arnd@...db.de,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
pbonzini@...hat.com, jpoimboe@...nel.org,
linux-doc@...r.kernel.org, linux-kbuild@...r.kernel.org,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-arch@...r.kernel.org, x86@...nel.org, rjones@...hat.com,
munsoner@...edu, tommyu@...edu, drepper@...hat.com,
lwoodman@...hat.com, mboydmcse@...il.com, okrieg@...edu,
rmancuso@...edu, Ali Raza <aliraza@...edu>
Subject: [RFC UKL 10/10] Kconfig: Add config option for enabling and sample for testing UKL
Add the Kconfig file that enables building UKL. The documentation
introduces the technical details of how UKL works and the motivation
behind it. The sample is a simple program that still uses the standard
system call interface but does not require a modified C library.
Cc: Jonathan Corbet <corbet@....net>
Cc: Masahiro Yamada <masahiroy@...nel.org>
Cc: Michal Marek <michal.lkml@...kovi.net>
Cc: Nick Desaulniers <ndesaulniers@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Eric Biederman <ebiederm@...ssion.com>
Cc: Kees Cook <keescook@...omium.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Alexander Viro <viro@...iv.linux.org.uk>
Cc: Arnd Bergmann <arnd@...db.de>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>
Co-developed-by: Eric B Munson <munsoner@...edu>
Signed-off-by: Eric B Munson <munsoner@...edu>
Co-developed-by: Ali Raza <aliraza@...edu>
Signed-off-by: Ali Raza <aliraza@...edu>
---
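Note for testers (not part of the commit): a minimal .config fragment for
trying the sample, sketched from the options this patch adds in
kernel/Kconfig.ukl. The values of CONFIG_UKL_NAME and CONFIG_UKL_ARCHIVE_PATH
are simply the Kconfig defaults and are assumptions about the local setup;
adjust them to wherever UKL.a actually lives.

```
CONFIG_UNIKERNEL_LINUX=y
# UNIKERNEL_LINUX depends on both of these being off
# CONFIG_RANDOMIZE_BASE is not set
# CONFIG_PAGE_TABLE_ISOLATION is not set
# The help text says TLS should be disabled for the samples/ application
# CONFIG_UKL_TLS is not set
CONFIG_UKL_NAME="/UKL"
CONFIG_UKL_ARCHIVE_PATH="../UKL.a"
```

With a glibc-based application, the help text suggests enabling
CONFIG_UKL_TLS instead.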
Documentation/index.rst | 1 +
Documentation/ukl/ukl.rst | 104 ++++++++++++++++++++++++++++++++++++++
Kconfig | 2 +
kernel/Kconfig.ukl | 41 +++++++++++++++
samples/ukl/Makefile | 16 ++++++
samples/ukl/README | 17 +++++++
samples/ukl/syscall.S | 28 ++++++++++
samples/ukl/tcp_server.c | 99 ++++++++++++++++++++++++++++++++++++
8 files changed, 308 insertions(+)
create mode 100644 Documentation/ukl/ukl.rst
create mode 100644 kernel/Kconfig.ukl
create mode 100644 samples/ukl/Makefile
create mode 100644 samples/ukl/README
create mode 100644 samples/ukl/syscall.S
create mode 100644 samples/ukl/tcp_server.c
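A note on the port number in samples/ukl/tcp_server.c: the README says the
server uses port 5555, while the code stores the literal 45845 in sin_port
with an //htons(portno) comment. The sketch below, which is not part of the
patch, shows that the two agree on little-endian x86-64. swap16() and
check_sample_port() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Swap the two bytes of a 16-bit value. On a little-endian machine such
 * as x86-64 this is the conversion htons() performs.
 */
static inline uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical self-check, not part of the sample. */
void check_sample_port(void)
{
	assert(swap16(5555) == 45845);	/* 0x15b3 -> 0xb315 */
	assert(swap16(45845) == 5555);	/* and back again */
}
```

The sample presumably hardcodes the swapped value so that it does not need
htons() from a C library when built as UKL.a.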
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 4737c18c97ff..42f8cb7d4cae 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -167,6 +167,7 @@ to ReStructured Text format, or are simply too old.
tools/index
staging/index
+ ukl/ukl.rst
Translations
diff --git a/Documentation/ukl/ukl.rst b/Documentation/ukl/ukl.rst
new file mode 100644
index 000000000000..a07ebb51169e
--- /dev/null
+++ b/Documentation/ukl/ukl.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Unikernel Linux (UKL)
+=====================
+
+Unikernel Linux (UKL) is a research project aimed at integrating
+application-specific optimizations into the Linux kernel. This RFC aims to
+introduce this research to the community. Any feedback regarding the idea,
+goals, implementation and research is highly appreciated.
+
+Unikernels are specialized operating systems where an application is linked
+directly with the kernel and runs in supervisor mode. This allows the
+developers to implement application-specific optimizations in the kernel,
+which can be directly invoked by the application (without going through the
+syscall path). An application can control scheduling and resource
+management and directly access the hardware. The application and the kernel
+can be co-optimized, e.g., through LTO or PGO. All of these optimizations,
+and others, provide applications with huge performance benefits over
+general-purpose operating systems.
+
+Linux is the de facto operating system of today. Applications depend on its
+battle-tested code base, large developer community, support for legacy
+code, a huge ecosystem of tools and utilities, and a wide range of
+compatible hardware and device drivers. Linux also allows some degree of
+application-specific optimization through build-time config options,
+runtime configuration, and recently through eBPF. But still, there is a
+need for even more fine-grained application-specific optimizations, and
+some developers resort to kernel bypass techniques.
+
+Unikernel Linux (UKL) aims to get the best of both worlds by bringing
+application-specific optimizations to the Linux ecosystem. This way,
+unmodified applications can keep getting the benefits of Linux while taking
+advantage of the unikernel-style optimizations. Optionally, applications
+can be modified to invoke deeper optimizations.
+
+There are two steps to unikernel-izing Linux: first, equip Linux with
+a unikernel model, and second, actually use that model to implement
+application-specific optimizations. This patch focuses on the first part.
+Through this patch, unmodified applications can be built as Linux
+unikernels, albeit with only modest performance advantages. Like
+unikernels, UKL would allow an application to be statically linked into the
+kernel and executed in supervisor mode. However, UKL preserves most of the
+invariants and design of Linux, including a separate page-able application
+portion of the address space and a pinned kernel portion, the ability to
+run multiple processes, and distinct execution modes for application and
+kernel code. Kernel execution mode and application execution mode are
+different, e.g., the application execution mode allows application threads
+to be scheduled, handle signals, etc., which do not apply to kernel
+threads. An application built as a Linux unikernel will have its text and
+data loaded with the kernel at boot time, while the rest of the address
+space would remain unchanged. These applications invoke the system call
+functionality through a function call into the kernel system call entry
+point instead of through the syscall assembly instruction. UKL would
+support a normal userspace so the UKL application can be started, managed,
+profiled, etc., using normal command line utilities.
+
+Once Linux has a unikernel model, different application-specific
+optimizations are possible. We have tried a few, e.g., fast system call
+transitions, shared stacks to allow LTO, invoking kernel functions
+directly, etc. We have seen huge performance benefits; the details are
+outside the scope of this patch and can be found in our paper
+(https://arxiv.org/pdf/2206.00789.pdf).
+
+UKL differs significantly from previous projects, e.g., UML, KML and LKL.
+User Mode Linux (UML) is a virtual machine monitor implemented on the
+syscall interface, a very different goal from UKL. Kernel Mode Linux (KML)
+allows applications to run in kernel mode and replaces syscalls with
+function calls. While KML stops there, UKL goes further: it links the
+application and the kernel together, which allows further optimizations,
+e.g., fast system call transitions, shared stacks to allow LTO, and
+invoking kernel functions directly. The Linux Kernel Library (LKL) harvests
+arch-independent code from Linux and takes it to userspace as a library to
+be linked with applications; a host needs to provide the arch-dependent
+functionality. This model is very different from UKL. A detailed
+discussion of related work is present in the paper linked above.
+
+See samples/ukl for a simple TCP echo server example which can be built as
+a normal user space application and also as a UKL application. In the Linux
+config options, a path to the compiled and partially linked application
+binary can be specified. A kernel built with UKL enabled will search this
+location for the binary and link it with the kernel. Applications and
+required libraries need to be compiled with the -mno-red-zone and
+-mcmodel=kernel flags: kernel mode execution can trample on application
+red zones, and the application needs the kernel memory model in order to
+link with the kernel and be loaded in the high end of the address space.
+Examples of other applications such as Redis and Memcached, along with
+glibc and libgcc, can be found at https://github.com/unikernelLinux/ukl
+
+List of authors and contributors:
+=================================
+
+Ali Raza - aliraza@...edu
+Thomas Unger - tommyu@...edu
+Matthew Boyd - mboydmcse@...il.com
+Eric Munson - munsoner@...edu
+Parul Sohal - psohal@...edu
+Ulrich Drepper - drepper@...hat.com
+Richard Jones - rjones@...hat.com
+Daniel Bristot de Oliveira - bristot@...nel.org
+Larry Woodman - lwoodman@...hat.com
+Renato Mancuso - rmancuso@...edu
+Jonathan Appavoo - jappavoo@...edu
+Orran Krieger - okrieg@...edu
+
diff --git a/Kconfig b/Kconfig
index 745bc773f567..2a4594ae472c 100644
--- a/Kconfig
+++ b/Kconfig
@@ -29,4 +29,6 @@ source "lib/Kconfig"
source "lib/Kconfig.debug"
+source "kernel/Kconfig.ukl"
+
source "Documentation/Kconfig"
diff --git a/kernel/Kconfig.ukl b/kernel/Kconfig.ukl
new file mode 100644
index 000000000000..c2c5e1003605
--- /dev/null
+++ b/kernel/Kconfig.ukl
@@ -0,0 +1,41 @@
+menuconfig UNIKERNEL_LINUX
+ bool "Unikernel Linux"
+ depends on X86_64 && !RANDOMIZE_BASE && !PAGE_TABLE_ISOLATION
+ help
+ Unikernel Linux allows for a single, privileged process to be
+ linked with the kernel binary and be executed in place of or
+ alongside a more traditional user space.
+
+ If you don't know what this is, say N.
+
+config UKL_TLS
+ bool "Enable TLS for UKL application"
+ depends on UNIKERNEL_LINUX
+ default y
+ help
+ Not all applications will make use of thread local storage,
+ but we need to account for it in the linker script if used.
+ For the application in samples/ this should be disabled, but
+ if you are working with glibc this should be 'Y'.
+
+ If unsure say 'Y' here
+
+config UKL_NAME
+ string "UKL Exec target"
+ depends on UNIKERNEL_LINUX
+ default "/UKL"
+ help
+ We need a way to trigger the start of the UKL application,
+ either by the kernel in place of init or by userspace when setup
+ is finished. The value given here is compared against the
+ filename passed to exec and if they match UKL is started.
+ For a more 'traditional' unikernel model, the value set here
+ should be given to the init= boot parameter.
+
+config UKL_ARCHIVE_PATH
+ string "Path to static application archive"
+ depends on UNIKERNEL_LINUX
+ default "../UKL.a"
+ help
+ Where the linker should look for the statically linked application
+ and dependency archive.
diff --git a/samples/ukl/Makefile b/samples/ukl/Makefile
new file mode 100644
index 000000000000..93beb7750d4b
--- /dev/null
+++ b/samples/ukl/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -I usr/include -fno-PIC -mno-red-zone -mcmodel=kernel
+
+UKL.a: tcp_server.o syscall.o userspace
+ $(AR) cr UKL.a tcp_server.o syscall.o
+ objcopy --prefix-symbols=ukl_ UKL.a
+
+tcp_server.o: tcp_server.c
+syscall.o: syscall.S
+
+userspace:
+ gcc -o tcp_server tcp_server.c
+
+clean:
+ rm -f UKL.a tcp_server.o syscall.o tcp_server
diff --git a/samples/ukl/README b/samples/ukl/README
new file mode 100644
index 000000000000..fbb771da033a
--- /dev/null
+++ b/samples/ukl/README
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+UKL test program
+================
+
+tcp_server.c is an epoll-based TCP echo server written in C which uses port
+5555 by default. syscall.S implements the syscall() function as a call
+instruction into the kernel entry point. Normally, C libraries provide a
+syscall() function that executes the syscall assembly instruction. Run
+`make` and it will create UKL.a and tcp_server. UKL.a can then be copied to
+where the UKL Linux build expects it to be; this location can be changed
+through the Linux config options (e.g., by running `make menuconfig`). The
+resulting Linux kernel can be run, and once the userspace comes up, the
+echo server can be started by running the UKL exec command, again chosen
+through the Linux config options. tcp_server is a userspace binary of the
+same echo server which can be run normally. This is meant to show that UKL
+can run code which can also be run as a userspace binary without modification.
diff --git a/samples/ukl/syscall.S b/samples/ukl/syscall.S
new file mode 100644
index 000000000000..95d1c177fb05
--- /dev/null
+++ b/samples/ukl/syscall.S
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+ .global _start
+_start:
+ jmp main
+
+ .global syscall
+
+/* Usage: long syscall (syscall_number, arg1, arg2, arg3, arg4, arg5, arg6)
+ We need to do some arg shifting, the syscall_number will be in
+ rax. */
+
+ .text
+syscall:
+ movq %rdi, %rax /* Syscall number -> rax. */
+ movq %rsi, %rdi /* shift arg1 - arg5. */
+ movq %rdx, %rsi
+ movq %rcx, %rdx
+ movq %r8, %r10
+ movq %r9, %r8
+ movq 8(%rsp),%r9 /* arg6 is on the stack. */
+ call entry_SYSCALL_64 /* Do the system call. */
+ cmpq $-4095, %rax /* Check %rax for error. */
+ jae loop /* Jump to error handler if error. */
+ ret /* Return to caller. */
+
+loop:
+ jmp loop
diff --git a/samples/ukl/tcp_server.c b/samples/ukl/tcp_server.c
new file mode 100644
index 000000000000..abf1a0e2bb79
--- /dev/null
+++ b/samples/ukl/tcp_server.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/epoll.h>
+#include <arpa/inet.h>
+#include <netinet/tcp.h>
+
+#define BACKLOG 512
+#define MAX_EVENTS 128
+#define MAX_MESSAGE_LEN 2048
+
+void error(char *msg);
+extern long syscall(long number, ...);
+
+int main(void)
+{
+ // some variables we need
+ struct sockaddr_in server_addr, client_addr;
+ socklen_t client_len = sizeof(client_addr);
+ int bytes_received;
+ char buffer[MAX_MESSAGE_LEN];
+ int on;
+ int result;
+ int sock_listen_fd, newsockfd;
+
+ // setup socket
+ sock_listen_fd = syscall(41, AF_INET, SOCK_STREAM, 0);
+ if (sock_listen_fd < 0)
+ error("Error creating socket..\n");
+
+ server_addr.sin_family = AF_INET;
+ server_addr.sin_port = 45845; //htons(portno);
+ server_addr.sin_addr.s_addr = INADDR_ANY;
+
+ // set TCP NODELAY
+ on = 1;
+ result = syscall(54, sock_listen_fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
+ if (result < 0)
+ error("Can't set TCP_NODELAY to on");
+
+ // bind socket and listen for connections
+ if (syscall(49, sock_listen_fd, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0)
+ error("Error binding socket..\n");
+
+ if (syscall(50, sock_listen_fd, BACKLOG) < 0)
+ error("Error listening..\n");
+
+ struct epoll_event ev, events[MAX_EVENTS];
+ int new_events, sock_conn_fd, epollfd;
+
+ epollfd = syscall(213, MAX_EVENTS);
+ if (epollfd < 0)
+ error("Error creating epoll..\n");
+
+ ev.events = EPOLLIN;
+ ev.data.fd = sock_listen_fd;
+
+ if (syscall(233, epollfd, EPOLL_CTL_ADD, sock_listen_fd, &ev) == -1)
+ error("Error adding new listening socket to epoll..\n");
+
+ while (1) {
+ new_events = syscall(232, epollfd, events, MAX_EVENTS, -1);
+
+ if (new_events == -1)
+ error("Error in epoll_wait..\n");
+
+ for (int i = 0; i < new_events; ++i) {
+ if (events[i].data.fd == sock_listen_fd) {
+ sock_conn_fd = syscall(288, sock_listen_fd,
+ (struct sockaddr *)&client_addr,
+ &client_len, SOCK_NONBLOCK);
+ if (sock_conn_fd == -1)
+ error("Error accepting new connection..\n");
+
+ ev.events = EPOLLIN | EPOLLET;
+ ev.data.fd = sock_conn_fd;
+ if (syscall(233, epollfd, EPOLL_CTL_ADD, sock_conn_fd, &ev) == -1)
+ error("Error adding new event to epoll..\n");
+ } else {
+ newsockfd = events[i].data.fd;
+ bytes_received = syscall(45, newsockfd, buffer, MAX_MESSAGE_LEN,
+ 0, NULL, NULL);
+ if (bytes_received <= 0) {
+ syscall(233, epollfd, EPOLL_CTL_DEL, newsockfd, NULL);
+ syscall(48, newsockfd, SHUT_RDWR);
+ } else {
+ syscall(44, newsockfd, buffer, bytes_received, 0, NULL, 0);
+ }
+ }
+ }
+ }
+}
+
+void error(char *msg)
+{
+ syscall(1, 1, msg, 15); /* write(1, ...): prints only the first 15 bytes */
+ syscall(60, 1); /* exit(1) */
+}
--
2.21.3
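
A note on the error check in samples/ukl/syscall.S: `cmpq $-4095, %rax`
followed by `jae` is an unsigned comparison, so it treats return values in
the range [-4095, -1] as errors and everything else as success. The C sketch
below expresses the same test. It is not part of the patch:
raw_syscall_failed() and check_error_range() are hypothetical names, and
MAX_ERRNO is defined locally to mirror the kernel constant of the same name.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ERRNO 4095	/* largest errno value the kernel encodes in a return */

/*
 * True if a raw system call return value encodes -errno.
 * Casting to unsigned long turns [-4095, -1] into the top 4095 values of
 * the unsigned range, which is what the jae (unsigned >=) branch tests.
 */
static inline bool raw_syscall_failed(long ret)
{
	return (unsigned long)ret >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical self-check of the boundaries. */
void check_error_range(void)
{
	assert(raw_syscall_failed(-1));
	assert(raw_syscall_failed(-4095));
	assert(!raw_syscall_failed(-4096));
	assert(!raw_syscall_failed(0));
}
```

In the sample stub a failing call jumps to the `loop` label and spins, so
when built as UKL.a the error checks in tcp_server.c are not reached.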