Message-ID: <47d21a6821c4b2d085f7b97bcdaa205bfcb0e0ad.camel@ibm.com>
Date: Fri, 6 Feb 2026 19:38:28 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "lsf-pc@...ts.linux-foundation.org" <lsf-pc@...ts.linux-foundation.org>
CC: Viacheslav Dubeyko <vdubeyko@...hat.com>,
"linux-mm@...ck.org"
<linux-mm@...ck.org>,
Pavan Rallabhandi <Pavan.Rallabhandi@....com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
Hello,
Machine Learning (ML) is an approach to learning from data, finding
patterns, and making predictions without developers implementing
the algorithms explicitly. The number of ML application areas grows
every day. Generally speaking, ML can introduce self-evolving and
self-learning capabilities into the Linux kernel. There are already
research works and industry efforts to employ ML approaches for
configuring and optimizing the Linux kernel. However, introducing
ML approaches into the Linux kernel is neither simple nor
straightforward. There are multiple problems and unanswered questions
on this road. First of all, any ML model requires floating-point
operations to run, but there is no direct use of the FPU in kernel
space. Also, an ML model requires a training phase that can
significantly degrade Linux kernel performance. Even the inference
phase could be problematic from the performance point of view on
the kernel side. Using ML approaches in the Linux kernel is an
inevitable step. But how can we use ML approaches in the Linux
kernel? Which infrastructure do we need to adopt ML models in
the Linux kernel?
What is the goal of using ML models in the Linux kernel? The main
goal is to employ ML models to elaborate the logic of a particular
Linux kernel subsystem based on processed data, and/or to derive an
efficient subsystem configuration based on the internal state of the
subsystem. As a result, it is necessary to: (1) collect data for
training, (2) execute the ML model training phase, (3) test the
trained ML model, and (4) use the ML model to execute the inference
phase. The ML model inference can be used to recommend a Linux
kernel subsystem configuration and/or to inject synthesized
subsystem logic into kernel space (for example, eBPF logic).
How can ML infrastructure be designed in the Linux kernel? It is
necessary to introduce into the Linux kernel a special ML library
that implements a generalized interface for interaction between the
ML model's thread in user space and a kernel subsystem. Such an
interface requires the means to: (1) create/initialize/destroy an
ML model proxy in a kernel subsystem, (2) start/stop the ML model
proxy, (3) get/preprocess/publish data sets from kernel space,
(4) receive/preprocess/apply ML model recommendation(s) from user
space, (5) execute synthesized logic/recommendations in kernel
space, (6) estimate the efficiency of the synthesized
logic/recommendations, and (7) execute error back-propagation with
the goal of correcting the ML model on the user-space side.
The create and initialize logic can be executed by a kernel
subsystem during module load or Linux kernel start (conversely,
module unload or kernel shutdown will destroy the ML model proxy).
The ML model thread in user space will be able to re-initialize the
ML model proxy and to execute its start/stop logic on the kernel
side. First of all, the ML model needs to be trained on data from
kernel space. The data can be requested by the ML model from user
space, or the data can be published by the ML model proxy from
kernel space. The sysfs interface can be used to orchestrate this
interaction. As a result, the ML model in user space should be
capable of extracting data set(s) from kernel space through sysfs,
FUSE, or a character device. The extracted data can be stored in
persistent storage and, finally, the ML model can be trained in
user space by accessing these data.
A continuous learning model can be adopted during the training
phase. It implies that the kernel subsystem can receive ML model
recommendations even during the training phase. The ML model proxy
on the kernel side can estimate the current kernel subsystem state,
try to apply the ML model recommendations, and estimate the
efficiency of the applied recommendations. Generally speaking, the
ML model proxy on the kernel side can consider several modes of
interaction with ML model recommendations: (1) emergency mode,
(2) learning mode, (3) collaboration mode, (4) recommendation mode.
The emergency mode is the mode in which the kernel subsystem is in
a critical state and is required to work as efficiently as possible,
without the capability of involving the ML model recommendations
(for example, when the ML model recommendations are completely
inadequate or the load is very high). The learning mode implies
that the kernel subsystem can try to apply the ML model
recommendations for some operations with the goal of estimating the
maturity of the ML model. Also, the ML model proxy can degrade the
mode back to the learning state if the ML model recommendations
become inefficient. The collaboration mode has the goal of using ML
recommendations for 50% of operations, with the goal of reaching a
mature state of the ML model. And, finally, the ML model proxy can
switch the kernel subsystem into recommendation mode if the ML
model is mature enough and the efficiency of applying the ML
recommendations is higher than that of the human-made algorithms.
The back-propagation approach can be used to correct the ML model
by sharing feedback from the efficiency estimation on the kernel
side with the user-space side.
I would like to discuss the approach of an ML library in the Linux
kernel that can provide a way to run/employ ML models in the Linux
kernel.
Thanks,
Slava.
[REFERENCES]
[1] https://lore.kernel.org/linux-fsdevel/20240605110219.7356-1-slava@dubeyko.com/
[2] https://www.youtube.com/watch?v=E7q0SKeniXU
[3] https://github.com/kernel-ml-lib/ml-lib
[4] https://github.com/kernel-ml-lib/ml-lib-linux
[5] https://lore.kernel.org/linux-fsdevel/20260206191136.2609767-1-slava@dubeyko.com/T/#t