Message-ID: <11f659fd88f887b9fe4c88a386f1a5c2157968a6.camel@ibm.com>
Date: Tue, 10 Feb 2026 21:02:12 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "jack@...e.cz" <jack@...e.cz>
CC: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-fsdevel@...r.kernel.org"
<linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
"lsf-pc@...ts.linux-foundation.org"
<lsf-pc@...ts.linux-foundation.org>,
"chrisl@...nel.org" <chrisl@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Pavan Rallabhandi
<Pavan.Rallabhandi@....com>,
"clm@...a.com" <clm@...a.com>
Subject: RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in
Linux kernel
On Tue, 2026-02-10 at 14:47 +0100, Jan Kara wrote:
> On Mon 09-02-26 22:28:59, Viacheslav Dubeyko via Lsf-pc wrote:
> > On Mon, 2026-02-09 at 02:03 -0800, Chris Li wrote:
> > > On Fri, Feb 6, 2026 at 11:38 AM Viacheslav Dubeyko
> > > <Slava.Dubeyko@....com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Machine Learning (ML) is an approach to learning from data,
> > > > finding patterns, and making predictions without developers
> > > > implementing the algorithms explicitly. The number of areas where
> > > > ML is applied grows every day. Generally speaking, ML can introduce
> > > > self-evolving and self-learning capabilities into the Linux kernel.
> > > > There are already research works and industry efforts to employ ML
> > > > approaches for configuring and optimizing the Linux kernel. However,
> > > > introducing ML approaches into the Linux kernel is not simple or
> > > > straightforward. There are multiple problems and unanswered
> > > > questions on this road. First of all, any ML model requires
> > > > floating-point operations (FPU) to run, but there is no direct use
> > > > of the FPU in kernel space. Also, an ML model requires a training
> > > > phase that can cause significant performance degradation of the
> > > > Linux kernel. Even the inference phase could be problematic from
> > > > the performance point of view on the kernel side. Using ML
> > > > approaches in the Linux kernel is an inevitable step. But how can
> > > > we use ML approaches in the Linux kernel? Which infrastructure do
> > > > we need to adopt ML models in the Linux kernel?
> > >
> > > I think there are two different things. I think you want the latter,
> > > but I am not sure:
> > >
> > > 1) Using an ML model to help kernel development: code reviews,
> > > generating patches from descriptions, etc. For example, Chris Mason
> > > has a kernel review repo on GitHub and he is sharing his review
> > > findings on the mailing list:
> > > https://github.com/masoncl/review-prompts/tree/main
> > > It is kernel development related, but the ML agent code is running
> > > in user space. The actual ML computation might run on GPUs/TPUs.
> > > That does not seem to be what you have in mind.
> > >
> > > 2) Running the ML model computation in kernel space.
> > > Can you clarify if this is what you have in mind? You mention FPU
> > > usage in the kernel for the ML model. That is only relevant if you
> > > need to run the floating-point work as CPU instructions in the
> > > kernel. Most ML computations do not run as CPU instructions; they
> > > run on GPUs/TPUs. Why not keep the ML program (PyTorch/agents) in
> > > user space and pass the data to the GPU/TPU driver to run? There
> > > will be some kernel infrastructure like VFIO/IOMMU involved with the
> > > GPU/TPU driver. For the most part the kernel is just facilitating
> > > the data passing to/from the GPU/TPU driver and then to the GPU/TPU
> > > hardware. The ML hardware is doing the heavy lifting.
> >
> > The idea is to have the ML model running in user space while a kernel
> > subsystem interacts with it. As the next step, I am considering two
> > real-life use cases: (1) the GC subsystem of an LFS file system, and
> > (2) an ML-based DAMON approach. So, for example, GC can be represented
> > by an ML model in user space. The GC can request data (segment state)
> > from kernel space, and the ML model in user space can do training
> > and/or inference. As a result, the ML model in user space can select
> > victim segments and instruct the kernel-space logic to move valid data
> > from the victim segment(s) into clean/current one(s).
>
> To be honest I'm skeptical about how generic this can be. Essentially
> you're describing a generic interface to offload arbitrary kernel
> decisions to userspace. ML is a userspace business here and not really
> relevant for the concept AFAICT. And we already have several ways for
> the kernel to ask userspace to do something for it, and unless it is
> very restricted and well defined it is rather painful, prone to
> deadlocks, security issues etc.
Scepticism is a normal reaction. :) So, there is nothing wrong with being
sceptical.

I believe it can be pretty generic from the data-flow point of view.
Different kernel subsystems could require different ways of interacting
with user space. However, if we are talking about data flow and NOT
execution flow, then it could be generic enough. And if it can be
generic, then we can suggest a generic way of extending any kernel
subsystem with ML support.
I don't think that we need to see the ML library approach as "the kernel
asking userspace to do something". Rather, the model to consider is "the
kernel shares data with user space, and user space recommends something
to the kernel". So, the user-space agent (ML model) can request data from
kernel space, or the kernel subsystem can notify the user-space agent
that data is available. And it's up to the kernel subsystem's
implementation which data can be shared with user space. The ML model can
then be trained in user space and share recommendations (or eBPF code,
for example) with kernel space. Finally, it's up to the kernel subsystem
how and when to apply these recommendations on the kernel side.
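To make the data-flow model concrete, here is a minimal user-space sketch
of the loop I have in mind. The transport and function names
(read_state, write_recommendation) are purely illustrative assumptions;
in practice this could be sysfs, netlink, or an eBPF map, and the
"model" here is just a trivial stand-in for real training/inference:

```python
# Hypothetical user-space ML agent loop. The kernel subsystem exposes
# per-segment state, the agent runs inference, and only a recommendation
# flows back; the kernel remains free to apply, defer, or ignore it.

def infer_victim(segment_state):
    # Stand-in for a trained model: pick the segment with the fewest
    # valid blocks (i.e. the cheapest one to clean).
    return min(segment_state, key=segment_state.get)

def agent_step(read_state, write_recommendation):
    state = read_state()              # kernel -> user space (data only)
    victim = infer_victim(state)      # training/inference in user space
    write_recommendation(victim)      # user space -> kernel (a hint only)
    return victim
```

The key point is that execution flow never leaves the kernel; only data
and recommendations cross the boundary, so a slow or crashed agent
degrades decision quality rather than correctness.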
>
> So by all means if you want to do GC decisions for your filesystem in
> userspace by ML, be my guest, it does make some sense although I'd be wary
> of issues where we need to writeback dirty pages to free memory which may
> now depend on your userspace helper to make a decision which may need the
> memory to do the decision... But I don't see why you need all the ML fluff
> around it when it seems like just another way to call userspace helper and
> why some of the existing methods would not suffice.
>
OK, I see. :) You understood GC as a subsystem that helps the kernel
memory subsystem manage writeback of dirty memory pages. :) That is a
potential direction and I like your suggestion. :) But I meant something
different, because I am considering the GC subsystem of an LFS file
system. If we are using a Copy-On-Write (COW) policy, then after update
operations we have segments or erase blocks containing a mixture of valid
and invalid logical blocks. And we need a GC subsystem to clean old
segments by moving valid logical blocks from exhausted segments into
clean/current ones. The problem here is to find an efficient algorithm
for selecting victim segments with the smallest number of valid blocks,
with the goal of decreasing write amplification. So, the file system
needs to share the metadata details (segment state, for example), the ML
model can share recommendations, and the kernel code of the file system
can finally move the valid blocks in the background.
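For reference, the baseline that an ML policy would have to beat is the
classic greedy heuristic: always clean the segments with the fewest valid
blocks, since those cost the least copying. A minimal sketch (data layout
is an assumption, not any real file system's structures):

```python
def select_victims(segments, n_victims=1):
    """Greedy victim selection for an LFS-style GC.

    segments: dict mapping segment id -> count of valid blocks.
    Returns the ids of the n_victims segments whose cleaning moves the
    least valid data, which minimizes write amplification for this step.
    """
    return sorted(segments, key=lambda s: segments[s])[:n_victims]

# Example: segment 7 holds only 2 valid blocks, so it is cleaned first.
state = {3: 120, 7: 2, 9: 56}
assert select_victims(state) == [7]
```

An ML model could improve on this by also predicting which still-valid
blocks are about to be invalidated anyway, avoiding copies the greedy
policy would perform needlessly.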
I don't want to say that ML is a miracle that can solve all our problems,
and it cannot work efficiently for all possible problems. But it can help
us solve some complicated issues, and it makes sense to elaborate a
generic framework for ML adoption in the Linux kernel.
Thanks,
Slava.