Message-ID: <c24a209d5a4af0c4cc08f30098998ce16c668b58.camel@ibm.com>
Date: Tue, 10 Feb 2026 22:36:35 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "jack@...e.cz" <jack@...e.cz>, "clm@...a.com" <clm@...a.com>
CC: "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"chrisl@...nel.org" <chrisl@...nel.org>,
	Pavan Rallabhandi <Pavan.Rallabhandi@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"lsf-pc@...ts.linux-foundation.org" <lsf-pc@...ts.linux-foundation.org>
Subject: RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in
 Linux kernel

On Tue, 2026-02-10 at 09:20 -0500, Chris Mason wrote:
> On 2/10/26 8:47 AM, Jan Kara wrote:
> > On Mon 09-02-26 22:28:59, Viacheslav Dubeyko via Lsf-pc wrote:
> > > On Mon, 2026-02-09 at 02:03 -0800, Chris Li wrote:
> > > > On Fri, Feb 6, 2026 at 11:38 AM Viacheslav Dubeyko
> > > > <Slava.Dubeyko@....com> wrote:
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > Machine Learning (ML) is an approach to learning from data,
> > > > > finding patterns, and making predictions without developers
> > > > > implementing the algorithms explicitly. The number of areas where
> > > > > ML is applied grows every day. Generally speaking, ML can introduce
> > > > > self-evolving and self-learning capabilities into the Linux kernel.
> > > > > There are already research works and industry efforts to employ ML
> > > > > approaches for configuring and optimizing the Linux kernel. However,
> > > > > introducing ML approaches into the Linux kernel is neither simple
> > > > > nor straightforward. There are multiple problems and unanswered
> > > > > questions on this road. First of all, any ML model requires
> > > > > floating-point (FPU) operations to run, but there is no direct use
> > > > > of the FPU in kernel space. Also, an ML model requires a training
> > > > > phase that can significantly degrade kernel performance. Even the
> > > > > inference phase could be problematic from the performance point of
> > > > > view on the kernel side. Using ML approaches in the Linux kernel is
> > > > > an inevitable step. But how can we use ML approaches in the Linux
> > > > > kernel? Which infrastructure do we need to adopt ML models in the
> > > > > Linux kernel?
> > > > 
> > > > I think there are two different things here, and I suspect you want
> > > > the latter, but I am not sure:
> > > > 
> > > > 1) Using an ML model to help kernel development: code reviews,
> > > > generating patches from descriptions, etc. For example, Chris Mason
> > > > has a kernel review repo on GitHub and he is sharing his review
> > > > findings on the mailing list:
> > > > https://github.com/masoncl/review-prompts/tree/main
> > > > It is kernel-development related, but the ML agent code runs in user
> > > > space. The actual ML computation might run on GPUs/TPUs. That does
> > > > not seem to be what you have in mind.
> > > > 
> > > > 2) Running the ML model computation in kernel space.
> > > > Can you clarify if this is what you have in mind? You mention kernel
> > > > FPU usage for the ML model. That is only relevant if you need to run
> > > > floating point in CPU instructions in the kernel. Most ML
> > > > computations do not run as CPU instructions; they run on GPUs/TPUs.
> > > > Why not keep the ML program (PyTorch/agents) in user space and pass
> > > > the data to the GPU/TPU driver to run? There will be some kernel
> > > > infrastructure like VFIO/IOMMU involved with the GPU/TPU driver. For
> > > > the most part, the kernel is just facilitating the data passing
> > > > to/from the GPU/TPU driver and then to the GPU/TPU hardware. The ML
> > > > hardware is doing the heavy lifting.
> > > 
> > > The idea is to have the ML model running in user space, with which a
> > > kernel subsystem can interact. As the next step, I am considering two
> > > real-life use cases: (1) the GC subsystem of an LFS file system, and
> > > (2) an ML-based DAMON approach. So, for example, GC can be represented
> > > by an ML model in user space. The GC can request data (segment states)
> > > from kernel space, and the ML model in user space can do training
> > > and/or inference. As a result, the ML model in user space can select
> > > victim segments and instruct the kernel-space logic to move valid data
> > > from the victim segment(s) into clean/current one(s).
> > 
> > To be honest, I'm skeptical about how generic this can be. Essentially
> > you're describing a generic interface to offload arbitrary kernel
> > decisions to userspace. ML is a userspace business here and not really
> > relevant to the concept AFAICT. And we already have several ways for
> > the kernel to ask userspace to do something for it, and unless it is
> > very restricted and well defined, it is rather painful, prone to
> > deadlocks, security issues, etc.
> > 
> > So by all means, if you want to make GC decisions for your filesystem
> > in userspace with ML, be my guest; it does make some sense, although
> > I'd be wary of situations where we need to write back dirty pages to
> > free memory, which may now depend on your userspace helper making a
> > decision, which may itself need memory to make that decision... But I
> > don't see why you need all the ML fluff around it when it seems like
> > just another way to call a userspace helper, and why some of the
> > existing methods would not suffice.
> 
> Looking through the description (not the code, apologies), it really
> feels like we're reinventing BPF here:
> 
> - introspection into what the kernel is currently doing
> - communications channel with applications
> - a mechanism to override specific kernel functionality
> - fancy applications arbitrating decisions.
> 
> My feedback during Plumbers, and also today, is that you can get 99% of
> what you're looking for with some BPF code.

I see your point, and I can agree that eBPF could be used as the communication
channel. I am not trying to invent a new communication channel. My point here
is that the ML library should be the unified means of extending kernel
subsystems with ML model(s) in user space. So eBPF could be one of (or maybe
the only) possible communication mechanisms. The ML library should provide a
unified framework and workflow that lets kernel subsystems easily add and use
ML model(s) in user space.
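
To make the intended shape concrete, here is a rough sketch of the user-space
side of such a workflow (plain Python, purely illustrative names, not the API
from the patchset): the kernel subsystem publishes feature vectors, the model
answers with decisions:

```python
# Hypothetical user-space side of the ML-library workflow: the kernel
# subsystem publishes feature vectors (e.g. segment states), the model
# answers with decisions. All names here are illustrative, not a real API.

def run_decision_loop(recv_features, send_decision, model, max_rounds):
    """Generic request/response loop: features in, decisions out."""
    for _ in range(max_rounds):
        features = recv_features()      # e.g. read from a BPF ring buffer
        if features is None:
            break
        decision = model(features)      # training and/or inference goes here
        send_decision(decision)         # e.g. write back through a BPF map

# Stub wiring, just to show the shape of the loop:
requests = [[0.9, 100.0], [0.5, 10.0]]  # two fake "segment state" vectors
answers = []
model = lambda f: int(f[0] > 0.7)       # trivial stand-in for a real model
run_decision_loop(lambda: requests.pop(0) if requests else None,
                  answers.append, model, max_rounds=10)
print(answers)  # -> [1, 0]
```

The point of the library would be to standardize exactly this loop (feature
encoding, transport, decision delivery) so each subsystem does not reinvent it.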

> 
> It may or may not be perfect for your needs, but it's a much faster path
> to generate community and collaboration around the goals.  After that,
> it's a lot easier to justify larger changes in the kernel.
> 

Yeah, makes sense. My current patchset explores the API that the ML library
should provide, and eBPF could be the communication channel between an ML
model in user space and a kernel subsystem.

> If this becomes an LSF/MM topic, my bar for discussion would be:
> - extensive data collected about some kernel component (Damon,
> scheduling etc)

Exactly. An ML-based DAMON approach built on top of the ML library is my next
implementation/exploration step.

> - working proof of concept that improved on decisions made in the kernel

Also, I am considering the GC of an LFS file system as low-hanging fruit for
validating the ML library approach, especially because, for example, NILFS2
already runs GC as a user-space process and needs an efficient GC policy to be
worked out. So it could be a good proof of concept for the whole idea.
Ideally, several use cases should benefit from it.
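
For reference, the classic baselines an ML-driven GC policy would have to beat
are simple to state: greedy picks the emptiest segment, while the cost-benefit
policy from the original LFS work scores a segment as (1 - u) * age / (1 + u),
where u is its live-data utilization. A toy sketch (plain Python, nothing
kernel-specific) of both:

```python
# Toy baseline GC victim-selection policies for a log-structured FS.
# Each segment is (utilization u in [0, 1], age). An ML policy would be
# judged against these classic heuristics.

def greedy_victim(segments):
    """Greedy: pick the segment with the least live data."""
    return min(range(len(segments)), key=lambda i: segments[i][0])

def cost_benefit_victim(segments):
    """Cost-benefit (Rosenblum & Ousterhout): maximize (1-u)*age/(1+u)."""
    def score(i):
        u, age = segments[i]
        return (1.0 - u) * age / (1.0 + u)
    return max(range(len(segments)), key=score)

segments = [(0.9, 100), (0.5, 10), (0.6, 500)]
print(greedy_victim(segments))        # -> 1 (lowest utilization)
print(cost_benefit_victim(segments))  # -> 2 (cold, moderately empty)
```

An ML model would effectively be learning a better score() function from the
segment-state data the kernel exposes.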

> - discussion of changes needed to improve or enable the proof of concept

Makes sense. This is why I've shared the patchset with my initial vision of
the ML library API. The goal is to hear all possible criticism and to check
whether the idea (and I) can survive it. :)

> 
> In other words, I don't think we need a list of ways ML might be used.
> I think we need specific examples of a way that ML was used and why it's
> better than what the kernel is already doing.
> 

Yes, as the next step, I am going to explore (1) the GC of an LFS file system
use case and (2) an ML-based DAMON approach. I hope to have enough time to
implement them before May and to share some numbers/results.

Thanks,
Slava.
