linux-kernel - Re: [PATCH] mm: don't call should_failslab() for !CONFIG


Message-ID: <3j5d3p22ssv7xoaghzraa7crcfih3h2qqjlhmjppbp6f42pg2t@kg7qoicog5ye>
Date: Mon, 27 May 2024 11:34:36 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Jens Axboe <axboe@...nel.dk>, LKML <linux-kernel@...r.kernel.org>, 
	Linux Memory Management List <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <kafai@...com>, 
	Song Liu <songliubraving@...com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	Christoph Lameter <cl@...ux.com>, David Rientjes <rientjes@...gle.com>, 
	Pekka Enberg <penberg@...nel.org>, Joonsoo Kim <iamjoonsoo.kim@....com>, Yonghong Song <yhs@...com>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Howard McLauchlan <hmclauchlan@...com>, bpf@...r.kernel.org, torvalds@...ux-foundation.org
Subject: Re: [PATCH] mm: don't call should_failslab() for !CONFIG_FAILSLAB

+cc Linus

On Thu, Oct 07, 2021 at 05:32:52PM +0200, Vlastimil Babka wrote:
> On 10/5/21 17:31, Jens Axboe wrote:
> > Allocations can be a very hot path, and this out-of-line function
> > call is noticeable.
> > 
> > Signed-off-by: Jens Axboe <axboe@...nel.dk>
> 
> It used to be inline b4 (hi, Konstantin!) and then was converted to be like
> this intentionally :/
> 
> See 4f6923fbb352 ("mm: make should_failslab always available for fault
> injection")
> 
> And now also kernel/bpf/verifier.c contains:
> BTF_ID(func, should_failslab)
> 
> I think either your or Andrew's version will break this BTF_ID thing, at the
> very least.
> 
> But I do strongly agree that putting unconditionally a non-inline call into
> slab allocator fastpath sucks. Can we make it so that bpf can only do these
> overrides when CONFIG_FAILSLAB is enabled?
> I don't know, perhaps putting this BTF_ID() in #ifdef as well, or providing
> a dummy that is always available (so that nothing breaks), but doesn't
> actually affect slab_pre_alloc_hook() unless CONFIG_FAILSLAB has been enabled?
> 

I just ran into it while looking at kmalloc + kfree pair.

A toy test which calls this in a loop like so:
static long noinline custom_bench(void)
{
        void *buf;

        while (!signal_pending(current)) {
                buf = kmalloc(16, GFP_KERNEL);
                kfree(buf);
                cond_resched();
        }

        return -EINTR;
}

.. shows this with perf top:
   57.88%  [kernel]           [k] kfree
   31.38%  [kernel]           [k] kmalloc_trace_noprof
    3.20%  [kernel]           [k] should_failslab.constprop.0

As a side note, I verified that the majority of the time in kfree and
kmalloc_trace_noprof is spent in cmpxchg16b, which is both good and bad news.

As for should_failslab, it compiles to an empty func on production
kernels and is present even when there are no supported means of
instrumenting it. In other words, everyone pays for its existence, even
if there is no way to use it.

Also note there are 3 unrelated mechanisms to alter the return code,
which imo is 2 too many. But more importantly they are not even
coordinated.

A hard requirement for a long term solution is to not alter the fast
path beyond nops for hot patching.

So far I think implementing this in a clean manner would require
agreeing on some namespace for bpf ("failprobes"?) and coordinating
hotpatching between different mechanisms. Maybe there is a better way, I
don't know.

Here is the crux of my e-mail though:
1. turning should_failslab into a mandatory func call is an ok local
   hack for the test farm, not a viable approach for production
2. as such it is up to the original submitter (or whoever else wants
   to pick up the slack) to implement something which hotpatches the
   callsite as opposed to inducing a function call for everyone
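
To make point 2 a bit more concrete, here is a rough userspace model of the
control flow a hotpatched callsite would have. It is only a sketch: in the
kernel the gate would presumably be a static key (DEFINE_STATIC_KEY_FALSE
plus static_branch_unlikely), which leaves a nop at the callsite while
disabled. The plain bool below cannot do that, so this shows the shape of
the fast path, not the patching itself. The names failslab_active,
should_failslab_slow, failslab_enable, slow_calls and demo, the simplified
size_t argument, and the "fail above 64 bytes" policy are all made up for
illustration.

```c
/*
 * Userspace model only, not kernel code. A bool stands in for a static
 * key, so the branch is a real load+test here instead of a patched nop.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool failslab_active;            /* hypothetical stand-in for the key */
static unsigned long slow_calls;        /* entries into the out-of-line hook */

/* Out-of-line hook: only reached while injection is switched on. */
static __attribute__((noinline)) int should_failslab_slow(size_t size)
{
        slow_calls++;
        return size > 64 ? -12 : 0;     /* arbitrary demo policy, -12 is -ENOMEM */
}

/* Fast path: no function call unless the gate has been flipped. */
static inline int should_failslab(size_t size)
{
        if (__builtin_expect(failslab_active, 0))
                return should_failslab_slow(size);
        return 0;
}

/* Whoever wants fault injection flips the gate explicitly. */
static inline void failslab_enable(bool on)
{
        failslab_active = on;
}

/* Usage: nothing is called until the gate is enabled. */
static inline void demo(void)
{
        assert(should_failslab(128) == 0);      /* gate off: hook not entered */
        failslab_enable(true);
        assert(should_failslab(128) == -12);    /* gate on: hook decides */
        failslab_enable(false);
}
```

The point of the split is that only the party enabling injection pays for
the call; everyone else sees a branch which, with a real static key, would
be patched out.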

In the meantime the routine should disappear unless explicitly included
in kernel config. The patch submitted here would be one way to do it.
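
For illustration, the compile-time variant can be modelled in userspace
roughly as below. This is a sketch under assumptions, not the submitted
patch: the size_t argument is a simplification of the real signature, and
slab_pre_alloc_check and example_usage are hypothetical names. Only
CONFIG_FAILSLAB and should_failslab come from the thread.

```c
/*
 * Userspace model only, not kernel code. Build without -DCONFIG_FAILSLAB
 * to get the stub; the real hook is declared but not provided here.
 */
#include <assert.h>
#include <stddef.h>

#ifdef CONFIG_FAILSLAB
/* Out-of-line hook, only built when fault injection is configured. */
int should_failslab(size_t size);
#else
/* Stub the compiler can fold away, so the fast path carries no call. */
static inline int should_failslab(size_t size)
{
        (void)size;
        return 0;
}
#endif

/* Hypothetical stand-in for the pre-allocation hook: 0 means proceed. */
static inline int slab_pre_alloc_check(size_t size)
{
        if (should_failslab(size))
                return -12;     /* -ENOMEM */
        return 0;
}

/* Usage: with CONFIG_FAILSLAB unset the check always lets the allocation proceed. */
static inline void example_usage(void)
{
        assert(slab_pre_alloc_check(16) == 0);
}
```

With the option unset the check reduces to a constant, so there is nothing
left in the allocation path to pay for.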