[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGudoHFPr4+vfqufWiscRXqSRAuZM=S8H7QsZbiLrG+s1OWm1w@mail.gmail.com>
Date: Sat, 4 Mar 2023 21:31:43 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Alexander Potapenko <glider@...gle.com>,
Al Viro <viro@...iv.linux.org.uk>,
Kees Cook <keescook@...omium.org>,
Eric Biggers <ebiggers@...gle.com>,
Christian Brauner <brauner@...nel.org>, serge@...lyn.com,
paul@...l-moore.com, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-security-module@...r.kernel.org
Subject: Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible
On 3/4/23, Mateusz Guzik <mjguzik@...il.com> wrote:
> On 3/3/23, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>> On Fri, Mar 3, 2023 at 12:39 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>>>
>>> I think there is a systemic problem which comes with the kzalloc API
>>
>> Well, it's not necessarily the API that is bad, but the implementation.
>>
>> We could easily make kzalloc() with a constant size just expand to
>> kmalloc+memset, and get the behavior you want.
>>
>> We already do magical things for "find the right slab bucket" part of
>> kmalloc too for constant sizes. It's changed over the years, but that
>> policy goes back a long long time. See
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=95203fe78007f9ab3aebb96606473ae18c00a5a8
>>
>> from the BK history tree.
>>
>> Exactly because some things are worth optimizing for when the size is
>> known at compile time.
>>
>> Maybe just extending kzalloc() similarly? Trivial and entirely untested
>> patch:
>>
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -717,6 +717,12 @@ static inline void *kmem_cache_zalloc(struct
>> kmem_cache *k, gfp_t flags)
>> */
>> static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
>> {
>> + if (__builtin_constant_p(size)) {
>> + void *ret = kmalloc(size, flags);
>> + if (ret)
>> + memset(ret, 0, size);
>> + return ret;
>> + }
>> return kmalloc(size, flags | __GFP_ZERO);
>> }
>>
>
> So I played with this and have a rather nasty summary. Bullet points:
> 1. patched kzalloc does not reduce memsets calls during kernel build
> 2. patched kmem_cache_zalloc_ptr + 2 consumers converted *does* drop
> it significantly (36150671 -> 14414454)
> 3. ... inline memset generated by gcc sucks by resorting to rep stosq
> around 48 bytes
> 4. memsets not sorted out have sizes not known at compilation time and
> are not necessarily perf bugs on their own [read: would benefit from
> faster memset]
>
> Onto the the meat:
>
> I patched the kernel with a slightly tweaked version of the above:
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 45af70315a94..7abb5490690f 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -717,6 +717,12 @@ static inline void *kmem_cache_zalloc(struct
> kmem_cache *k, gfp_t flags)
> */
> static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
> {
> + if (__builtin_constant_p(size)) {
> + void *ret = kmalloc(size, flags);
> + if (likely(ret))
> + memset(ret, 0, size);
> + return ret;
> + }
> return kmalloc(size, flags | __GFP_ZERO);
> }
>
> and verified it indeed zeroes inline:
>
> void kztest(void)
> {
> void *ptr;
>
> ptr = kzalloc(32, GFP_KERNEL);
> if (unlikely(!ptr))
> return;
> memsettest_rec(ptr);
> }
>
> $ objdump --disassemble=kztest vmlinux
> [snip]
> call ffffffff8135e130 <kmalloc_trace>
> test %rax,%rax
> je ffffffff81447d5f <kztest+0x4f>
> movq $0x0,(%rax)
> mov %rax,%rdi
> movq $0x0,0x8(%rax)
> movq $0x0,0x10(%rax)
> movq $0x0,0x18(%rax)
> call ffffffff81454060 <memsettest_rec>
> [snip]
>
> This did *NOT* lead to reduction of memset calls when building the kernel.
>
> I verified few cases by hand, it is all either kmem_cache_zalloc or
> explicitly added memsets with sizes not known at compilation time.
>
> Two most frequent callers:
> @[
> memset+5
> __alloc_file+40
> alloc_empty_file+73
> path_openat+77
> do_filp_open+182
> do_sys_openat2+159
> __x64_sys_openat+89
> do_syscall_64+93
> entry_SYSCALL_64_after_hwframe+114
> ]: 11028994
> @[
> memset+5
> security_file_alloc+45
> __alloc_file+89
> alloc_empty_file+73
> path_openat+77
> do_filp_open+182
> do_sys_openat2+159
> __x64_sys_openat+89
> do_syscall_64+93
> entry_SYSCALL_64_after_hwframe+114
> ]: 11028994
>
> My wip addition is:
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 45af70315a94..12b5b02ef3d3 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -710,6 +710,17 @@ static inline void *kmem_cache_zalloc(struct
> kmem_cache *k, gfp_t flags)
> return kmem_cache_alloc(k, flags | __GFP_ZERO);
> }
>
> +#define kmem_cache_zalloc_ptr(k, f, retp) ({ \
> + __typeof(retp) _retp = kmem_cache_alloc(k, f); \
> + bool _rv = false; \
> + retp = _retp; \
> + if (likely(_retp)) { \
> + memset(_retp, 0, sizeof(*_retp)); \
> + _rv = true; \
> + } \
> + _rv; \
> +})
> +
> diff --git a/security/security.c b/security/security.c
> index cf6cc576736f..0f769ede0e54 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -600,8 +600,7 @@ static int lsm_file_alloc(struct file *file)
> return 0;
> }
>
> - file->f_security = kmem_cache_zalloc(lsm_file_cache, GFP_KERNEL);
> - if (file->f_security == NULL)
> + if (!kmem_cache_zalloc_ptr(lsm_file_cache, GFP_KERNEL,
> file->f_security))
> return -ENOMEM;
> return 0;
> }
This one is actually buggy -- f_security is a void * pointer and
sizeof(*file->f_security) returns just 1. The macro does not have any
safety belts against that -- it should probably check for void * at
compilation time and get a BUG_ON for runtime mismatch. Does not
affect the idea though.
Good news: gcc provides a lot of control as to how it inlines string
ops, most notably:
-mstringop-strategy=alg
Override the internal decision heuristic for the particular
algorithm to use for inlining string
operations. The allowed values for alg are:
rep_byte
rep_4byte
rep_8byte
Expand using i386 "rep" prefix of the specified size.
byte_loop
loop
unrolled_loop
Expand into an inline loop.
libcall
Always use a library call.
I'm going to play with it and send something more presentable.
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 372653b92617..8e0dabf9530e 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -136,8 +136,7 @@ static struct file *__alloc_file(int flags, const
> struct cred *cred)
> struct file *f;
> int error;
>
> - f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
> - if (unlikely(!f))
> + if (!kmem_cache_zalloc_ptr(filp_cachep, GFP_KERNEL, f))
> return ERR_PTR(-ENOMEM);
>
> f->f_cred = get_cred(cred);
>
> As mentioned above it cuts total calls in more than half.
>
> The problem is it is it rolls with rep stosq way too easily, partially
> defeating the point of inlining anything. clang does not have this
> problem.
>
> Take a look at __alloc_file:
> [snip]
> mov 0x19cab05(%rip),%rdi # ffffffff82df4318 <filp_cachep>
> call ffffffff813dd610 <kmem_cache_alloc>
> test %rax,%rax
> je ffffffff814298b7 <__alloc_file+0xc7>
> mov %rax,%r12
> mov $0x1d,%ecx
> xor %eax,%eax
> mov %r12,%rdi
> rep stos %rax,%es:(%rdi)
> [/snip]
>
> Here is a sample consumer which can't help but have a variable size --
> select, used by gmake:
> @[
> memset+5
> do_pselect.constprop.0+202
> __x64_sys_pselect6+101
> do_syscall_64+93
> entry_SYSCALL_64_after_hwframe+114
> ]: 13160
>
> In conclusion:
> 1. fixing up memsets is worthwhile regardless of what happens to its
> current consumers -- not all of them are necessarily doing something
> wrong
> 2. inlining memset can be a pessimization vs non-plain-erms memset as
> evidenced above. will have to figure out how to convince gcc to be
> less eager to use it.
>
> Sometimes I hate computers.
>
> --
> Mateusz Guzik <mjguzik gmail.com>
>
--
Mateusz Guzik <mjguzik gmail.com>
Powered by blists - more mailing lists