Message-ID: <5aceccdf-d268-7872-abb5-c14e9aa8b7b7@redhat.com>
Date: Tue, 28 Mar 2023 23:02:49 +0200
From: David Hildenbrand <david@...hat.com>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: Kees Cook <keescook@...omium.org>, linux-modules@...r.kernel.org,
linux-kernel@...r.kernel.org, pmladek@...e.com,
petr.pavlu@...e.com, prarit@...hat.com,
christophe.leroy@...roup.eu, song@...nel.org,
torvalds@...ux-foundation.org, dave@...olabs.net,
fan.ni@...sung.com, vincent.fu@...sung.com,
a.manzanares@...sung.com, colin.i.king@...il.com
Subject: Re: [RFC 00/12] module: avoid userspace pressure on unwanted allocations
On 28.03.23 08:16, Luis Chamberlain wrote:
> On Tue, Mar 28, 2023 at 05:44:40AM +0200, David Hildenbrand wrote:
>> ... do you have an updated patch/branch that includes the feedback from
>> Linus so I can give it a churn tomorrow?
>
> Yeah sure:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230327-module-alloc-opts
>
I gave that one a go and got the following for system bootup (two runs):
#1:
13.761s tuned.service
12.261s chrony-wait.service
7.386s NetworkManager-wait-online.service
5.227s systemd-udev-settle.service
2.893s initrd-switch-root.service
2.148s polkit.service
2.137s smartd.service
1.893s dracut-initqueue.service
1.290s NetworkManager.service
1.032s cups.service
#2:
13.881s tuned.service
9.255s chrony-wait.service
7.404s NetworkManager-wait-online.service
5.826s systemd-udev-settle.service
2.859s initrd-switch-root.service
2.847s smartd.service
2.172s polkit.service
1.884s dracut-initqueue.service
1.371s NetworkManager.service
1.119s ModemManager.service
So we're a bit faster (0.2 to 0.7 s) than the original version without
the rcu patch (~6 s).
> The commit log needs updating to reflect the results I just collected:
>
> With the alloc patch ("module: avoid allocation if module is already
> present and ready") I see 145 MiB in memory difference in comparison
> to its last patch, "module: extract patient module check into helper".
> So I think that's a clear keeper and should help large CPU count boots.
>
> The patch "module: add concurrency limiter" which puts the concurrency
> delimiter on the kread only saves about 2 MiB with 100 stress-ng ops,
> which seems to be what I needed to reproduce your 400 CPU count original
> issue.
>
> The program used to reproduce is stress-ng with the new module option:
>
> echo 0 > /proc/sys/vm/oom_dump_tasks
> ./stress-ng --module 100 --module-name xfs
With nfs (and also ext4) as the module, the above command fills the kernel
log for me with:
...
[ 883.036035] nfs: Unknown symbol xdr_reserve_space (err -2)
[ 883.042221] nfs: Unknown symbol rpc_init_wait_queue (err -2)
[ 883.048549] nfs: Unknown symbol put_rpccred (err -2)
[ 883.054104] nfs: Unknown symbol __fscache_invalidate (err -2)
[ 883.060540] nfs: Unknown symbol __fscache_use_cookie (err -2)
[ 883.066969] nfs: Unknown symbol rpc_clnt_xprt_switch_has_addr (err -2)
[ 883.074264] nfs: Unknown symbol __fscache_begin_write_operation (err -2)
[ 883.081743] nfs: Unknown symbol nlmclnt_init (err -2)
[ 883.087396] nfs: Unknown symbol nlmclnt_done (err -2)
[ 883.093074] nfs: Unknown symbol nfs_debug (err -2)
[ 883.098429] nfs: Unknown symbol rpc_wait_for_completion_task (err -2)
[ 883.105640] nfs: Unknown symbol __fscache_acquire_cookie (err -2)
[ 883.163764] nfs: Unknown symbol rpc_put_task (err -2)
[ 883.169461] nfs: Unknown symbol __fscache_acquire_volume (err -2)
[ 883.176297] nfs: Unknown symbol rpc_proc_register (err -2)
[ 883.182430] nfs: Unknown symbol rpc_shutdown_client (err -2)
[ 883.188765] nfs: Unknown symbol rpc_clnt_show_stats (err -2)
[ 883.195097] nfs: Unknown symbol __fscache_begin_read_operation (err -2)
...
I do *not* get these errors on manual modprobe/rmmod. Is this a bug in the
concurrent handling, or just a side effect of the concurrent loading?
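To tell whether the same few symbols fail over and over or whether the set
varies between attempts, the log lines above can be condensed. This is only
a minimal sketch: the helper name unknown_symbols is made up for
illustration, and it assumes the message format shown above (for reference,
err -2 is -ENOENT). It reads kernel log text on stdin and prints one line
per module/symbol pair with the number of occurrences, most frequent first.

```shell
# Hypothetical helper (not part of any tool): summarize
# "<module>: Unknown symbol <symbol> (err N)" lines read from stdin.
unknown_symbols() {
    awk '
        # Match "<module>: Unknown symbol <symbol>" anywhere in the line.
        match($0, /[^ ]+: Unknown symbol [^ ]+/) {
            split(substr($0, RSTART, RLENGTH), f, " ")
            sub(/:$/, "", f[1])          # strip the colon after the module name
            count[f[1] " " f[4]]++       # key: "<module> <symbol>"
        }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}
```

Usage would be something like "dmesg | unknown_symbols | head" while the
stress-ng run is in progress.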
>
> To see how much max memory I use, I just use:
>
> free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > foo.log
>
> Run the test in another window, CTRL-C the test when above
> finishes after 40 seconds and then:
>
> sort -n -r foo.log | head -1
[root@...ovo-sr950-01 fs]# sort -n -r foo.log | head -1
14254024
[root@...ovo-sr950-01 fs]# sort -n -r foo.log | tail -1
12862528
So that is a difference of 1391496 (KiB I assume, so ~1.3 GiB!?) compared
to before the test (I first start capturing and then run stress-ng).
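The two sort invocations and the subtraction can be folded into one pass.
The following is a sketch under assumptions: mem_delta is a hypothetical
helper name, and the input is expected to be one "used" value per line, as
written to foo.log by the free/awk pipeline quoted above. Since free -k
reports kibibytes, the difference is converted to GiB by dividing by
1024 * 1024.

```shell
# Hypothetical helper: read one "used" value (KiB) per line from stdin and
# print the maximum, the minimum, and their difference in KiB and GiB.
mem_delta() {
    awk '
        NF {
            v = $1 + 0                    # force numeric interpretation
            if (!n++ || v > max) max = v  # first sample initializes max
            if (n == 1 || v < min) min = v
        }
        END {
            if (!n) exit 1                # no samples: signal failure
            diff = max - min
            printf "max=%d min=%d diff=%d KiB (%.2f GiB)\n",
                   max, min, diff, diff / 1048576
        }
    '
}
```

Usage: "mem_delta < foo.log". Fed the two values above, it reports a
difference of 1391496 KiB (1.33 GiB).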
>
> If you have xfs loaded already you probably wanna pick module just as big
> that you don't have loaded. You must have dependencies loaded already as
> it doesn't call modprobe, it just finit_module's the module.
My setup already has xfs in use. nfs and ext4 are a bit smaller, but
still big.
--
Thanks,
David / dhildenb