Message-ID: <5aceccdf-d268-7872-abb5-c14e9aa8b7b7@redhat.com>
Date:   Tue, 28 Mar 2023 23:02:49 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Luis Chamberlain <mcgrof@...nel.org>
Cc:     Kees Cook <keescook@...omium.org>, linux-modules@...r.kernel.org,
        linux-kernel@...r.kernel.org, pmladek@...e.com,
        petr.pavlu@...e.com, prarit@...hat.com,
        christophe.leroy@...roup.eu, song@...nel.org,
        torvalds@...ux-foundation.org, dave@...olabs.net,
        fan.ni@...sung.com, vincent.fu@...sung.com,
        a.manzanares@...sung.com, colin.i.king@...il.com
Subject: Re: [RFC 00/12] module: avoid userspace pressure on unwanted
 allocations

On 28.03.23 08:16, Luis Chamberlain wrote:
> On Tue, Mar 28, 2023 at 05:44:40AM +0200, David Hildenbrand wrote:
>> ... do you have an updated patch/branch that includes the feedback from
>> Linus so I can give it a churn tomorrow?
> 
> Yeah sure:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230327-module-alloc-opts
> 

I gave that one a go and get the following for system bootup:

#1:

13.761s tuned.service
12.261s chrony-wait.service
  7.386s NetworkManager-wait-online.service
  5.227s systemd-udev-settle.service
  2.893s initrd-switch-root.service
  2.148s polkit.service
  2.137s smartd.service
  1.893s dracut-initqueue.service
  1.290s NetworkManager.service
  1.032s cups.service


#2:

13.881s tuned.service
  9.255s chrony-wait.service
  7.404s NetworkManager-wait-online.service
  5.826s systemd-udev-settle.service
  2.859s initrd-switch-root.service
  2.847s smartd.service
  2.172s polkit.service
  1.884s dracut-initqueue.service
  1.371s NetworkManager.service
  1.119s ModemManager.service


So, going by systemd-udev-settle.service, we're a bit faster (0.2 -- 0.7s)
than the original version without the rcu patch (~6s).
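
For anyone who wants to compare such listings mechanically rather than by eye, here is a minimal Python sketch. It assumes the "<seconds>s <unit>" line format shown above; the helper names parse_blame and diff_runs are hypothetical, and the embedded data is only a few rows copied from runs #1 and #2.

```python
import re

# Matches lines such as "  5.227s systemd-udev-settle.service".
# Assumption: only plain "<seconds>s <unit>" lines are handled; other
# duration formats (e.g. with minutes) are skipped.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(\S+)\s*$")


def parse_blame(text):
    """Return {unit name: seconds} for every line that matches LINE_RE."""
    times = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            times[match.group(2)] = float(match.group(1))
    return times


def diff_runs(first, second):
    """Return {unit: second - first} for units present in both runs."""
    return {
        unit: round(second[unit] - first[unit], 3)
        for unit in first
        if unit in second
    }


# A few rows copied from runs #1 and #2 above.
RUN1 = """
13.761s tuned.service
12.261s chrony-wait.service
  7.386s NetworkManager-wait-online.service
  5.227s systemd-udev-settle.service
"""

RUN2 = """
13.881s tuned.service
  9.255s chrony-wait.service
  7.404s NetworkManager-wait-online.service
  5.826s systemd-udev-settle.service
"""

if __name__ == "__main__":
    deltas = diff_runs(parse_blame(RUN1), parse_blame(RUN2))
    for unit, delta in sorted(deltas.items()):
        print(f"{delta:+.3f}s {unit}")
```

The run-to-run spread is itself in the range of a few hundred milliseconds for most units, so small differences against a baseline should be read with that noise in mind.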


> The commit log needs updating to reflect the results I just collected:
> 
> With the alloc patch ("module: avoid allocation if module is already
> present and ready") I see 145 MiB in memory difference in comparison
> to its last patch, "module: extract patient module check into helper".
> So I think that's a clear keeper and should help large CPU count boots.
> 
> The patch "module: add concurrency limiter" which puts the concurrency
> delimiter on the kread only saves about 2 MiB with 100 stress-ng ops,
> which seems to be what I needed to reproduce your 400 CPU count original
> issue.
> 
> The program used to reproduce is stress-ng with the new module option:
> 
> echo 0 > /proc/sys/vm/oom_dump_tasks
> ./stress-ng --module 100 --module-name xfs

For me, the above command with nfs (but also with ext4) fills the kernel log with:

...
[  883.036035] nfs: Unknown symbol xdr_reserve_space (err -2)
[  883.042221] nfs: Unknown symbol rpc_init_wait_queue (err -2)
[  883.048549] nfs: Unknown symbol put_rpccred (err -2)
[  883.054104] nfs: Unknown symbol __fscache_invalidate (err -2)
[  883.060540] nfs: Unknown symbol __fscache_use_cookie (err -2)
[  883.066969] nfs: Unknown symbol rpc_clnt_xprt_switch_has_addr (err -2)
[  883.074264] nfs: Unknown symbol __fscache_begin_write_operation (err -2)
[  883.081743] nfs: Unknown symbol nlmclnt_init (err -2)
[  883.087396] nfs: Unknown symbol nlmclnt_done (err -2)
[  883.093074] nfs: Unknown symbol nfs_debug (err -2)
[  883.098429] nfs: Unknown symbol rpc_wait_for_completion_task (err -2)
[  883.105640] nfs: Unknown symbol __fscache_acquire_cookie (err -2)
[  883.163764] nfs: Unknown symbol rpc_put_task (err -2)
[  883.169461] nfs: Unknown symbol __fscache_acquire_volume (err -2)
[  883.176297] nfs: Unknown symbol rpc_proc_register (err -2)
[  883.182430] nfs: Unknown symbol rpc_shutdown_client (err -2)
[  883.188765] nfs: Unknown symbol rpc_clnt_show_stats (err -2)
[  883.195097] nfs: Unknown symbol __fscache_begin_read_operation (err -2)
...


I do *not* get these errors on manual modprobe/rmmod. Is this a bug in the
concurrent handling or just a side effect of the concurrent loading?
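
To get an overview of such a log flood, the messages can be collapsed into one list of unique symbols per module. The following is a minimal Python sketch, assuming the "[timestamp] module: Unknown symbol name (err N)" format seen above; the function name summarize_unknown_symbols is hypothetical and the sample is just three of the lines quoted above. Note that err -2 is -ENOENT.

```python
import re
from collections import defaultdict

# Matches e.g. "[  883.036035] nfs: Unknown symbol xdr_reserve_space (err -2)".
# Assumption: timestamps are enabled in the kernel log, as in the excerpt.
UNKNOWN_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\S+): Unknown symbol (\S+) \(err (-?\d+)\)"
)


def summarize_unknown_symbols(log_text):
    """Return {module: sorted list of unique unresolved symbol names}."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = UNKNOWN_RE.match(line.strip())
        if match:
            module, symbol = match.group(1), match.group(2)
            found[module].add(symbol)
    return {module: sorted(symbols) for module, symbols in found.items()}


# Three lines copied from the log above; the "..." lines are ignored.
SAMPLE_LOG = """
...
[  883.036035] nfs: Unknown symbol xdr_reserve_space (err -2)
[  883.042221] nfs: Unknown symbol rpc_init_wait_queue (err -2)
[  883.048549] nfs: Unknown symbol put_rpccred (err -2)
...
"""

if __name__ == "__main__":
    for module, symbols in summarize_unknown_symbols(SAMPLE_LOG).items():
        print(f"{module}: {len(symbols)} unique unknown symbols")
        for symbol in symbols:
            print(f"  {symbol}")
```

Feeding it the full log would show whether the same set of symbols repeats for every failed load attempt or whether the set varies between attempts.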

> 
> To see how much max memory I use, I just use:
> 
> free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > foo.log
> 
> Run the test in another window, CTRL-C the test when above
> finishes after 40 seconds and then:
> 
> sort -n -r foo.log  | head -1

[root@...ovo-sr950-01 fs]# sort -n -r foo.log  | head -1
14254024
[root@...ovo-sr950-01 fs]# sort -n -r foo.log  | tail -1
12862528

So a difference of 1391496 (KiB I assume, so ~1.3 GiB !?) compared to before
the test (I start capturing first and then run stress-ng).
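
The arithmetic can be double-checked with a minimal Python sketch. It assumes foo.log holds one "used" value per line in KiB (which is what free -k prints); the helper names usage_delta_kib and kib_to_gib are hypothetical.

```python
def usage_delta_kib(samples):
    """Return max - min over the sampled 'used' values (one value per entry).

    The minimum only approximates "before the test" if capturing starts
    before stress-ng does, as was done here.
    """
    values = [int(sample) for sample in samples if sample.strip()]
    return max(values) - min(values)


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


if __name__ == "__main__":
    # The two extremes reported above (head -1 and tail -1 of the sorted log).
    samples = ["14254024", "12862528"]
    delta = usage_delta_kib(samples)
    print(f"{delta} KiB = {kib_to_gib(delta):.2f} GiB")
```

To run it on the real capture, pass the lines of foo.log instead of the two hard-coded values, for example usage_delta_kib(open("foo.log")).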


> 
> If you have xfs loaded already you probably wanna pick module just as big
> that you don't have loaded. You must have dependencies loaded already as
> it doesn't call modprobe, it just finit_module's the module.


My setup already has xfs in use. nfs and ext4 are a bit smaller, but 
still big.

-- 
Thanks,

David / dhildenb
