linux-kernel - Re: [PATCH] module: add debugging auto-load duplicate module support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bnhskcp6hy6liwlefyjcxumlnvmkmyvhvatkq7ve3kb2zecyxl@c3jq2apjqlcy>
Date:   Fri, 21 Apr 2023 09:42:39 -0700
From:   Lucas De Marchi <lucas.demarchi@...el.com>
To:     Greg KH <gregkh@...uxfoundation.org>
CC:     Luis Chamberlain <mcgrof@...nel.org>, <david@...hat.com>,
        <patches@...ts.linux.dev>, <linux-modules@...r.kernel.org>,
        <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <pmladek@...e.com>, <petr.pavlu@...e.com>, <prarit@...hat.com>,
        <torvalds@...ux-foundation.org>, <rafael@...nel.org>,
        <christophe.leroy@...roup.eu>, <tglx@...utronix.de>,
        <peterz@...radead.org>, <song@...nel.org>, <rppt@...nel.org>,
        <dave@...olabs.net>, <willy@...radead.org>, <vbabka@...e.cz>,
        <mhocko@...e.com>, <dave.hansen@...ux.intel.com>,
        <colin.i.king@...il.com>, <jim.cromie@...il.com>,
        <catalin.marinas@....com>, <jbaron@...mai.com>,
        <rick.p.edgecombe@...el.com>, <j.granados@...sung.com>
Subject: Re: [PATCH] module: add debugging auto-load duplicate module support

On Fri, Apr 21, 2023 at 05:12:51PM +0200, Greg KH wrote:
>On Thu, Apr 20, 2023 at 02:03:32PM -0700, Luis Chamberlain wrote:
>> On Thu, Apr 20, 2023 at 07:32:10AM +0200, Greg KH wrote:
>> > On Wed, Apr 19, 2023 at 04:32:30PM -0700, Luis Chamberlain wrote:
>> > > > It's not "wasted", as it is returned when the module is determined to be
>> > > > a duplicate.  Otherwise everyone will want this enabled as they think it
>> > > > will actually save memory.
>> > >
>> > > I'll change the language to be clear the issue is memory pressure early
>> > > on boot. I'll also add a bit of language to help at least guide people
>> > > to realize that the real value-add for this, ie, I'll have to mention we
>> > > suspect issue is udev and not module auto-loading and that this however
>> > > may still help find a few cases we can optimize for.
>> >
>> > This isn't udev's "problem", all it is doing is what the kernel asked it
>> > to do.  The kernel said "Here's a new device I found, load a module for
>> > it please!"
>>
>> If you believe that then the issue is still a kernel issue, and the
>> second part to that sentence "load a module for it" was done without
>> consideration of the implications, or without optimizations in mind.
>> Given the implications were perhaps not well understood it is unfair
>> for us to be hard on ourselves on that. But now we know, ideally if we
>> could we *should* only issue a request for a module *once* during boot.
>
>But there is no mapping between devices and modules other than what is
>exported in the module info and that is up to userspace to handle.
>
>> Where does the kernel actually say "load a module"?
>
>The driver core says "hey a new device is now present!"
>
>Userspace takes that message and calls kmod with the device information
>which then determines what module to load by looking at the device
>aliases.
>
>> Isn't that just an implied gesture?
>
>Yes.
>
>> > And it's the kmod code here, not udev itself doing all of this.
>>
>> Yes, IMHO kmod could and *should* be enhanced to share a loading context
>> during boot so to avoid duplicates too and then udev would have to
>> embrace such functionality. That's going to take time to propagate, as
>> you can imagine.
>
>udev is just the transport to kmod here, it's not in the job of
>filtering duplicate messages.

udev nowadays use *lib*kmod. It's udev who has the
context it can operate on.

Also, those module loads will not use the path this patch is changing
call_modprobe is not the path that triggers udev to load modules.
/me confused

What can be done from userspace in the udev path

1) udev to do the ratelimit'ing. Define a time window,
filter out uevents in systemd/src/udev/udev-builtin-kmod.c

2) libkmod to do the ratelimit'ing with a similar approach, but udev
needs to tell libkmod what is the window it wants to use

3) libkmod to act on the context it has from the *kernel*. It used
to be cheap with the call simply blocking early on the syscall in
a mutex... or we didn't have that many calls. So libkmod
simply calls [f]init_module() again regardless of the module's
state being in a "coming" state.  Is this the case here? I haven't
seen this data. This is done to avoid a) the toctou implied and b) to
provide the correct return for that call - libkmod can't know if the
previous call will succeed or fail.

libkmod only skips the call if the module is already in
the live state. It seems systemd-udev also duplicates the check
in src/shared/module-util.c:module_load_and_warn()

Note that libkmod already spares loading the module multiple times from
disk as it uses a memory pool for the modules. It reuses one iff it
comes from the same context (i.e. it's only udev involved and not a
bunch of parallel calls to modprobe).

4) If all the calls are coming from the same context and it is udev...
I'm not sure this is actually the problem - the udev's kmod builtin
handler is single-threaded and will handle one request at a time.
I don't see any data to confirm it's coming from a single source or
multiple sources. Could you get a trace containing [f]init_module and
the trace_module_request(), together with a verbose udev log?

If this is all coming from a synthetic use case with thousands of
modprobe execs, I'm not sure there is much to do on the userspace side.

>
>> > Why not
>> > just rate-limit it in userspace if your system can't handle 10's of
>> > thousands of kmod calls all at once? I think many s390 systems did this
>> > decades ago when they were controlling 10's of thousands of scsi devices
>> > and were hit with "device detection storms" at boot like this.
>>
>> Boot is a special context and in this particular case I agree userspace
>> kmod could/should be extended to avoid duplicate module requests in that

see above

>> context. But likewise the kernel should only have to try to issue a
>> request for a single module once, if it could easily do that.
>
>Are you sure that this is happening at boot in a way that userspace
>didn't just trigger it on its own after init started up?  That happens
>as a "coldboot" walk of the device tree and all uevent are regenerated.
>That is userspace asking for this, so there's nothing that the kernel
>can do.
>
>> This does beg the question, why force userspace to rate limit if we
>> can do better in the kernel? Specially if *we're the ones*, as you say,
>> that are hinting to userspace to shoot back loading modules for us and we
>> know we're just going to drop duplicates?
>
>Maybe error out of duplicate module loading earlier?  I don't know,
>sorry.

I still don't see what's the source of the problem from the data
available. Is the kernel issuing multiple request_module()? Or is the
kernel sending multiple udev event for userspace to map the alias to the
module and load it? The mapping alias -> module currently belongs in
userspace so if you are de-duplicating, it can't be only on the module
name.

>
>> > What specific devices and bus types are the problem here for these systems?
>>
>> My best assessment of the situation is that each CPU in udev ends up triggering
>> a load of duplicate set of modules, not just one, but *a lot*. Not sure
>> what heuristics udev uses to load a set of modules per CPU.
>
>Again, finding which device and bus is causing the problem is going to
>be key here to try to solve the issue.  Are you logging duplicate module

agreed.

If the info I requested above is available on other threads, could you
point me to those?

thanks
Lucas De Marchi

>loads by name as well?
>
>thanks,
>
>greg k-h