lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpHR33mZEqtstEVz=x0tgG9ZDbH4Tf0nEmVUnn5GygR+NA@mail.gmail.com>
Date: Tue, 15 Oct 2024 09:22:55 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Florian Westphal <fw@...len.de>, 
	linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>, 
	Kent Overstreet <kent.overstreet@...ux.dev>, Ben Greear <greearb@...delatech.com>
Subject: Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending
 kfree_rcu calls

On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@...il.com> wrote:
>
> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@...il.com> wrote:
> > >
> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@...il.com> wrote:
> > > > >
> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
> > > > > > >
> > > > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@...len.de> wrote:
> > > > > > >
> > > > > > > > Ben Greear reports following splat:
> > > > > > > >  ------------[ cut here ]------------
> > > > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > > > > ...
> > > > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > >   codetag_unload_module+0x19b/0x2a0
> > > > > > > >   ? codetag_load_module+0x80/0x80
> > > > > > > >
> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > > > > >
> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > > > > resolves this warning.
> > > > > > > >
> > > > > > > > Reproducer:
> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > > > > rmmod nft_chain_nat
> > > > > > > > rmmod nf_nat                # will WARN.
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > --- a/lib/codetag.c
> > > > > > > > +++ b/lib/codetag.c
> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > > > >       if (!mod)
> > > > > > > >               return true;
> > > > > > > >
> > > > > > > > +     kvfree_rcu_barrier();
> > > > > > > > +
> > > > > > > >       mutex_lock(&codetag_lock);
> > > > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > > > > > >               struct codetag_module *found = NULL;
> > > > > > >
> > > > > > > It's always hard to determine why a thing like this is present, so a
> > > > > > > comment is helpful:
> > > > > > >
> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > > > > +++ a/lib/codetag.c
> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > > > >         if (!mod)
> > > > > > >                 return true;
> > > > > > >
> > > > > > > +       /* await any module's kfree_rcu() operations to complete */
> > > > > > >         kvfree_rcu_barrier();
> > > > > > >
> > > > > > >         mutex_lock(&codetag_lock);
> > > > > > > _
> > > > > > >
> > > > > > > But I do wonder whether this is in the correct place.
> > > > > > >
> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > > > > should properly be done by the core module handling code?
> > > > > >
> > > > > > I don't think core module code cares about kfree_rcu()s being complete
> > > > > > before the module is unloaded.
> > > > > > Allocation tagging OTOH cares because it is about to destroy tags
> > > > > > which will be accessed when kfree() actually happens, therefore a
> > > > > > strict ordering is important.
> > > > > >
> > > > > > >
> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > > > > module memory itself and I think codetag_unload_module() could be
> > > > > > > called after that?
> > > > > >
> > > > > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > > > > inside free_module() but according to the reply in
> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > > > > > synchronize_rcu() does not help. I'm not quite sure why...
> > > > > >
> > > > > It is because, synchronize_rcu() is used for a bit different things,
> > > > > i.e. it is about a GP completion. Offloading objects can span several
> > > > > GPs.
> > > >
> > > > Ah, thanks! Now that I looked into the patch that recently added
> > > > kvfree_rcu_barrier() I understand that a bit better.
> > > >
> > > > >
> > > > > > Note that once I'm done upstreaming
> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > > > > > this change will not be needed and I'm planning to remove this call,
> > > > > > however this change is useful for backporting. It should be sent to
> > > > > > stable@...r.kernel.org # v6.10+
> > > > > >
> > > > > The kvfree_rcu_barrier() has been added into v6.12:
> > > > >
> > > > > <snip>
> > > > > urezki@...38:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > > > > next-20240912
> > > > > next-20240919
> > > > > next-20240920
> > > > > next-20241002
> > > > > v6.12-rc1
> > > > > urezki@...38:~/data/raid0/coding/linux.git$
> > > > > <snip>
> > > > >
> > > > > For 6.10+, it implies that the mentioned commit should be backported also.
> > > >
> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> > > > kvfree_rcu_barrier()?
> > > >
> > > For kernels < 6.12, unfortunately not :) If you have a possibility to
> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
> > > which waits for all callbacks to be completed.
> >
> > We do not control the call-site inside the module, so this would not
> > be possible.
> >
> > >
> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
> >
> > That sounds better. I'll take a stab at it. Thanks!

I prepared backports for stable 6.10.y and 6.11.y branches which
contain this patch along with the prerequisite "rcu/kvfree: Add
kvfree_rcu_barrier() API" patch. I think that's the only prerequisite
required. However, since this patch is not yet in Linus' tree, I'll
wait for it to get there before sending backports to stable (per Greg
KH's recommendation). Please let me know if you think it's more
critical and should be posted earlier.
Thanks,
Suren.

> >
> You are welcome!
>
> --
> Uladzislau Rezki
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ