lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <105a267b-e166-4928-baac-69c6d1a8f62f@suse.cz>
Date: Wed, 16 Oct 2024 11:56:47 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Suren Baghdasaryan <surenb@...gle.com>,
 Uladzislau Rezki <urezki@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Florian Westphal
 <fw@...len.de>, linux-kernel@...r.kernel.org,
 Kent Overstreet <kent.overstreet@...ux.dev>,
 Ben Greear <greearb@...delatech.com>
Subject: Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending
 kfree_rcu calls

On 10/15/24 18:22, Suren Baghdasaryan wrote:
> On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@...il.com> wrote:
>>
>> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
>> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@...il.com> wrote:
>> > >
>> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
>> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@...il.com> wrote:
>> > > > >
>> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
>> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>> > > > > > >
>> > > > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@...len.de> wrote:
>> > > > > > >
>> > > > > > > > Ben Greear reports following splat:
>> > > > > > > >  ------------[ cut here ]------------
>> > > > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
>> > > > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
>> > > > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
>> > > > > > > > ...
>> > > > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>> > > > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
>> > > > > > > >   codetag_unload_module+0x19b/0x2a0
>> > > > > > > >   ? codetag_load_module+0x80/0x80
>> > > > > > > >
>> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
>> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
>> > > > > > > >
>> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
>> > > > > > > > resolves this warning.
>> > > > > > > >
>> > > > > > > > Reproducer:
>> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
>> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
>> > > > > > > > rmmod nft_chain_nat
>> > > > > > > > rmmod nf_nat                # will WARN.
>> > > > > > > >
>> > > > > > > > ...
>> > > > > > > >
>> > > > > > > > --- a/lib/codetag.c
>> > > > > > > > +++ b/lib/codetag.c
>> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
>> > > > > > > >       if (!mod)
>> > > > > > > >               return true;
>> > > > > > > >
>> > > > > > > > +     kvfree_rcu_barrier();
>> > > > > > > > +
>> > > > > > > >       mutex_lock(&codetag_lock);
>> > > > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
>> > > > > > > >               struct codetag_module *found = NULL;
>> > > > > > >
>> > > > > > > It's always hard to determine why a thing like this is present, so a
>> > > > > > > comment is helpful:
>> > > > > > >
>> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
>> > > > > > > +++ a/lib/codetag.c
>> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
>> > > > > > >         if (!mod)
>> > > > > > >                 return true;
>> > > > > > >
>> > > > > > > +       /* await any module's kfree_rcu() operations to complete */
>> > > > > > >         kvfree_rcu_barrier();
>> > > > > > >
>> > > > > > >         mutex_lock(&codetag_lock);
>> > > > > > > _
>> > > > > > >
>> > > > > > > But I do wonder whether this is in the correct place.
>> > > > > > >
>> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
>> > > > > > > should properly be done by the core module handling code?
>> > > > > >
>> > > > > > I don't think core module code cares about kfree_rcu()s being complete
>> > > > > > before the module is unloaded.
>> > > > > > Allocation tagging OTOH cares because it is about to destroy tags
>> > > > > > which will be accessed when kfree() actually happens, therefore a
>> > > > > > strict ordering is important.
>> > > > > >
>> > > > > > >
>> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
>> > > > > > > module memory itself and I think codetag_unload_module() could be
>> > > > > > > called after that?
>> > > > > >
>> > > > > > I think we could move codetag_unload_module() after synchronize_rcu()
>> > > > > > inside free_module() but according to the reply in
>> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
>> > > > > > synchronize_rcu() does not help. I'm not quite sure why...
>> > > > > >
>> > > > > It is because, synchronize_rcu() is used for a bit different things,
>> > > > > i.e. it is about a GP completion. Offloading objects can span several
>> > > > > GPs.
>> > > >
>> > > > Ah, thanks! Now that I looked into the patch that recently added
>> > > > kvfree_rcu_barrier() I understand that a bit better.
>> > > >
>> > > > >
>> > > > > > Note that once I'm done upstreaming
>> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
>> > > > > > this change will not be needed and I'm planning to remove this call,
>> > > > > > however this change is useful for backporting. It should be sent to
>> > > > > > stable@...r.kernel.org # v6.10+
>> > > > > >
>> > > > > The kvfree_rcu_barrier() has been added into v6.12:
>> > > > >
>> > > > > <snip>
>> > > > > urezki@...38:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
>> > > > > next-20240912
>> > > > > next-20240919
>> > > > > next-20240920
>> > > > > next-20241002
>> > > > > v6.12-rc1
>> > > > > urezki@...38:~/data/raid0/coding/linux.git$
>> > > > > <snip>
>> > > > >
>> > > > > For 6.10+, it implies that the mentioned commit should be backported also.
>> > > >
>> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
>> > > > kvfree_rcu_barrier()?
>> > > >
>> > > For kernels < 6.12, unfortunately not :) If you have a possibility to
>> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
>> > > which waits for all callbacks to be completed.
>> >
>> > We do not control the call-site inside the module, so this would not
>> > be possible.
>> >
>> > >
>> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
>> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
>> >
>> > That sounds better. I'll take a stab at it. Thanks!
> 
> I prepared backports for stable 6.10.y and 6.11.y branches which

6.10 is already EOL thus won't be necessary.

> contain this patch along with the prerequisite "rcu/kvfree: Add
> kvfree_rcu_barrier() API" patch. I think that's the only prerequisite
> required. However, since this patch is not yet in Linus' tree, I'll
> wait for it to get there before sending backports to stable (per Greg
> KH's recommendation). Please let me know if you think it's more
> critical and should be posted earlier.
> Thanks,
> Suren.
> 
>> >
>> You are welcome!
>>
>> --
>> Uladzislau Rezki
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ