linux-kernel - Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z74KHyGGMzkhx5f-@pc636>
Date: Tue, 25 Feb 2025 19:21:19 +0100
From: Uladzislau Rezki <urezki@...il.com>
To: Keith Busch <keith.busch@...il.com>, Vlastimil Babka <vbabka@...e.cz>
Cc: Keith Busch <keith.busch@...il.com>, Vlastimil Babka <vbabka@...e.cz>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Joel Fernandes <joel@...lfernandes.org>,
	Josh Triplett <josh@...htriplett.org>,
	Boqun Feng <boqun.feng@...il.com>, Christoph Lameter <cl@...ux.com>,
	David Rientjes <rientjes@...gle.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>,
	Julia Lawall <Julia.Lawall@...ia.fr>,
	Jakub Kicinski <kuba@...nel.org>,
	"Jason A. Donenfeld" <Jason@...c4.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
	Alexander Potapenko <glider@...gle.com>,
	Marco Elver <elver@...gle.com>, Dmitry Vyukov <dvyukov@...gle.com>,
	kasan-dev@...glegroups.com, Jann Horn <jannh@...gle.com>,
	Mateusz Guzik <mjguzik@...il.com>, linux-nvme@...ts.infradead.org,
	leitao@...ian.org
Subject: Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from
 kmem_cache_destroy()

On Tue, Feb 25, 2025 at 06:41:19PM +0100, Uladzislau Rezki wrote:
> On Tue, Feb 25, 2025 at 10:05:37AM -0700, Keith Busch wrote:
> > On Tue, Feb 25, 2025 at 09:03:38AM -0700, Keith Busch wrote:
> > > On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> > > > I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> > > > it's too simple, or racy, and thus we are not flushing any of the queues from
> > > > kvfree_rcu_barrier()?
> > >
> > > Thanks, your test readily triggers it for me, but only if I load
> > > rcutorture at the same time.
> > 
> > Oops, I sent the wrong kernel messages. This is the relevant part:
> > 
> > [  142.371052] workqueue: WQ_MEM_RECLAIM
> > test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is
> > flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> > [  142.371072] WARNING: CPU: 11 PID: 186 at kernel/workqueue.c:3715
> > check_flush_dependency.part.0+0xad/0x100
> > [  142.375748] Modules linked in: slub_kunit(E) rcutorture(E)
> > torture(E) kunit(E) iTCO_wdt(E) iTCO_vendor_support(E)
> > intel_uncore_frequency_common(E) skx_edac_common(E) nfit(E)
> > libnvdimm(E) kvm_intel(E) kvm(E) evdev(E) bochs(E) serio_raw(E)
> > drm_kms_helper(E) i2c_i801(E) e1000e(E) i2c_smbus(E) intel_agp(E)
> > intel_gtt(E) lpc_ich(E) agpgart(E) mfd_core(E) drm_shm]
> > [  142.384553] CPU: 11 UID: 0 PID: 186 Comm: kworker/u64:11 Tainted: G
> >            E    N 6.13.0-04839-g5e7b40f0ddce-dirty #831
> > [  142.386755] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
> > [  142.387849] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [  142.390236] Workqueue: test_kfree_rcu_destroy_wq
> > cache_destroy_workfn [slub_kunit]
> > [  142.391863] RIP: 0010:check_flush_dependency.part.0+0xad/0x100
> > [  142.393183] Code: 75 dc 48 8b 55 18 49 8d 8d 78 01 00 00 4d 89 f0
> > 48 81 c6 78 01 00 00 48 c7 c7 00 e1 9a 82 c6 05 4f 39 c5 02 01 e8 53
> > bd fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 80 3d 39 39 c5 02 00 75 83
> > 41 8b
> > [  142.396981] RSP: 0018:ffffc900007cfc90 EFLAGS: 00010092
> > [  142.398124] RAX: 000000000000008f RBX: ffff88803e9b10a0 RCX: 0000000000000027
> > [  142.399605] RDX: ffff88803eba0d08 RSI: 0000000000000001 RDI: ffff88803eba0d00
> > [  142.401092] RBP: ffff888007d9a480 R08: ffffffff83b8c808 R09: 0000000000000003
> > [  142.402548] R10: ffffffff8348c820 R11: ffffffff83a11d58 R12: ffff888007150000
> > [  142.404098] R13: ffff888005961400 R14: ffffffff813221a0 R15: ffff888005961400
> > [  142.405561] FS:  0000000000000000(0000) GS:ffff88803eb80000(0000)
> > knlGS:0000000000000000
> > [  142.407297] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  142.408658] CR2: 00007f826bd1a000 CR3: 00000000069db002 CR4: 0000000000772ef0
> > [  142.410259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  142.411871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  142.413341] PKRU: 55555554
> > [  142.414038] Call Trace:
> > [  142.414658]  <TASK>
> > [  142.415249]  ? __warn+0x8d/0x180
> > [  142.416035]  ? check_flush_dependency.part.0+0xad/0x100
> > [  142.417182]  ? report_bug+0x160/0x170
> > [  142.418041]  ? handle_bug+0x4f/0x90
> > [  142.418861]  ? exc_invalid_op+0x14/0x70
> > [  142.419853]  ? asm_exc_invalid_op+0x16/0x20
> > [  142.420877]  ? kfree_rcu_shrink_scan+0x120/0x120
> > [  142.422029]  ? check_flush_dependency.part.0+0xad/0x100
> > [  142.423244]  __flush_work+0x38a/0x4a0
> > [  142.424157]  ? find_held_lock+0x2b/0x80
> > [  142.425070]  ? flush_rcu_work+0x26/0x40
> > [  142.425953]  ? lock_release+0xb3/0x250
> > [  142.426785]  ? __mutex_unlock_slowpath+0x2c/0x270
> > [  142.427906]  flush_rcu_work+0x30/0x40
> > [  142.428756]  kvfree_rcu_barrier+0xe9/0x130
> > [  142.429649]  kmem_cache_destroy+0x2b/0x1f0
> > [  142.430578]  cache_destroy_workfn+0x20/0x40 [slub_kunit]
> > [  142.431729]  process_one_work+0x1cd/0x560
> > [  142.432620]  worker_thread+0x183/0x310
> > [  142.433487]  ? rescuer_thread+0x330/0x330
> > [  142.434428]  kthread+0xd8/0x1d0
> > [  142.435248]  ? ret_from_fork+0x17/0x50
> > [  142.436165]  ? lock_release+0xb3/0x250
> > [  142.437106]  ? kthreads_online_cpu+0xf0/0xf0
> > [  142.438133]  ret_from_fork+0x2d/0x50
> > [  142.439045]  ? kthreads_online_cpu+0xf0/0xf0
> > [  142.440428]  ret_from_fork_asm+0x11/0x20
> > [  142.441476]  </TASK>
> > [  142.442152] irq event stamp: 22858
> > [  142.443002] hardirqs last  enabled at (22857): [<ffffffff82044ef4>]
> > _raw_spin_unlock_irq+0x24/0x30
> > [  142.445032] hardirqs last disabled at (22858): [<ffffffff82044ce3>]
> > _raw_spin_lock_irq+0x43/0x50
> > [  142.451450] softirqs last  enabled at (22714): [<ffffffff810bfdbc>]
> > __irq_exit_rcu+0xac/0xd0
> > [  142.453345] softirqs last disabled at (22709): [<ffffffff810bfdbc>]
> > __irq_exit_rcu+0xac/0xd0
> > [  142.455305] ---[ end trace 0000000000000000 ]---
> Thanks!
> 
> I can trigger this also:
> 
> <snip>
> [   21.712856] KTAP version 1
> [   21.712862] 1..1
> [   21.714486]     KTAP version 1
> [   21.714490]     # Subtest: slub_test
> [   21.714492]     # module: slub_kunit
> [   21.714495]     1..10
> [   21.750359]     ok 1 test_clobber_zone
> [   21.750955]     ok 2 test_next_pointer
> [   21.751532]     ok 3 test_first_word
> [   21.751991]     ok 4 test_clobber_50th_byte
> [   21.752493]     ok 5 test_clobber_redzone_free
> [   21.753004] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> [   21.756176]     ok 6 test_kmalloc_redzone_access
> [   21.806549]     ok 7 test_kfree_rcu
> [   22.058010] ------------[ cut here ]------------
> [   22.058015] workqueue: WQ_MEM_RECLAIM test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> [   22.058039] WARNING: CPU: 19 PID: 474 at kernel/workqueue.c:3715 check_flush_dependency.part.0+0xbe/0x130
> [   22.058047] Modules linked in: slub_kunit(E) kunit(E) binfmt_misc(E) bochs(E) drm_client_lib(E) drm_shmem_helper(E) ppdev(E) drm_kms_helper(E) snd_pcm(E) sg(E) snd_timer(E) evdev(E) snd(E) joydev(E) parport_pc(E) parport(E) soundcore(E) serio_raw(E) button(E) pcspkr(E) drm(E) fuse(E) dm_mod(E) efi_pstore(E) configfs(E) loop(E) qemu_fw_cfg(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E) ata_generic(E) ata_piix(E) libata(E) scsi_mod(E) i2c_piix4(E) psmouse(E) e1000(E) i2c_smbus(E) scsi_common(E) floppy(E)
> [   22.058091] CPU: 19 UID: 0 PID: 474 Comm: kworker/u257:0 Kdump: loaded Tainted: G            E    N 6.14.0-rc1+ #286
> [   22.058096] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
> [   22.058097] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [   22.058099] Workqueue: test_kfree_rcu_destroy_wq cache_destroy_workfn [slub_kunit]
> [   22.058103] RIP: 0010:check_flush_dependency.part.0+0xbe/0x130
> [   22.058106] Code: 75 d0 48 8b 55 18 49 8d 8d c0 00 00 00 4d 89 f0 48 81 c6 c0 00 00 00 48 c7 c7 b0 7d c8 bd c6 05 6c 78 53 01 01 e8 a2 ae fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc f6 c4 08 74 94 31 ed
> [   22.058108] RSP: 0018:ffff95e5c123fd50 EFLAGS: 00010086
> [   22.058111] RAX: 0000000000000000 RBX: ffff89a4ff22d5a0 RCX: 0000000000000000
> [   22.058113] RDX: 0000000000000003 RSI: ffffffffbdce1697 RDI: 00000000ffffffff
> [   22.058114] RBP: ffff89961043a780 R08: 0000000000000000 R09: 0000000000000003
> [   22.058116] R10: ffff95e5c123fbe8 R11: ffff89a53fefefa8 R12: ffff89960cb6b080
> [   22.058117] R13: ffff899600051400 R14: ffffffffbcf2ba80 R15: ffff89960005a800
> [   22.058120] FS:  0000000000000000(0000) GS:ffff89a4ff2c0000(0000) knlGS:0000000000000000
> [   22.058122] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   22.058124] CR2: 000055bf2cbc6038 CR3: 000000010dc1e000 CR4: 00000000000006f0
> [   22.058128] Call Trace:
> [   22.058130]  <TASK>
> [   22.058133]  ? __warn+0x85/0x130
> [   22.058137]  ? check_flush_dependency.part.0+0xbe/0x130
> [   22.058139]  ? report_bug+0x18d/0x1c0
> [   22.058142]  ? prb_read_valid+0x17/0x20
> [   22.058147]  ? handle_bug+0x58/0x90
> [   22.058151]  ? exc_invalid_op+0x13/0x60
> [   22.058154]  ? asm_exc_invalid_op+0x16/0x20
> [   22.058158]  ? __pfx_kfree_rcu_work+0x10/0x10
> [   22.058162]  ? check_flush_dependency.part.0+0xbe/0x130
> [   22.058165]  __flush_work+0xd6/0x320
> [   22.058168]  flush_rcu_work+0x39/0x50
> [   22.058171]  kvfree_rcu_barrier+0xe9/0x130
> [   22.058174]  kmem_cache_destroy+0x18/0x140
> [   22.058177]  process_one_work+0x184/0x3a0
> [   22.058180]  worker_thread+0x24d/0x360
> [   22.058183]  ? __pfx_worker_thread+0x10/0x10
> [   22.058185]  kthread+0xfc/0x230
> [   22.058189]  ? finish_task_switch.isra.0+0x85/0x2a0
> [   22.058192]  ? __pfx_kthread+0x10/0x10
> [   22.058195]  ret_from_fork+0x30/0x50
> [   22.058199]  ? __pfx_kthread+0x10/0x10
> [   22.058202]  ret_from_fork_asm+0x1a/0x30
> [   22.058206]  </TASK>
> [   22.058207] ---[ end trace 0000000000000000 ]---
> [   23.123507]     ok 8 test_kfree_rcu_wq_destroy
> [   23.151033]     ok 9 test_leak_destroy
> [   23.151612]     ok 10 test_krealloc_redzone_zeroing
> [   23.151617] # slub_test: pass:10 fail:0 skip:0 total:10
> [   23.151619] # Totals: pass:10 fail:0 skip:0 total:10
> [   23.151620] ok 1 slub_test
> urezki@...38:~$
> <snip>
> 
> but i had to adapt slightly the Vlastimil's test:
> 
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index f11691315c2f..222f6d204b0d 100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -6,6 +6,7 @@
>  #include <linux/module.h>
>  #include <linux/kernel.h>
>  #include <linux/rcupdate.h>
> +#include <linux/delay.h>
>  #include "../mm/slab.h"
> 
>  static struct kunit_resource resource;
> @@ -181,6 +182,63 @@ static void test_kfree_rcu(struct kunit *test)
>         KUNIT_EXPECT_EQ(test, 0, slab_errors);
>  }
> 
> +struct cache_destroy_work {
> +        struct work_struct work;
> +        struct kmem_cache *s;
> +};
> +
> +static void cache_destroy_workfn(struct work_struct *w)
> +{
> +       struct cache_destroy_work *cdw;
> +
> +       cdw = container_of(w, struct cache_destroy_work, work);
> +       kmem_cache_destroy(cdw->s);
> +}
> +
> +#define KMEM_CACHE_DESTROY_NR 10
> +
> +static void test_kfree_rcu_wq_destroy(struct kunit *test)
> +{
> +       struct test_kfree_rcu_struct *p;
> +       struct cache_destroy_work cdw;
> +       struct workqueue_struct *wq;
> +       struct kmem_cache *s;
> +       unsigned int rnd;
> +       int i;
> +
> +       if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> +               kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> +       INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
> +       wq = alloc_workqueue("test_kfree_rcu_destroy_wq",
> +                       WQ_HIGHPRI | WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
> +
> +       if (!wq)
> +               kunit_skip(test, "failed to alloc wq");
> +
> +       for (i = 0; i < KMEM_CACHE_DESTROY_NR; i++) {
> +               s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
> +                               sizeof(struct test_kfree_rcu_struct),
> +                               SLAB_NO_MERGE);
> +
> +               if (!s)
> +                       kunit_skip(test, "failed to create cache");
> +
> +               rnd = get_random_u8() % 255;
> +               p = kmem_cache_alloc(s, GFP_KERNEL);
> +               kfree_rcu(p, rcu);
> +
> +               cdw.s = s;
> +
> +               msleep(rnd);
> +               queue_work(wq, &cdw.work);
> +               flush_work(&cdw.work);
> +       }
> +
> +       destroy_workqueue(wq);
> +       KUNIT_EXPECT_EQ(test, 0, slab_errors);
> +}
> +
>  static void test_leak_destroy(struct kunit *test)
>  {
>         struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
> @@ -254,6 +312,7 @@ static struct kunit_case test_cases[] = {
>         KUNIT_CASE(test_clobber_redzone_free),
>         KUNIT_CASE(test_kmalloc_redzone_access),
>         KUNIT_CASE(test_kfree_rcu),
> +       KUNIT_CASE(test_kfree_rcu_wq_destroy),
>         KUNIT_CASE(test_leak_destroy),
>         KUNIT_CASE(test_krealloc_redzone_zeroing),
>         {}
> 
> --
> Uladzislau Rezki
>
WQ_MEM_RECLAIM-patch fixes this for me:

<snip>
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4030907b6b7d..1b5ed5512782 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1304,6 +1304,8 @@ module_param(rcu_min_cached_objs, int, 0444);
 static int rcu_delay_page_cache_fill_msec = 5000;
 module_param(rcu_delay_page_cache_fill_msec, int, 0444);

+static struct workqueue_struct *rcu_reclaim_wq;
+
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (5 * HZ)
 #define KFREE_N_BATCHES 2
@@ -1632,10 +1634,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
        if (delayed_work_pending(&krcp->monitor_work)) {
                delay_left = krcp->monitor_work.timer.expires - jiffies;
                if (delay < delay_left)
-                       mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+                       mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
                return;
        }
-       queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+       queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
 }

 static void
@@ -1733,7 +1735,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
                        // "free channels", the batch can handle. Break
                        // the loop since it is done with this CPU thus
                        // queuing an RCU work is _always_ success here.
-                       queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
+                       queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work);
                        WARN_ON_ONCE(!queued);
                        break;
                }
@@ -1883,7 +1885,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
        if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
                        !atomic_xchg(&krcp->work_in_progress, 1)) {
                if (atomic_read(&krcp->backoff_page_cache_fill)) {
-                       queue_delayed_work(system_unbound_wq,
+                       queue_delayed_work(rcu_reclaim_wq,
                                &krcp->page_cache_work,
                                        msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
                } else {
@@ -2120,6 +2122,10 @@ void __init kvfree_rcu_init(void)
        int i, j;
        struct shrinker *kfree_rcu_shrinker;

+       rcu_reclaim_wq = alloc_workqueue("rcu_reclaim",
+               WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+       WARN_ON(!rcu_reclaim_wq);
+
        /* Clamp it to [0:100] seconds interval. */
        if (rcu_delay_page_cache_fill_msec < 0 ||
                rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
<snip>

it passes:

<snip>
[   15.972416] KTAP version 1
[   15.972421] 1..1
[   15.973467]     KTAP version 1
[   15.973470]     # Subtest: slub_test
[   15.973472]     # module: slub_kunit
[   15.973474]     1..10
[   15.974483]     ok 1 test_clobber_zone
[   15.974927]     ok 2 test_next_pointer
[   15.975308]     ok 3 test_first_word
[   15.975672]     ok 4 test_clobber_50th_byte
[   15.976035]     ok 5 test_clobber_redzone_free
[   15.976128] stackdepot: allocating hash table of 1048576 entries via kvcalloc
[   15.979505]     ok 6 test_kmalloc_redzone_access
[   16.014408]     ok 7 test_kfree_rcu
[   17.726602]     ok 8 test_kfree_rcu_wq_destroy
[   17.750323]     ok 9 test_leak_destroy
[   17.750883]     ok 10 test_krealloc_redzone_zeroing
[   17.750887] # slub_test: pass:10 fail:0 skip:0 total:10
[   17.750890] # Totals: pass:10 fail:0 skip:0 total:10
[   17.750891] ok 1 slub_test
<snip>

--
Uladzislau Rezki