Message-ID: <53b2e1f2-4291-48e5-a668-7cf57d900ecd@suse.cz>
Date: Tue, 6 Aug 2024 13:02:45 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Guenter Roeck <linux@...ck-us.net>
Cc: linux-kernel@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
On 8/6/24 04:40, Linus Torvalds wrote:
> [ Let's drop random people and bring in Vlastimil ]
tglx was reproducing it, so I'm adding him back.
> Vlastimil,
> it turns out that the "this patch" is entirely a red herring, and the
> problem comes and goes randomly with just some code layout issues. See
>
> http://server.roeck-us.net/qemu/parisc64-6.10.3/
>
> for more detail, particularly you'll see the "log.bad.gz" with the full log.
[ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
[ 0.000000] Slab 0x0000000041ed0000 objects=21 used=5 fp=0x00000000434003d0 flags=0x200(workingset|section=0|zone=0)
The flags tell us this slab came from the partial list (workingset). There's no head flag, so it is order-0.
Since the error was detected, SLUB basically throws the slab page away and tries another one.
[ 0.000000] BUG kmem_cache (Tainted: G B ): objects 25 > max 16
[ 0.000000] Slab 0x0000000041ed0080 objects=25 used=6 fp=0x0000000043402790 flags=0x240(workingset|head|section=0|zone=0)
This one also came from the partial list, but it has the head flag, so it is at least order-1. Two things are weird:
- max=16 is the same as above, even though it should be at least double, as the
slab page's order is larger
- objects=25 also isn't at least twice objects=21
All the subsequent reports look like this:
[ 0.000000] BUG kmem_cache (Tainted: G B ): objects 25 > max 16
[ 0.000000] Slab 0x0000000041ed0300 objects=25 used=1 fp=0x000000004340c150 flags=0x40(head|section=0|zone=0)
We depleted the partial list, so it's allocating new slab pages, which are
also at least order-1.
It looks like the maxobj calculation is bogus; it would be useful to see which values it is
calculated from. I'm attaching a diff, but maybe it will also hide the issue...
If someone has a /proc/slabinfo from a working boot with an otherwise identical config,
that might also be enough to guess which values should be expected there,
at least s->size.
The objects=21 vs. objects=25 values also seem odd, though.
used=5 and used=6 in the first two reports also suggest that we already passed this code
successfully while creating a number of kmalloc caches, and only then did it start
failing; that's also weird.
> See also
>
> https://lore.kernel.org/all/87y15a4p4h.ffs@tglx/
>
> for this thread.
>
> I don't think this is really a slub issue, since it only happens on
> parisc, but maybe you can see what would make parisc different, and
> what could possibly make it all timing- or layout-dependent.
>
> Linus
>
> On Sun, 4 Aug 2024 at 11:36, Guenter Roeck <linux@...ck-us.net> wrote:
>>
>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>
>> [ 0.000000] =============================================================================
>> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> [ 0.000000] -----------------------------------------------------------------------------
>>
>> This never stops until the emulation aborts.
diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..ec4ed5215f2f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1386,8 +1386,8 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
maxobj = order_objects(slab_order(slab), s->size);
if (slab->objects > maxobj) {
- slab_err(s, slab, "objects %u > max %u",
- slab->objects, maxobj);
+ slab_err(s, slab, "objects %u > max %u (order %d size %u)",
+ slab->objects, maxobj, slab_order(slab), s->size);
return 0;
}
if (slab->inuse > slab->objects) {