Message-ID: <78933046-09ec-4dc9-9ab1-f00b4cceab27@roeck-us.net>
Date: Tue, 6 Aug 2024 18:38:34 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: James Bottomley <James.Bottomley@...senPartnership.com>,
 Thomas Gleixner <tglx@...utronix.de>, Vlastimil Babka <vbabka@...e.cz>,
 Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
 Helge Deller <deller@....de>, linux-parisc@...r.kernel.org
Subject: Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

On 8/6/24 17:49, James Bottomley wrote:
> On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote:
>> Cc+: Helge, parisc ML
>>
>> We're chasing a weird failure which has been tracked down to the
>> placement of the division library functions (I assume they are
>> imported
>> from libgcc).
>>
>> See the thread starting at:
>>
>>   
>> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net
>>
>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>>
>>>> So this change adds 16 bytes to __do_softirq() which moves the
>>>> division functions up by 16 bytes. That's all it takes to make
>>>> the stupid go away....
>>>
>>> Heh, I was actually wondering if the division is somehow messed up,
>>> because maxobj = order_objects() and order_objects() does a
>>> division. Now I suspect it even more.
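
For reference, order_objects() in mm/slub.c boils down to a single unsigned
division, which is where the $$divoI call discussed below comes from. A rough
sketch of the mainline code (exact form may differ between kernel versions):

    static inline unsigned int order_objects(unsigned int order, unsigned int size)
    {
            /* how many objects of 'size' bytes fit into 2^order pages */
            return ((unsigned int)PAGE_SIZE << order) / size;
    }

    /* check_slab() compares slab->objects against this value: */
    maxobj = order_objects(slab_order(slab), s->size);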
>>
>> check_slab() calls into that muck, but I checked the disassembly of a
>> working and a broken kernel and the only difference there is the
>> displacement offset when the code calculates the call address, but
>> that's as expected a difference of 16 bytes.
>>
>> Now it becomes interesting.
>>
>> I added an unused function after __do_softirq() into the softirq text
>> section and filled it with ASM nonsense so that it occupies exactly
>> one page. That moves $$divoI, which is what check_slab() calls,
>> exactly one page forward:
>>
>>      -0000000041218c70 T $$divoI
>>      +0000000041219c70 T $$divoI
>>
>> Guess what happens? It falls on its nose again.
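
The filler itself wasn't posted; a sketch of the kind of padding that shifts
$$divoI by a controllable amount might look like the following. The symbol
name, the section name and the 0x08000240 PA-RISC nop encoding are my
assumptions, not taken from the actual patch:

    /* hypothetical filler: park roughly one page of nops in the softirq
     * text section so everything linked after it moves forward */
    static void __used __section(".softirqentry.text") softirq_pad(void)
    {
            /* 1024 words x 4 bytes = 4096 bytes of "or %r0,%r0,%r0" */
            asm volatile(".fill 1024, 4, 0x08000240");
    }

Adjusting the repeat count of the .fill directive then steers where the
following libgcc symbols land, modulo function alignment.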
>>
>> Now with that ASM gunk I can steer the size conveniently. It works up
>> to:
>>
>>      0000000041219c50 T $$divoI
>>
>> and fails for
>>
>>      0000000041219c60 T $$divoI
>>      0000000041219c70 T $$divoI
>>
>> and works again at
>>
>>      0000000041219c80 T $$divoI
> 
> So just on this, you seem to have proved that only exact multiples of
> 48 work.  In terms of how PA-RISC caching works that's completely nuts
> ... however, there may be something else at work, like stack frame
> alignment.
> 
>>
>> So I added the following:
>>
>> +extern void testme(void);
>> +extern unsigned int testsize;
>> +
>> +unsigned int testsize = 192;
>> +
>> +void __init testme(void)
>> +{
>> +       pr_info("TESTME: %lu\n", PAGE_SIZE / testsize);
>> +}
>>
>> called that _before_ mm_core_init() from init/main.c and adjusted my
>> ASM hack to make $$divoI be at:
>>
>>      0000000041219c70 T $$divoI
>>
>> again and surprisingly the output is:
>>
>>      [    0.000000] softirq: TESTME: 21
> 
> OK, why is that surprising?  4096/192 is 21 due to integer rounding.
> 

The problem is that it sometimes wrongly returns 16, but not always.
The surprise is that it is not consistently wrong, even when divU is
at the same location.
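
For the concrete numbers in the log: 21 * 192 = 4032 <= 4096 < 4224 = 22 * 192,
so 4096 / 192 can only be 21; a result of 16 cannot come from rounding. The
same computation as a trivial userspace check (sizes taken from the report
above):

    #include <stdio.h>

    int main(void)
    {
            unsigned int page_size = 4096;  /* PAGE_SIZE on this config */
            unsigned int size = 192;        /* s->size for kmem_cache_node */
            unsigned int order = 0;         /* sorder from the log */
            unsigned int maxobj = (page_size << order) / size;

            printf("maxobj = %u\n", maxobj);        /* prints 21 */
            return 0;
    }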

>> Now I went back to the hppa64 gcc version 12.2.0 again and did the
>> same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70
>> in the page and the same happens.
>>
>> So it's not a compiler-dependent problem.
>>
>> But then I added a testme() call to the error path and get:
>>
>> [    0.000000] softirq: TESTME: 21
>> [    0.000000] =============================================================================
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0
>>
>> Now what's wrong?
>>
>> Adding more debug:
>>
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0 21
>>
>> where the last '21' is the output of the same call which made maxobj
>> go south:
>>
>>   static int check_slab(struct kmem_cache *s, struct slab *slab)
>>   {
>>          int maxobj;
>> @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache
>>   
>>          maxobj = order_objects(slab_order(slab), s->size);
>>          if (slab->objects > maxobj) {
>> -               slab_err(s, slab, "objects %u > max %u",
>> -                       slab->objects, maxobj);
>> +               testme();
>> +               slab_err(s, slab, "objects %u > max %u size %u sorder %u %u",
>> +                        slab->objects, maxobj, s->size, slab_order(slab),
>> +                        order_objects(slab_order(slab), s->size));
>>                  return 0;
>>          }
>>          if (slab->inuse > slab->objects) {
>>
>> I don't know and I don't want to know TBH...
> 
> OK, so you're telling us we have a problem with slab_order on parisc
> ... that's folio_order, so it smells like a parisc bug with
> folio_test_large?  Unfortunately I'm a bit pissed in an airport lounge
> on my way to the UK, so I've lost access to my pa test rig and can't
> test further for a while.
> 

I think the problem is rather that divU, for some unknown reason, sometimes
returns bad results. The divU implementation in libgcc isn't exactly easy to
understand. There may be some context problem, such as the upper bits of some
register not being cleared under some circumstances. Look at my recent commits
to arch/parisc for examples of how similar problems survived for a long time
in the Linux kernel.
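
As an illustration of that class of bug (this is not the actual libgcc
millicode, just a sketch): if one side leaves stale bits in the upper half of
a 64-bit register while the other side treats the value as a full-width
operand, the quotient goes wrong even though the low 32 bits hold the correct
operands:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint64_t dividend = 4096, divisor = 192;

            /* clean operands: 4096 / 192 = 21 */
            printf("clean: %llu\n", (unsigned long long)(dividend / divisor));

            /* a stale bit 32 in the divisor register ruins the result */
            uint64_t dirty = divisor | (1ULL << 32);
            printf("dirty: %llu\n", (unsigned long long)(dividend / dirty));
            return 0;
    }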

I'd strongly suspect a qemu emulation problem if it weren't for those other
problems. Still, I think it is very likely that the problem lies in qemu,
but we may need confirmation one way or the other from a test on real
hardware.

Thanks,
Guenter

