Message-ID: <e4515d96-f25b-4cf5-8dd8-f75c21b51bdc@roeck-us.net>
Date: Wed, 15 Oct 2025 12:40:13 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Feng Chen <feng.chen@...ogic.com>, Matthew Wilcox <willy@...radead.org>,
Jeff Layton <jlayton@...nel.org>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
Tao Ren <rentao.bupt@...il.com>, Lukas Bulwahn <lukas.bulwahn@...hat.com>,
Alexei Starovoitov <ast@...nel.org>, Vlastimil Babka <vbabka@...e.cz>
Subject: Re: Linux 6.18-rc1
On 10/15/25 11:33, Liam R. Howlett wrote:
> * Guenter Roeck <linux@...ck-us.net> [251015 13:48]:
>> On 10/15/25 10:28, Liam R. Howlett wrote:
>>> + Cc Vlastimil, as you are indicating the slab merge.
>>>
>>>
>>> * Guenter Roeck <linux@...ck-us.net> [251015 06:02]:
>>>> On Mon, Oct 13, 2025 at 09:46:44PM -0700, Guenter Roeck wrote:
>>>>> On Mon, Oct 13, 2025 at 10:08:26AM -0700, Guenter Roeck wrote:
>>>>>> On Sun, Oct 12, 2025 at 02:04:32PM -0700, Linus Torvalds wrote:
>>>>>>> Two weeks have passed, and 6.18-rc1 has been tagged and pushed out.
>>>>>>>
>>>>>>> Things look fairly normal: size-wise this is pretty much right in the
>>>>>>> middle of the pack, and nothing particular stands out in the shortlog
>>>>>>> of merges this merge window appended below. About half the diff is
>>>>>>> drivers, with the rest being all over: vfs and filesystems, arch
>>>>>>> updates (although much of that is actually devicetree stuff, so it's
>>>>>>> arguably more driver-related), tooling, rust support etc etc.
>>>>>>>
>>>>>>> This was one of the good merge windows where I didn't end up having to
>>>>>>> bisect any particular problem on any of the machines I was testing.
>>>>>>> Let's hope that success mostly translates to the bigger picture too.
>>>>>>>
>>>>>>
>>>>>> Test results don't look that good, unfortunately:
>>>>>>
>>>>> ...
>>>>>> Qemu test results:
>>>>>> total: 609 pass: 581 fail: 28
>>>>>> Failed tests:
>>>> ...
>>>>>> sheb:rts7751r2dplus_defconfig:initrd
>>>>>> sheb:rts7751r2dplus_defconfig:ata:ext2
>>>>>> sheb:rts7751r2dplus_defconfig:usb:ext2
>>>>>> Unit test results:
>>>>>> pass: 655208 fail: 0
>>>>>>
>>>>>
>>>>
>>>> Update on the sheb (SH4 big endian) failures below.
>>>
>>> What is the qemu line you use and the memory configuration of that qemu,
>>> or is this real hardware?
>>>
>> qemu. I tried 6.2.0, 10.0.5, and 10.1.1. Sample command line:
>>
>> qemu-system-sh4eb -M r2d -kernel arch/sh/boot/zImage \
>> -append "console=ttySC1,115200 noiotrap" \
>> -serial null -serial stdio -monitor null -nographic -no-reboot
>>
>> initrd or root file system doesn't really matter because qemu exits
>> almost immediately.
>>
>>> Are there sh4 configs that pass?
>>>
>>
>> little endian - all
>> big endian - none
>
> Do other big endian targets work?
>
The ones I am testing, yes.
>>
>>> It's a bit odd it says "fail: 0" here. Is this message about something
>>> else?
>>
>> These are unit (KUNIT) test results. All 655208 executed unit tests passed.
>> Unit tests not executed because the image crashed or because qemu died are not
>> counted as failed.
>
> Thanks.
>
> ...
>
>>
>> I checked out a test branch at 24d9e8b3c9c, rebased it on top of
>> 24d9e8b3c9c8a6~1 (07fdad3a93756b8), and ran another bisect. Results:
>>
>> # bad: [c5e19dc4c1db098456ee6a924e276a26e692f26c] slab: Introduce kmalloc_nolock() and kfree_nolock().
>> # good: [07fdad3a93756b872da7b53647715c48d0f4a2d0] Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
>> git bisect start 'HEAD' '07fdad3a93756b872da7b53647715c48d0f4a2d0'
>> # good: [10f17a5a3befa328bd9a78ca6b799dd1933f108b] maple_tree: remove redundant __GFP_NOWARN
>> git bisect good 10f17a5a3befa328bd9a78ca6b799dd1933f108b
>> # good: [f97515baad5efa6e1963abd37188fad42515edc8] maple_tree: Replace mt_free_one() with kfree()
>> git bisect good f97515baad5efa6e1963abd37188fad42515edc8
>> # bad: [4df642aa2128c2c346f9c945bddbae37c59bba82] locking/local_lock: Introduce local_lock_is_locked().
>> git bisect bad 4df642aa2128c2c346f9c945bddbae37c59bba82
>> # good: [a20be9b8014abfe68acc2efd81bfb5d2dd4eaf34] maple_tree: Prefilled sheaf conversion and testing
>> git bisect good a20be9b8014abfe68acc2efd81bfb5d2dd4eaf34
>> # bad: [40696586bc008ad34db8135c35ec4b459691af3c] maple_tree: Convert forking to use the sheaf interface
>> git bisect bad 40696586bc008ad34db8135c35ec4b459691af3c
>> # good: [8387347ae261c5e74e9db3f73b91d47f11f8d6f8] maple_tree: Add single node allocation support to maple state
>> git bisect good 8387347ae261c5e74e9db3f73b91d47f11f8d6f8
>> # first bad commit: [40696586bc008ad34db8135c35ec4b459691af3c] maple_tree: Convert forking to use the sheaf interface
>>
>> Reverting just 40696586bc008 in that branch didn't help. So I reverted "slab: Introduce
>> kmalloc_nolock() and kfree_nolock()" in that branch as well, and the image started
>> passing.
>
> This does not make sense to me. The first bad commit being reverted and
> it does not work means that it's not to do with that patch..?
>
> I'm not saying this patch is fine, but surely it indicates a previous
> problem and potentially (most likely?) an intermittent failure?
>
> Is the failure consistently reproduced?
>
Yes.
>
>> In mainline, 719a42e563bb ("maple_tree: Convert forking to use the sheaf interface")
>> can be reverted, but trying to revert af92793e52c3 results in:
>> CONFLICT (content): Merge conflict in mm/slub.c
>
> Forking shouldn't be running so early that the console output is
> affected, so I'm not sure how this change would cause what you are
> describing.
>
I did some more digging and found that the following reverts on top of mainline
are clean.
d0e0bf7519b7 (HEAD -> master) Revert "maple_tree: Convert forking to use the sheaf interface"
9807b6d44849 Revert "slab: Introduce kmalloc_nolock() and kfree_nolock()."
036271875f62 Revert "slab: Fix using this_cpu_ptr() in preemptible context"
But that doesn't fix the problem. I then switched the gcc version from 14.3 to 13.4.
And everything starts working, even without reverts.
So, you are correct. This is not a code problem. It may be something like crossing
a page boundary which isn't handled correctly by qemu; we had this before.
Sorry for the noise :-(.
Guenter