Message-ID: <d424e109e6f1a00b8cf22ec1b40d6dedff38ce52.camel@physik.fu-berlin.de>
Date: Mon, 04 Aug 2025 08:58:49 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Anthony Yznaga <anthony.yznaga@...cle.com>, "Matthew Wilcox (Oracle)"
<willy@...radead.org>, linux-arch@...r.kernel.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, "David S. Miller"
<davem@...emloft.net>, sparclinux@...r.kernel.org, Andreas Larsson
<andreas@...sler.com>, Rod Schnell <rods@...radio.com>, Sam James
<sam@...too.org>
Subject: Re: [PATCH v4 25/36] sparc64: Implement the new page table range API
Hi,
On Mon, 2025-08-04 at 07:36 +0200, John Paul Adrian Glaubitz wrote:
> On Mon, 2025-08-04 at 07:12 +0200, John Paul Adrian Glaubitz wrote:
> > On Sun, 2025-08-03 at 12:08 -0700, Anthony Yznaga wrote:
> > > There was a follow-on fix that addressed a bug with this patch:
> > >
> > > f4b4f3ec1a31 sparc64: add missing initialization of folio in tlb_batch_add()
> >
> > Indeed I just tried v6.6 which has this patch and added your sun4u fix and it
> > seems to be stable. I was sure I saw problems even with v6.16 though.
> >
> > Let me run more tests.
>
> I'm seeing another crash on v6.16 on sun4u even with your patch applied:
>
> [ 456.443492] kernel BUG at fs/ext4/inode.c:1174!
> [ 456.503059] \|/ ____ \|/
> [ 456.503059] "@'/ .. \`@"
> [ 456.503059] /_| \__/ |_\
> [ 456.503059] \__U_/
> [ 456.696513] apt-get(1217): Kernel bad sw trap 5 [#1]
> [ 456.761698] CPU: 0 UID: 0 PID: 1217 Comm: apt-get Not tainted 6.16.0+ #24 VOLUNTARY
> [ 456.863502] TSTATE: 0000000011001601 TPC: 0000000010309250 TNPC: 0000000010309254 Y: 00000000 Not tainted
> [ 456.992850] TPC: <ext4_block_write_begin+0x450/0x540 [ext4]>
> [ 457.067500] g0: 0000000000000000 g1: 0000000000000001 g2: 0000000000000000 g3: 0000000000000000
> [ 457.181869] g4: fff00000141d5c80 g5: 0000000000000008 g6: fff000000be24000 g7: 0000000000000001
> [ 457.296245] o0: 00000000103944b0 o1: 0000000000000496 o2: ffffffffffffffbf o3: 0000000000101cca
> [ 457.410618] o4: 0000000000000000 o5: 0000000000000000 sp: fff000000be26fd1 ret_pc: 0000000010309248
> [ 457.529571] RPC: <ext4_block_write_begin+0x448/0x540 [ext4]>
> [ 457.604020] l0: fff000003def26e0 l1: 0000000000113cca l2: fff000003def2578 l3: 0000000000000002
> [ 457.718394] l4: 0000000000000000 l5: 0000000000080000 l6: 0000000000012000 l7: 0000000000000001
> [ 457.832770] i0: 0000000000000000 i1: 000c00000026b500 i2: 0000000000001000 i3: 0000000000082000
> [ 457.947146] i4: 00000000103034a0 i5: 0000000000000000 i6: fff000000be270c1 i7: 000000001030c8dc
> [ 458.061528] I7: <ext4_da_write_begin+0x1bc/0x340 [ext4]>
> [ 458.131389] Call Trace:
> [ 458.163408] [<000000001030c8dc>] ext4_da_write_begin+0x1bc/0x340 [ext4]
> [ 458.250447] [<0000000000674230>] generic_perform_write+0x90/0x240
> [ 458.330606] [<00000000102f50b4>] ext4_buffered_write_iter+0x54/0x120 [ext4]
> [ 458.422214] [<00000000102f5624>] ext4_file_write_iter+0x3e4/0x780 [ext4]
> [ 458.510388] [<0000000000749cc4>] vfs_write+0x2c4/0x3e0
> [ 458.577867] [<0000000000749f4c>] ksys_write+0x4c/0xe0
> [ 458.644203] [<0000000000749ff4>] sys_write+0x14/0x40
> [ 458.709397] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> [ 458.786048] Disabling lock debugging due to kernel taint
> [ 458.855904] Caller[000000001030c8dc]: ext4_da_write_begin+0x1bc/0x340 [ext4]
> [ 458.948653] Caller[0000000000674230]: generic_perform_write+0x90/0x240
> [ 459.034430] Caller[00000000102f50b4]: ext4_buffered_write_iter+0x54/0x120 [ext4]
> [ 459.131761] Caller[00000000102f5624]: ext4_file_write_iter+0x3e4/0x780 [ext4]
> [ 459.225648] Caller[0000000000749cc4]: vfs_write+0x2c4/0x3e0
> [ 459.298846] Caller[0000000000749f4c]: ksys_write+0x4c/0xe0
> [ 459.370900] Caller[0000000000749ff4]: sys_write+0x14/0x40
> [ 459.441810] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> [ 459.524168] Caller[0000000000000000]: 0x0
> [ 459.576772] Instruction DUMP:
> [ 459.576776] 11040e51
> [ 459.615662] 7c04b816
> [ 459.646541] 901220b0
> [ 459.677418] <91d02005>
> [ 459.708302] 9735a000
> [ 459.739181] 95352000
> [ 459.770076] d25fa7cf
> [ 459.800945] 7fffe818
> [ 459.831825] 90100019
> [ 459.862706]
> [ 459.941500] systemd[1]: Failed to open /dev/pts device, ignoring: Inappropriate ioctl for device
> [ 460.063831] systemd[1]: rsyslog.service: Main process exited, code=killed, status=6/ABRT
> [ 460.170962] systemd[1]: rsyslog.service: Failed with result 'signal'.
> [ 460.267153] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
> [ 460.388605] systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 1.
> [ 460.517346] systemd[1]: Starting rsyslog.service - System Logging Service...
> [ 460.618299] systemd[1]: Starting systemd-journald.service - Journal Service...
> [ 460.895645] systemd-journald[1237]: Collecting audit messages is disabled.
> [ 461.048068] systemd[1]: Failed to open /dev/pts device, ignoring: Inappropriate ioctl for device
> [ 461.202783] systemd-journald[1237]: File /var/log/journal/9ac90e257b3e423284cfc21a00cbeeb8/system.journal corrupted or uncleanly shut down, renaming and replacing.
> [ 461.456867] systemd[1]: Started rsyslog.service - System Logging Service.
> [ 461.616651] systemd-journald[1237]: Time jumped backwards, rotating.
> [ 461.773305] systemd-journald[1237]: Failed to read journal file /var/log/journal/9ac90e257b3e423284cfc21a00cbeeb8/user-1002.journal for rotation, trying to move it out of the way: Device or resource busy
> [ 462.065725] systemd[1]: Started systemd-journald.service - Journal Service.
> [ 462.159895] systemd-journald[1237]: Time jumped backwards, rotating.
> [ 519.719624] kernel BUG at fs/ext4/inode.c:1174!
> [ 519.779143] \|/ ____ \|/
> [ 519.779143] "@'/ .. \`@"
> [ 519.779143] /_| \__/ |_\
> [ 519.779143] \__U_/
> [ 519.972586] apt(1249): Kernel bad sw trap 5 [#2]
> [ 520.033239] CPU: 0 UID: 0 PID: 1249 Comm: apt Tainted: G D 6.16.0+ #24 VOLUNTARY
> [ 520.151048] Tainted: [D]=DIE
> [ 520.188797] TSTATE: 0000000011001603 TPC: 0000000010309250 TNPC: 0000000010309254 Y: 00000000 Tainted: G D
> [ 520.338725] TPC: <ext4_block_write_begin+0x450/0x540 [ext4]>
> [ 520.413282] g0: 0000000000000000 g1: 0000000000000001 g2: 0000000000000000 g3: 0000000000000000
> [ 520.527655] g4: fff00000141d40c0 g5: 000000000000000b g6: fff000000a818000 g7: 0000000000000001
> [ 520.642031] o0: 00000000103944b0 o1: 0000000000000496 o2: fffffffffffffcc0 o3: 0000000000101cca
> [ 520.756408] o4: 0000000000000004 o5: 0000000000000000 sp: fff000000a81afd1 ret_pc: 0000000010309248
> [ 520.875350] RPC: <ext4_block_write_begin+0x448/0x540 [ext4]>
> [ 520.949799] l0: fff000023439af00 l1: 0000000000113cca l2: fff000023439ad98 l3: 0000000000000002
> [ 521.064174] l4: 0000000000000000 l5: 0000000000080000 l6: 0000000000012000 l7: 0000000000000001
> [ 521.178547] i0: 0000000000000000 i1: 000c000000164a00 i2: 0000000000001fc0 i3: 0000000000680000
> [ 521.292923] i4: 00000000103034a0 i5: 0000000000000000 i6: fff000000a81b0c1 i7: 000000001030c8dc
> [ 521.407297] I7: <ext4_da_write_begin+0x1bc/0x340 [ext4]>
> [ 521.477195] Call Trace:
> [ 521.509295] [<000000001030c8dc>] ext4_da_write_begin+0x1bc/0x340 [ext4]
> [ 521.596330] [<0000000000674230>] generic_perform_write+0x90/0x240
> [ 521.676495] [<00000000102f50b4>] ext4_buffered_write_iter+0x54/0x120 [ext4]
> [ 521.768196] [<00000000102f5624>] ext4_file_write_iter+0x3e4/0x780 [ext4]
> [ 521.856381] [<0000000000749cc4>] vfs_write+0x2c4/0x3e0
> [ 521.923957] [<0000000000749f4c>] ksys_write+0x4c/0xe0
> [ 521.990294] [<0000000000749ff4>] sys_write+0x14/0x40
> [ 522.055486] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> [ 522.132122] Caller[000000001030c8dc]: ext4_da_write_begin+0x1bc/0x340 [ext4]
> [ 522.224873] Caller[0000000000674230]: generic_perform_write+0x90/0x240
> [ 522.310649] Caller[00000000102f50b4]: ext4_buffered_write_iter+0x54/0x120 [ext4]
> [ 522.407974] Caller[00000000102f5624]: ext4_file_write_iter+0x3e4/0x780 [ext4]
> [ 522.501864] Caller[0000000000749cc4]: vfs_write+0x2c4/0x3e0
> [ 522.575062] Caller[0000000000749f4c]: ksys_write+0x4c/0xe0
> [ 522.647118] Caller[0000000000749ff4]: sys_write+0x14/0x40
> [ 522.718031] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> [ 522.800380] Caller[0000000000000000]: 0x0
> [ 522.852991] Instruction DUMP:
> [ 522.852994] 11040e51
> [ 522.891878] 7c04b816
> [ 522.922760] 901220b0
> [ 522.953638] <91d02005>
> [ 522.984521] 9735a000
> [ 523.015401] 95352000
> [ 523.046284] d25fa7cf
> [ 523.077163] 7fffe818
> [ 523.108109] 90100019
> [ 523.139044]
>
> I'll try to bisect this one later this week.
OK, so v6.8 is fine while v6.9 crashes:
[ 39.788224] Unable to handle kernel NULL pointer dereference
[ 39.862657] tsk->{mm,active_mm}->context = 000000000000004b
[ 39.935941] tsk->{mm,active_mm}->pgd = fff000000aa98000
[ 40.004566] \|/ ____ \|/
[ 40.004566] "@'/ .. \`@"
[ 40.004566] /_| \__/ |_\
[ 40.004566] \__U_/
[ 40.197871] (udev-worker)(88): Oops [#1]
[ 40.249329] CPU: 0 PID: 88 Comm: (udev-worker) Tainted: P O 6.9.0+ #28
[ 40.353415] TSTATE: 0000004411001605 TPC: 0000000000df092c TNPC: 0000000000df0930 Y: 00000000 Tainted: P O
[ 40.502105] TPC: <strlen+0x60/0xd4>
[ 40.547844] g0: fff000000a3171a1 g1: 0000000000000000 g2: 0000000000000000 g3: 0000000000000001
[ 40.662224] g4: fff000000aa4dac0 g5: 0000000010000233 g6: fff000000a314000 g7: 0000000000000000
[ 40.776599] o0: 0000000000000010 o1: 0000000000000010 o2: 0000000001010101 o3: 0000000080808080
[ 40.890974] o4: 0000000001010000 o5: 0000000000000000 sp: fff000000a317201 ret_pc: 00000000004d4b08
[ 41.009924] RPC: <module_patient_check_exists.constprop.0+0x48/0x1e0>
[ 41.094557] l0: fff0000100032f40 l1: 0000000000000000 l2: 0000000000000000 l3: 0000000000000000
[ 41.208936] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 0000000000000000
[ 41.323311] i0: 00000001000256d8 i1: 0000000001143000 i2: 0000000001143300 i3: 000000000000000b
[ 41.437686] i4: 0000000000000010 i5: fffffffffffffff8 i6: fff000000a3172e1 i7: 00000000004d63f0
[ 41.552062] I7: <load_module+0x550/0x1f00>
[ 41.605811] Call Trace:
[ 41.637838] [<00000000004d63f0>] load_module+0x550/0x1f00
[ 41.708752] [<00000000004d7fac>] init_module_from_file+0x6c/0xa0
[ 41.787670] [<00000000004d81c8>] sys_finit_module+0x188/0x280
[ 41.863158] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
[ 41.939790] Caller[00000000004d63f0]: load_module+0x550/0x1f00
[ 42.016423] Caller[00000000004d7fac]: init_module_from_file+0x6c/0xa0
[ 42.101059] Caller[00000000004d81c8]: sys_finit_module+0x188/0x280
[ 42.182266] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
[ 42.264614] Caller[fff000010480e2fc]: 0xfff000010480e2fc
[ 42.334384] Instruction DUMP:
[ 42.334387] 96132080
[ 42.373269] 19004040
[ 42.404151] 94132101
[ 42.435030] <da020000>
[ 42.465914] 9823400a
[ 42.496793] 808b000b
[ 42.527674] 024ffffd
[ 42.558556] 90022004
[ 42.589437] 8f336018
[ 42.620318]
So the regression was introduced between v6.8 and v6.9. I will bisect it later this week.
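A side note on the quoted v6.16 oops, for anyone reading the Instruction DUMP: the word in angle brackets, <91d02005>, is the instruction at the TPC. The sketch below decodes it by hand, assuming the SPARC V9 Tcc field layout (op in bits 31:30, cond in 28:25, op3 in 24:19, rs1 in 18:14, the immediate flag in bit 13, the software trap number in the low 7 bits). The helper name decode_sparc_trap is made up for illustration and is not part of any kernel tooling.

```python
def decode_sparc_trap(word):
    """Return the software trap number if `word` is an immediate-form
    SPARC V9 `ta` (trap always) instruction based on %g0, else None.

    The field layout is an assumption taken from the SPARC V9 Tcc format.
    """
    op = (word >> 30) & 0x3         # bits 31:30, 0b10 for this format
    cond = (word >> 25) & 0xF       # bits 28:25, 0b1000 means "always"
    op3 = (word >> 19) & 0x3F       # bits 24:19, 0x3a selects Tcc
    rs1 = (word >> 14) & 0x1F       # bits 18:14, base register (0 = %g0)
    imm_form = (word >> 13) & 0x1   # bit 13, 1 = immediate trap number

    is_trap_always = (op == 0b10 and op3 == 0x3A and cond == 0b1000)
    if is_trap_always and imm_form and rs1 == 0:
        return word & 0x7F          # low 7 bits hold the trap number
    return None


if __name__ == "__main__":
    # Marked word from the quoted v6.16 Instruction DUMP.
    print(decode_sparc_trap(0x91D02005))  # prints 5
    # The word that follows it in the dump is not a trap instruction.
    print(decode_sparc_trap(0x9735A000))  # prints None
```

Under those assumptions the marked word decodes to a "trap always" with trap number 5, which is consistent with the "Kernel bad sw trap 5" line that follows the BUG message at fs/ext4/inode.c:1174.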
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913