Message-ID: <2689df25d95e7c4fab781be2b3a4ac7ff9b50132.camel@physik.fu-berlin.de>
Date: Mon, 04 Aug 2025 09:48:50 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Anthony Yznaga <anthony.yznaga@...cle.com>, "Matthew Wilcox (Oracle)"
<willy@...radead.org>, linux-arch@...r.kernel.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, "David S. Miller"
<davem@...emloft.net>, sparclinux@...r.kernel.org, Andreas Larsson
<andreas@...sler.com>, Rod Schnell <rods@...radio.com>, Sam James
<sam@...too.org>
Subject: Re: [PATCH v4 25/36] sparc64: Implement the new page table range API
Hi,
On Mon, 2025-08-04 at 08:58 +0200, John Paul Adrian Glaubitz wrote:
> On Mon, 2025-08-04 at 07:36 +0200, John Paul Adrian Glaubitz wrote:
> > On Mon, 2025-08-04 at 07:12 +0200, John Paul Adrian Glaubitz wrote:
> > > On Sun, 2025-08-03 at 12:08 -0700, Anthony Yznaga wrote:
> > > > There was a follow-on fix that addressed a bug with this patch:
> > > >
> > > > f4b4f3ec1a31 sparc64: add missing initialization of folio in tlb_batch_add()
> > >
> > > Indeed, I just tried v6.6, which has this patch, added your sun4u fix, and it
> > > seems to be stable. I was sure I saw problems even with v6.16, though.
> > >
> > > Let me run more tests.
> >
> > I'm seeing another crash on v6.16 on sun4u even with your patch applied:
> >
> > [ 456.443492] kernel BUG at fs/ext4/inode.c:1174!
> > [ 456.503059] \|/ ____ \|/
> > [ 456.503059] "@'/ .. \`@"
> > [ 456.503059] /_| \__/ |_\
> > [ 456.503059] \__U_/
> > [ 456.696513] apt-get(1217): Kernel bad sw trap 5 [#1]
> > [ 456.761698] CPU: 0 UID: 0 PID: 1217 Comm: apt-get Not tainted 6.16.0+ #24 VOLUNTARY
> > [ 456.863502] TSTATE: 0000000011001601 TPC: 0000000010309250 TNPC: 0000000010309254 Y: 00000000 Not tainted
> > [ 456.992850] TPC: <ext4_block_write_begin+0x450/0x540 [ext4]>
> > [ 457.067500] g0: 0000000000000000 g1: 0000000000000001 g2: 0000000000000000 g3: 0000000000000000
> > [ 457.181869] g4: fff00000141d5c80 g5: 0000000000000008 g6: fff000000be24000 g7: 0000000000000001
> > [ 457.296245] o0: 00000000103944b0 o1: 0000000000000496 o2: ffffffffffffffbf o3: 0000000000101cca
> > [ 457.410618] o4: 0000000000000000 o5: 0000000000000000 sp: fff000000be26fd1 ret_pc: 0000000010309248
> > [ 457.529571] RPC: <ext4_block_write_begin+0x448/0x540 [ext4]>
> > [ 457.604020] l0: fff000003def26e0 l1: 0000000000113cca l2: fff000003def2578 l3: 0000000000000002
> > [ 457.718394] l4: 0000000000000000 l5: 0000000000080000 l6: 0000000000012000 l7: 0000000000000001
> > [ 457.832770] i0: 0000000000000000 i1: 000c00000026b500 i2: 0000000000001000 i3: 0000000000082000
> > [ 457.947146] i4: 00000000103034a0 i5: 0000000000000000 i6: fff000000be270c1 i7: 000000001030c8dc
> > [ 458.061528] I7: <ext4_da_write_begin+0x1bc/0x340 [ext4]>
> > [ 458.131389] Call Trace:
> > [ 458.163408] [<000000001030c8dc>] ext4_da_write_begin+0x1bc/0x340 [ext4]
> > [ 458.250447] [<0000000000674230>] generic_perform_write+0x90/0x240
> > [ 458.330606] [<00000000102f50b4>] ext4_buffered_write_iter+0x54/0x120 [ext4]
> > [ 458.422214] [<00000000102f5624>] ext4_file_write_iter+0x3e4/0x780 [ext4]
> > [ 458.510388] [<0000000000749cc4>] vfs_write+0x2c4/0x3e0
> > [ 458.577867] [<0000000000749f4c>] ksys_write+0x4c/0xe0
> > [ 458.644203] [<0000000000749ff4>] sys_write+0x14/0x40
> > [ 458.709397] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> > [ 458.786048] Disabling lock debugging due to kernel taint
> > [ 458.855904] Caller[000000001030c8dc]: ext4_da_write_begin+0x1bc/0x340 [ext4]
> > [ 458.948653] Caller[0000000000674230]: generic_perform_write+0x90/0x240
> > [ 459.034430] Caller[00000000102f50b4]: ext4_buffered_write_iter+0x54/0x120 [ext4]
> > [ 459.131761] Caller[00000000102f5624]: ext4_file_write_iter+0x3e4/0x780 [ext4]
> > [ 459.225648] Caller[0000000000749cc4]: vfs_write+0x2c4/0x3e0
> > [ 459.298846] Caller[0000000000749f4c]: ksys_write+0x4c/0xe0
> > [ 459.370900] Caller[0000000000749ff4]: sys_write+0x14/0x40
> > [ 459.441810] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> > [ 459.524168] Caller[0000000000000000]: 0x0
> > [ 459.576772] Instruction DUMP:
> > [ 459.576776] 11040e51
> > [ 459.615662] 7c04b816
> > [ 459.646541] 901220b0
> > [ 459.677418] <91d02005>
> > [ 459.708302] 9735a000
> > [ 459.739181] 95352000
> > [ 459.770076] d25fa7cf
> > [ 459.800945] 7fffe818
> > [ 459.831825] 90100019
> > [ 459.862706]
> > [ 459.941500] systemd[1]: Failed to open /dev/pts device, ignoring: Inappropriate ioctl for device
> > [ 460.063831] systemd[1]: rsyslog.service: Main process exited, code=killed, status=6/ABRT
> > [ 460.170962] systemd[1]: rsyslog.service: Failed with result 'signal'.
> > [ 460.267153] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
> > [ 460.388605] systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 1.
> > [ 460.517346] systemd[1]: Starting rsyslog.service - System Logging Service...
> > [ 460.618299] systemd[1]: Starting systemd-journald.service - Journal Service...
> > [ 460.895645] systemd-journald[1237]: Collecting audit messages is disabled.
> > [ 461.048068] systemd[1]: Failed to open /dev/pts device, ignoring: Inappropriate ioctl for device
> > [ 461.202783] systemd-journald[1237]: File /var/log/journal/9ac90e257b3e423284cfc21a00cbeeb8/system.journal corrupted or uncleanly shut down, renaming and replacing.
> > [ 461.456867] systemd[1]: Started rsyslog.service - System Logging Service.
> > [ 461.616651] systemd-journald[1237]: Time jumped backwards, rotating.
> > [ 461.773305] systemd-journald[1237]: Failed to read journal file /var/log/journal/9ac90e257b3e423284cfc21a00cbeeb8/user-1002.journal for rotation, trying to move it out of the way: Device or resource busy
> > [ 462.065725] systemd[1]: Started systemd-journald.service - Journal Service.
> > [ 462.159895] systemd-journald[1237]: Time jumped backwards, rotating.
> > [ 519.719624] kernel BUG at fs/ext4/inode.c:1174!
> > [ 519.779143] \|/ ____ \|/
> > [ 519.779143] "@'/ .. \`@"
> > [ 519.779143] /_| \__/ |_\
> > [ 519.779143] \__U_/
> > [ 519.972586] apt(1249): Kernel bad sw trap 5 [#2]
> > [ 520.033239] CPU: 0 UID: 0 PID: 1249 Comm: apt Tainted: G D 6.16.0+ #24 VOLUNTARY
> > [ 520.151048] Tainted: [D]=DIE
> > [ 520.188797] TSTATE: 0000000011001603 TPC: 0000000010309250 TNPC: 0000000010309254 Y: 00000000 Tainted: G D
> > [ 520.338725] TPC: <ext4_block_write_begin+0x450/0x540 [ext4]>
> > [ 520.413282] g0: 0000000000000000 g1: 0000000000000001 g2: 0000000000000000 g3: 0000000000000000
> > [ 520.527655] g4: fff00000141d40c0 g5: 000000000000000b g6: fff000000a818000 g7: 0000000000000001
> > [ 520.642031] o0: 00000000103944b0 o1: 0000000000000496 o2: fffffffffffffcc0 o3: 0000000000101cca
> > [ 520.756408] o4: 0000000000000004 o5: 0000000000000000 sp: fff000000a81afd1 ret_pc: 0000000010309248
> > [ 520.875350] RPC: <ext4_block_write_begin+0x448/0x540 [ext4]>
> > [ 520.949799] l0: fff000023439af00 l1: 0000000000113cca l2: fff000023439ad98 l3: 0000000000000002
> > [ 521.064174] l4: 0000000000000000 l5: 0000000000080000 l6: 0000000000012000 l7: 0000000000000001
> > [ 521.178547] i0: 0000000000000000 i1: 000c000000164a00 i2: 0000000000001fc0 i3: 0000000000680000
> > [ 521.292923] i4: 00000000103034a0 i5: 0000000000000000 i6: fff000000a81b0c1 i7: 000000001030c8dc
> > [ 521.407297] I7: <ext4_da_write_begin+0x1bc/0x340 [ext4]>
> > [ 521.477195] Call Trace:
> > [ 521.509295] [<000000001030c8dc>] ext4_da_write_begin+0x1bc/0x340 [ext4]
> > [ 521.596330] [<0000000000674230>] generic_perform_write+0x90/0x240
> > [ 521.676495] [<00000000102f50b4>] ext4_buffered_write_iter+0x54/0x120 [ext4]
> > [ 521.768196] [<00000000102f5624>] ext4_file_write_iter+0x3e4/0x780 [ext4]
> > [ 521.856381] [<0000000000749cc4>] vfs_write+0x2c4/0x3e0
> > [ 521.923957] [<0000000000749f4c>] ksys_write+0x4c/0xe0
> > [ 521.990294] [<0000000000749ff4>] sys_write+0x14/0x40
> > [ 522.055486] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> > [ 522.132122] Caller[000000001030c8dc]: ext4_da_write_begin+0x1bc/0x340 [ext4]
> > [ 522.224873] Caller[0000000000674230]: generic_perform_write+0x90/0x240
> > [ 522.310649] Caller[00000000102f50b4]: ext4_buffered_write_iter+0x54/0x120 [ext4]
> > [ 522.407974] Caller[00000000102f5624]: ext4_file_write_iter+0x3e4/0x780 [ext4]
> > [ 522.501864] Caller[0000000000749cc4]: vfs_write+0x2c4/0x3e0
> > [ 522.575062] Caller[0000000000749f4c]: ksys_write+0x4c/0xe0
> > [ 522.647118] Caller[0000000000749ff4]: sys_write+0x14/0x40
> > [ 522.718031] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> > [ 522.800380] Caller[0000000000000000]: 0x0
> > [ 522.852991] Instruction DUMP:
> > [ 522.852994] 11040e51
> > [ 522.891878] 7c04b816
> > [ 522.922760] 901220b0
> > [ 522.953638] <91d02005>
> > [ 522.984521] 9735a000
> > [ 523.015401] 95352000
> > [ 523.046284] d25fa7cf
> > [ 523.077163] 7fffe818
> > [ 523.108109] 90100019
> > [ 523.139044]
> >
> > I'll try to bisect this one later this week.
>
> OK, so v6.8 is fine while v6.9 crashes:
>
> [ 39.788224] Unable to handle kernel NULL pointer dereference
> [ 39.862657] tsk->{mm,active_mm}->context = 000000000000004b
> [ 39.935941] tsk->{mm,active_mm}->pgd = fff000000aa98000
> [ 40.004566] \|/ ____ \|/
> [ 40.004566] "@'/ .. \`@"
> [ 40.004566] /_| \__/ |_\
> [ 40.004566] \__U_/
> [ 40.197871] (udev-worker)(88): Oops [#1]
> [ 40.249329] CPU: 0 PID: 88 Comm: (udev-worker) Tainted: P O 6.9.0+ #28
> [ 40.353415] TSTATE: 0000004411001605 TPC: 0000000000df092c TNPC: 0000000000df0930 Y: 00000000 Tainted: P O
> [ 40.502105] TPC: <strlen+0x60/0xd4>
> [ 40.547844] g0: fff000000a3171a1 g1: 0000000000000000 g2: 0000000000000000 g3: 0000000000000001
> [ 40.662224] g4: fff000000aa4dac0 g5: 0000000010000233 g6: fff000000a314000 g7: 0000000000000000
> [ 40.776599] o0: 0000000000000010 o1: 0000000000000010 o2: 0000000001010101 o3: 0000000080808080
> [ 40.890974] o4: 0000000001010000 o5: 0000000000000000 sp: fff000000a317201 ret_pc: 00000000004d4b08
> [ 41.009924] RPC: <module_patient_check_exists.constprop.0+0x48/0x1e0>
> [ 41.094557] l0: fff0000100032f40 l1: 0000000000000000 l2: 0000000000000000 l3: 0000000000000000
> [ 41.208936] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 0000000000000000
> [ 41.323311] i0: 00000001000256d8 i1: 0000000001143000 i2: 0000000001143300 i3: 000000000000000b
> [ 41.437686] i4: 0000000000000010 i5: fffffffffffffff8 i6: fff000000a3172e1 i7: 00000000004d63f0
> [ 41.552062] I7: <load_module+0x550/0x1f00>
> [ 41.605811] Call Trace:
> [ 41.637838] [<00000000004d63f0>] load_module+0x550/0x1f00
> [ 41.708752] [<00000000004d7fac>] init_module_from_file+0x6c/0xa0
> [ 41.787670] [<00000000004d81c8>] sys_finit_module+0x188/0x280
> [ 41.863158] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> [ 41.939790] Caller[00000000004d63f0]: load_module+0x550/0x1f00
> [ 42.016423] Caller[00000000004d7fac]: init_module_from_file+0x6c/0xa0
> [ 42.101059] Caller[00000000004d81c8]: sys_finit_module+0x188/0x280
> [ 42.182266] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> [ 42.264614] Caller[fff000010480e2fc]: 0xfff000010480e2fc
> [ 42.334384] Instruction DUMP:
> [ 42.334387] 96132080
> [ 42.373269] 19004040
> [ 42.404151] 94132101
> [ 42.435030] <da020000>
> [ 42.465914] 9823400a
> [ 42.496793] 808b000b
> [ 42.527674] 024ffffd
> [ 42.558556] 90022004
> [ 42.589437] 8f336018
> [ 42.620318]
>
> So, the regression was introduced with v6.9. Will bisect this later this week.
Hmm, I just ran into another oops on v6.8, although the machine itself stayed up this time:
[ 489.263666] Unable to handle kernel paging request at virtual address 000c000002400000
[ 489.367912] tsk->{mm,active_mm}->context = 00000000000013b2
[ 489.441150] tsk->{mm,active_mm}->pgd = fff000000af04000
[ 489.509872] \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
[ 489.703156] sshd-session(3671): Oops [#1]
[ 489.755758] CPU: 0 PID: 3671 Comm: sshd-session Not tainted 6.8.0+ #27
[ 489.841544] TSTATE: 0000000811001600 TPC: 000000000065d620 TNPC: 000000000065d624 Y: 00000000 Not tainted
[ 489.970796] TPC: <unmap_page_range+0x620/0xc60>
[ 490.030362] g0: fff000000a939360 g1: 0000000000008800 g2: ffffffffffffffff g3: ffffffffffffffff
[ 490.144748] g4: fff0000000d4a100 g5: 0000000002ad4a68 g6: fff000000a6dc000 g7: 0000010000000000
[ 490.259118] o0: 000c0000024005a0 o1: fff00001018a4000 o2: 0000000100028290 o3: 0000000100028290
[ 490.373493] o4: fff0000001afe71c o5: 0000000001099c00 sp: fff000000a6deeb1 ret_pc: 000000000065d53c
[ 490.492447] RPC: <unmap_page_range+0x53c/0xc60>
[ 490.551915] l0: fff00001018f4000 l1: 0000000100028290 l2: fff000000a6df968 l3: fff00001018a4000
[ 490.666292] l4: fff00000070a25a0 l5: fff000000a6dfaa8 l6: 0000000000000001 l7: 00000000011605a8
[ 490.780668] i0: fff000000a9b0900 i1: fff00001018a6000 i2: fff0000000f99018 i3: fff0000004308290
[ 490.895045] i4: fff00001018f4000 i5: 000c0000024005a0 i6: fff000000a6deff1 i7: 000000000065dcd8
[ 491.009418] I7: <unmap_single_vma.constprop.0+0x78/0xe0>
[ 491.079183] Call Trace:
[ 491.111206] [<000000000065dcd8>] unmap_single_vma.constprop.0+0x78/0xe0
[ 491.198136] [<000000000065dd9c>] unmap_vmas+0x5c/0x1a0
[ 491.265615] [<000000000066a2a4>] exit_mmap+0xc4/0x440
[ 491.331950] [<0000000000463d44>] __mmput+0x44/0x140
[ 491.396003] [<0000000000463e74>] mmput+0x34/0x60
[ 491.456618] [<000000000046a444>] do_exit+0x284/0xaa0
[ 491.521816] [<000000000046ae24>] do_group_exit+0x24/0xa0
[ 491.591584] [<000000000046aebc>] sys_exit_group+0x1c/0x40
[ 491.662496] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
[ 491.739127] Disabling lock debugging due to kernel taint
[ 491.808896] Caller[000000000065dcd8]: unmap_single_vma.constprop.0+0x78/0xe0
[ 491.901543] Caller[000000000065dd9c]: unmap_vmas+0x5c/0x1a0
[ 491.974740] Caller[000000000066a2a4]: exit_mmap+0xc4/0x440
[ 492.046795] Caller[0000000000463d44]: __mmput+0x44/0x140
[ 492.116564] Caller[0000000000463e74]: mmput+0x34/0x60
[ 492.182901] Caller[000000000046a444]: do_exit+0x284/0xaa0
[ 492.253815] Caller[000000000046ae24]: do_group_exit+0x24/0xa0
[ 492.329301] Caller[000000000046aebc]: sys_exit_group+0x1c/0x40
[ 492.405935] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
[ 492.488282] Caller[fff0000102ad4a74]: 0xfff0000102ad4a74
[ 492.558054] Instruction DUMP:
[ 492.558057] c6756010
[ 492.596937] 02ff7fd9
[ 492.627825] c2356018
[ 492.658702] <c25f6008>
[ 492.689581] 8610001d
[ 492.720460] 84086001
[ 492.751343] 82007fff
[ 492.782223] 87789401
[ 492.813105] c258e018
[ 492.894311] Fixing recursive fault but reboot is needed!
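As a side note on the two v6.16 BUG traces quoted above: the bracketed word <91d02005> in their instruction dumps appears to be the trapping instruction itself. The following is only a minimal sketch, assuming the SPARC V9 Tcc field layout; the helper name decode_sparc_tcc is made up for illustration. It decodes that word as an unconditional software trap 5, which would match the "Kernel bad sw trap 5" lines:

```python
from typing import Optional


def decode_sparc_tcc(word: int) -> Optional[dict]:
    """Decode a 32-bit SPARC V9 Tcc (trap on condition codes) word.

    Hypothetical helper for illustration only. Returns None if the
    word does not have the Tcc opcode fields.
    """
    op = (word >> 30) & 0x3      # bits 31:30, must be 0b10
    op3 = (word >> 19) & 0x3F    # bits 24:19, 0b111010 selects Tcc
    if op != 0b10 or op3 != 0b111010:
        return None
    cond = (word >> 25) & 0xF    # bits 28:25, 0b1000 means "always" (ta)
    uses_imm = bool((word >> 13) & 0x1)  # bit 13, the "i" field
    return {
        "cond": cond,
        "always": cond == 0b1000,
        "uses_immediate": uses_imm,
        # With i=1 the low 7 bits carry the software trap number.
        "sw_trap": (word & 0x7F) if uses_imm else None,
    }


if __name__ == "__main__":
    # Bracketed word from the v6.16 instruction dumps.
    print(decode_sparc_tcc(0x91D02005))
    # Next word in the same dump; not a Tcc encoding, so None.
    print(decode_sparc_tcc(0x9735A000))
```

The same helper returns None for the bracketed words in the v6.9 and v6.8 dumps (<da020000> and <c25f6008>), which are memory accesses rather than traps.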
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913