Message-ID: <20260107143054.57752fad@pumpkin>
Date: Wed, 7 Jan 2026 14:30:54 +0000
From: David Laight <david.laight.linux@...il.com>
To: Petr Mladek <pmladek@...e.com>
Cc: John Ogness <john.ogness@...utronix.de>, syzbot
<syzbot+22a26d9b6c0a64335bf7@...kaller.appspotmail.com>,
linux-kernel@...r.kernel.org, rostedt@...dmis.org,
senozhatsky@...omium.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [kernel?] Internal error in div_u64_rem (4)
On Wed, 7 Jan 2026 13:34:52 +0100
Petr Mladek <pmladek@...e.com> wrote:
> On Wed 2026-01-07 10:54:38, John Ogness wrote:
> > On 2025-12-31, syzbot <syzbot+22a26d9b6c0a64335bf7@...kaller.appspotmail.com> wrote:
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: c8ebd433459b Merge tag 'nfsd-6.19-2' of git://git.kernel.o..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15caa7da580000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=e5753ed355722af
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=22a26d9b6c0a64335bf7
> > > compiler: arm-linux-gnueabi-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > userspace arch: arm
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/98a89b9f34e4/non_bootable_disk-c8ebd433.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/fd848bb7d9d0/vmlinux-c8ebd433.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/439934b22d51/zImage-c8ebd433.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+22a26d9b6c0a64335bf7@...kaller.appspotmail.com
> > >
> > > Insufficient stack space to handle exception!
> > > Task stack: [0xeda4c000..0xeda4e000]
> > > IRQ stack: [0xdf804000..0xdf806000]
> > > Overflow stack: [0x830bc000..0x830bd000]
> > > Internal error: kernel stack overflow: 0 [#1] SMP ARM
> > > Modules linked in:
> > > CPU: 1 UID: 0 PID: 2128 Comm: syz-executor Tainted: G L syzkaller #0 PREEMPT
> > > Tainted: [L]=SOFTLOCKUP
> > > Hardware name: ARM-Versatile Express
> > > PC is at div_u64_rem+0x4/0x4c include/linux/math64.h:91
> > > LR is at div_u64 include/linux/math64.h:130 [inline]
> > > LR is at ___update_load_avg kernel/sched/pelt.c:265 [inline]
> > > LR is at __update_load_avg_se+0x150/0x518 kernel/sched/pelt.c:312
> > > pc : [<802abd9c>] lr : [<802b54f0>] psr: 20000193
> > > sp : df804010 ip : df804010 fp : df80406c
> > > r10: 00000000 r9 : 00000029 r8 : 0000b993
> > > r7 : 85f68c00 r6 : 00000538 r5 : 00000000 r4 : 846c3e80
> > > r3 : df804038 r2 : 0000b993 r1 : 00000000 r0 : 00000000
> > > Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> > > Control: 30c5387d Table: 8612bb00 DAC: 00000000
> > > Register r0 information: NULL pointer
> > > Register r1 information: NULL pointer
> > > Register r2 information: non-paged memory
> > > Register r3 information: 2-page vmalloc region starting at 0xdf804000 allocated at start_kernel+0x6b0/0x860 init/main.c:1111
> > > Register r4 information: slab task_struct start 846c3c00 pointer offset 640 size 3072
> > > Register r5 information: NULL pointer
> > > Register r6 information: non-paged memory
> > > Register r7 information: slab task_struct start 85f68c00 pointer offset 0 size 3072
> > > Register r8 information: non-paged memory
> > > Register r9 information: non-paged memory
> > > Register r10 information: NULL pointer
> > > Register r11 information: 2-page vmalloc region starting at 0xdf804000 allocated at start_kernel+0x6b0/0x860 init/main.c:1111
> > > Register r12 information: 2-page vmalloc region starting at 0xdf804000 allocated at start_kernel+0x6b0/0x860 init/main.c:1111
> > > Process syz-executor (pid: 2128, stack limit = 0xeda4c000)
> > > Stack: (0xdf804010 to 0xdf806000)
> > > 4000: 00000018 00000000 9837f050 00000000
> [...]
> > > 44c0: df804594 df8044e0 815cdeec 80203e84 8245bc84 85db88d8 00001501 85f68c00
> ^^^^^^^^
> [...]
> > > 5fc0: 8025be48 8025b9cc df805ffc df805fd8 81aaeb14 8025be44 81abcf40 60000013
> > > 5fe0: ffffffff eda4dbac 82aed0d0 85f68c00 eda4db74 df806000 81a7eaa8 81aaeaa4
> > > Call trace: frame pointer underflow
> [...]
> > > [<802e463c>] (vprintk) from [<80203ea8>] (_printk+0x34/0x58 kernel/printk/printk.c:2451)
> > > [<80203e74>] (_printk) from [<815cdeec>] (__dev_queue_xmit+0xcb4/0x1244 net/core/dev.c:4834)
> ^^^^^^^^
>
> I wanted to double check whether printk() was responsible for
> eating the stack. It might use some buffers somewhere...
>
> If I get it correctly then "815cdeec" is the return
> address for printk(). And if I count it correctly then printk() was
> called when almost 7k of the 8k stack had already been used:
>
> stack size: 0x6000-0x4000 = 0x2000 = 8192 = 8k
> printk at: 0x6000-0x44c0 = 0x1b40 = 6976
> remaining: 0x44c0-0x4000 = 0x4c0 = 1216
>
> My conclusion is that printk() is _not_ the sinner here.
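The arithmetic quoted above can be rechecked with a few lines of Python.
This is only a sketch: the constant names and the stack_usage() helper are
made up for illustration. The addresses come from the "IRQ stack" line and
the 44c0 dump row of the report, so the result is only accurate to one
dump row (8 words, 32 bytes).

```python
# Bounds of the IRQ stack and the dump row holding printk()'s return
# address (815cdeec), all taken from the syzbot report quoted above.
STACK_BASE = 0xDF804000     # lowest address of the stack
STACK_TOP = 0xDF806000      # the stack grows down from here
RET_ADDR_ROW = 0xDF8044C0   # start of the dump row containing 815cdeec


def stack_usage(base, top, addr):
    """Return (size, used, remaining) in bytes for a downward-growing stack."""
    size = top - base
    used = top - addr        # bytes consumed between the top and addr
    remaining = addr - base  # bytes left below addr
    return size, used, remaining


size, used, remaining = stack_usage(STACK_BASE, STACK_TOP, RET_ADDR_ROW)
print(f"stack size: {size:#x} = {size}")
print(f"printk at:  {used:#x} = {used}")
print(f"remaining:  {remaining:#x} = {remaining}")
```

The same helper can be pointed at any other return address found in the
dump to see how deep the stack was at that call.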
Did you look at the stack offset for the IPI?
That would show how much printk() was using.
(Can someone fix the traceback to include the stack pointers?
The address where the link register is stored would do.)
Even without recursion I suspect there are printk() calls well down the stack.
A call at 7k (of 8k) could be quite common.
Especially since they can happen for unusual error conditions.
Some of the %px formats probably use a lot of stack.
In this case the 'softint' callbacks are also running.
They can run anywhere interrupts are enabled, and usually run
at exactly the same place as the hardware interrupt that requested
the callback.
I believe there is a stack switch for hardware interrupts.
Is there a stack switch for softints?
Is there a stack switch for IPIs?
So did the process stack or the per-cpu interrupt stack overflow?
Is there scope for a conditional stack switch for things like
printk()?
David
>
> > > r3:85f68c00 r2:00001501 r1:85db88d8 r0:8245bc84
> >
> > Note that net/core/dev.c:4834 from __dev_queue_xmit() is:
> >
> > /* Recursion is detected! It is possible,
> > * unfortunately
> > */
> > recursion_alert:
> > net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
> > dev->name);
> >
>
> Yeah, this seems to be the culprit. If I get it correctly then
> "81888f18" is the return address (__dev_queue_xmit) and I see
> it repeated several times on the stack...
>
> > > [<815cd238>] (__dev_queue_xmit) from [<81888f18>] (dev_queue_xmit include/linux/netdevice.h:3381 [inline])
> ^^^^^^^^^
> > > [<815cd238>] (__dev_queue_xmit) from [<81888f18>] (neigh_hh_output include/net/neighbour.h:540 [inline])
>
> Best Regards,
> Petr
>