Message-ID: <de0a7ef1-c2d0-4db4-8267-9d5ac96f0e23@lunn.ch>
Date: Mon, 6 Jan 2025 20:08:59 +0100
From: Andrew Lunn <andrew@...n.ch>
To: John Ousterhout <ouster@...stanford.edu>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, netdev@...r.kernel.org, pabeni@...hat.com,
edumazet@...gle.com, horms@...nel.org, kuba@...nel.org
Subject: Re: [PATCH net-next v4 12/12] net: homa: create Makefile and Kconfig
On Mon, Jan 06, 2025 at 09:27:24AM -0800, John Ousterhout wrote:
> I have pored over this message for a while and can't figure out how
> Homa code could participate in this deadlock, other than by calling
> hrtimer_init (which is done without holding any locks). If anyone else
> can figure out exactly what this message means and how it relates to
> Homa, I'd love to hear it. Otherwise I'm going to assume it's either a
> false positive or a problem elsewhere in the Linux kernel.
The problem with ignoring these splats is that after the first splat,
you don't get any more. So if Homa does have a real deadlock, it might
never be reported; you would just deadlock.
Have you reproduced this?
> > [ 11.585197][ T133] -> #0 ((console_sem).lock){-...}-{2:2}:
> > [ 11.585197][ T133] check_prev_add (kernel/locking/lockdep.c:3162)
> > [ 11.585197][ T133] validate_chain (kernel/locking/lockdep.c:3281 kernel/locking/lockdep.c:3904)
> > [ 11.585197][ T133] __lock_acquire (kernel/locking/lockdep.c:5226)
> > [ 11.585197][ T133] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851 kernel/locking/lockdep.c:5814)
> > [ 11.585197][ T133] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
> > [ 11.585197][ T133] down_trylock (kernel/locking/semaphore.c:140)
> > [ 11.585197][ T133] __down_trylock_console_sem (kernel/printk/printk.c:326)
> > [ 11.585197][ T133] console_trylock_spinning (kernel/printk/printk.c:2852 kernel/printk/printk.c:2009)
> > [ 11.585197][ T133] vprintk_emit (kernel/printk/printk.c:2431 kernel/printk/printk.c:2378)
> > [ 11.585197][ T133] vprintk (kernel/printk/printk_safe.c:86)
> > [ 11.585197][ T133] _printk (kernel/printk/printk.c:2452)
> > [ 11.585197][ T133] lookup_object_or_alloc+0x3d4/0x590
> > [ 11.585197][ T133] __debug_object_init (lib/debugobjects.c:744)
> > [ 11.585197][ T133] hrtimer_init (kernel/time/hrtimer.c:456 kernel/time/hrtimer.c:1606)
> > [ 11.585197][ T133] homa_timer_main (net/homa/homa_plumbing.c:971)
> > [ 11.585197][ T133] kthread (kernel/kthread.c:389)
> > [ 11.585197][ T133] ret_from_fork (arch/x86/kernel/process.c:153)
> > [ 11.585197][ T133] ret_from_fork_asm (arch/x86/entry/entry_64.S:254)
Do you see something in the console log at this point?
I find it odd that hrtimer_init() results in a console message. Maybe
the console message itself is a clue that something is wrong with the
timer setup. If you can avoid the console message, you might then
avoid the later lock inversion.
Andrew
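To illustrate the point above about only the first splat being reported, here is a minimal sketch of a one-shot lock-order checker. It is a toy model written for this note, not how lockdep is implemented: the class `LockOrderChecker`, its methods, and the lock names `lock_a`, `lock_b`, `lock_c` and `lock_d` are all hypothetical. It records "held, then acquired" orderings, reports the first inversion it sees, and then switches itself off, so a second, unrelated inversion goes unnoticed.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order validator (hypothetical; not lockdep's algorithm)."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks acquired while holding it
        self.enabled = True            # cleared after the first report
        self.reports = []

    def _reachable(self, src, dst):
        # Depth-first search over the recorded "held -> acquired" edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record that `new` is acquired while every lock in `held` is held."""
        if not self.enabled:
            return  # checking is off after the first report
        for h in held:
            # If `h` is already reachable from `new`, adding h -> new
            # would close a cycle: a possible lock inversion.
            if self._reachable(new, h):
                self.reports.append((h, new))
                self.enabled = False
                return
            self.edges[h].add(new)


checker = LockOrderChecker()
checker.acquire(["lock_a"], "lock_b")  # order a -> b recorded
checker.acquire(["lock_b"], "lock_a")  # b -> a: first inversion, reported
checker.acquire(["lock_c"], "lock_d")  # ignored, checker is now disabled
checker.acquire(["lock_d"], "lock_c")  # second inversion, never reported
print(checker.reports)
```

In this sketch the inversion between `lock_c` and `lock_d` is never seen because checking stopped at the first report, which is why an unexplained first splat is worth resolving rather than ignoring.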