Message-ID: <CANn89iK7Nf9WPwg3XkwAMfCqnidtjsB9fSr3025rsUnpuwXJ2w@mail.gmail.com>
Date: Wed, 6 Dec 2023 11:25:52 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Davide Caratti <dcaratti@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
xmu@...hat.com, cpaasch@...le.com
Subject: Re: [PATCH net-next] net/sched: fix false lockdep warning on qdisc
root lock
On Wed, Dec 6, 2023 at 11:16 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Dec 6, 2023 at 10:04 AM Davide Caratti <dcaratti@...hat.com> wrote:
> >
> > Xiumei and Christoph reported the following lockdep splat; it complains
> > that the qdisc root lock is being taken twice:
> >
> > ============================================
> > WARNING: possible recursive locking detected
> > 6.7.0-rc3+ #598 Not tainted
> > --------------------------------------------
> > swapper/2/0 is trying to acquire lock:
> > ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
> >
> > but task is already holding lock:
> > ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&sch->q.lock);
> > lock(&sch->q.lock);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 5 locks held by swapper/2/0:
> > #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510
> > #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0
> > #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
> > #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
> > #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
> >
> >
>
> Can you add a Fixes: tag?
>
> Also, what is the interaction with htb_set_lockdep_class_child()? Have
> you tried using HTB after your patch?
>
> Could htb_set_lockdep_class_child() be removed?
>
>
> > CC: Xiumei Mu <xmu@...hat.com>
> > Reported-by: Christoph Paasch <cpaasch@...le.com>
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/451
> > Signed-off-by: Davide Caratti <dcaratti@...hat.com>
> > ---
> > include/net/sch_generic.h | 1 +
> > net/sched/sch_generic.c | 3 +++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index dcb9160e6467..a395ca76066c 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -126,6 +126,7 @@ struct Qdisc {
> >
> > struct rcu_head rcu;
> > netdevice_tracker dev_tracker;
> > + struct lock_class_key root_lock_key;
> > /* private data */
> > long privdata[] ____cacheline_aligned;
> > };
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 8dd0e5925342..da3e1ea42852 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -944,7 +944,9 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
> > __skb_queue_head_init(&sch->gso_skb);
> > __skb_queue_head_init(&sch->skb_bad_txq);
> > gnet_stats_basic_sync_init(&sch->bstats);
> > + lockdep_register_key(&sch->root_lock_key);
> > spin_lock_init(&sch->q.lock);
> > + lockdep_set_class(&sch->q.lock, &sch->root_lock_key);
> >
> > if (ops->static_flags & TCQ_F_CPUSTATS) {
> > sch->cpu_bstats =
> > @@ -1064,6 +1066,7 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
> > if (ops->destroy)
> > ops->destroy(qdisc);
> >
> > + lockdep_unregister_key(&qdisc->root_lock_key);
lockdep_unregister_key() has a synchronize_rcu() call.
This would slow down qdisc dismantle too much.
I think we need to find another solution to this problem.