lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0031e545f7f26a36a213712480ed6d157d0fc47a.camel@gmail.com>
Date:   Thu, 12 Jan 2023 09:48:40 -0800
From:   Alexander H Duyck <alexander.duyck@...il.com>
To:     Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        "Y . b . Lu" <yangbo.lu@....com>
Subject: Re: [PATCH net] net: enetc: avoid deadlock in
 enetc_tx_onestep_tstamp()

On Thu, 2023-01-12 at 12:54 +0200, Vladimir Oltean wrote:
> This lockdep splat says it better than I could:
> 
> ================================
> WARNING: inconsistent lock state
> 6.2.0-rc2-07010-ga9b9500ffaac-dirty #967 Not tainted
> --------------------------------
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes:
> ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0
> {IN-SOFTIRQ-W} state was registered at:
>   _raw_spin_lock+0x5c/0xc0
>   sch_direct_xmit+0x148/0x37c
>   __dev_queue_xmit+0x528/0x111c
>   ip6_finish_output2+0x5ec/0xb7c
>   ip6_finish_output+0x240/0x3f0
>   ip6_output+0x78/0x360
>   ndisc_send_skb+0x33c/0x85c
>   ndisc_send_rs+0x54/0x12c
>   addrconf_rs_timer+0x154/0x260
>   call_timer_fn+0xb8/0x3a0
>   __run_timers.part.0+0x214/0x26c
>   run_timer_softirq+0x3c/0x74
>   __do_softirq+0x14c/0x5d8
>   ____do_softirq+0x10/0x20
>   call_on_irq_stack+0x2c/0x5c
>   do_softirq_own_stack+0x1c/0x30
>   __irq_exit_rcu+0x168/0x1a0
>   irq_exit_rcu+0x10/0x40
>   el1_interrupt+0x38/0x64
> irq event stamp: 7825
> hardirqs last  enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130
> hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8
> softirqs last  enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8
> softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(_xmit_ETHER#2);
>   <Interrupt>
>     lock(_xmit_ETHER#2);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/1:3/179:
>  #0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
>  #1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0
>  #2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34
> 
> Workqueue: events enetc_tx_onestep_tstamp
> Call trace:
>  print_usage_bug.part.0+0x208/0x22c
>  mark_lock+0x7f0/0x8b0
>  __lock_acquire+0x7c4/0x1ce0
>  lock_acquire.part.0+0xe0/0x220
>  lock_acquire+0x68/0x84
>  _raw_spin_lock+0x5c/0xc0
>  netif_freeze_queues+0x5c/0xc0
>  netif_tx_lock+0x24/0x34
>  enetc_tx_onestep_tstamp+0x20/0x100
>  process_one_work+0x28c/0x6c0
>  worker_thread+0x74/0x450
>  kthread+0x118/0x11c
> 
> but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in
> process context, therefore with softirqs enabled (i.o.w., it can be
> interrupted by a softirq). If we hold the netif_tx_lock() when there is
> an interrupt, and the NET_TX softirq then gets scheduled, this will take
> the netif_tx_lock() a second time and deadlock the kernel.
> 
> To solve this, use netif_tx_lock_bh(), which blocks softirqs from
> running.
> 
> Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
>  drivers/net/ethernet/freescale/enetc/enetc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 5ad0b259e623..0a990d35fe58 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -2288,14 +2288,14 @@ static void enetc_tx_onestep_tstamp(struct work_struct *work)
>  
>  	priv = container_of(work, struct enetc_ndev_priv, tx_onestep_tstamp);
>  
> -	netif_tx_lock(priv->ndev);
> +	netif_tx_lock_bh(priv->ndev);
>  
>  	clear_bit_unlock(ENETC_TX_ONESTEP_TSTAMP_IN_PROGRESS, &priv->flags);
>  	skb = skb_dequeue(&priv->tx_skbs);
>  	if (skb)
>  		enetc_start_xmit(skb, priv->ndev);
>  
> -	netif_tx_unlock(priv->ndev);
> +	netif_tx_unlock_bh(priv->ndev);
>  }
>  
>  static void enetc_tx_onestep_tstamp_init(struct enetc_ndev_priv *priv)


Looking at the patch this fixes I had a question. You have the tx_skbs
in the enet_ndev_priv struct and from what I can tell it looks like you
support multiple Tx queues. Is there a risk of corrupting the queue if
multiple Tx queues attempt to request the onestep timestamp?

My thought is that you might be better off looking at splitting your
queues up so that they are contained within the enetc_bdr struct. Then
you would only need the individual Tx queue lock instead of having to
take the global Tx queue lock.

Also I am confused. Why do you clear the TSTAMP_IN_PROGRESS flag in
enetc_tx_onestep_timestamp before checking the state of the queue? It
seems like something you should only be clearing once the queue is
empty.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ