[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <21493d3d08936d7ed67f7153cdaa418e@www.loen.fr>
Date: Wed, 06 Nov 2019 13:25:56 +0109
From: Marc Zyngier <maz@...nel.org>
To: Salil Mehta <salil.mehta@...wei.com>
Cc: <edumazet@...gle.com>, <netdev@...r.kernel.org>,
"lipeng (Y)" <lipeng321@...wei.com>,
"Zhuangyuzeng (Yisen)" <yisen.zhuang@...wei.com>,
"David S . Miller" <davem@...emloft.net>,
Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH] net: hns: Ensure that interface teardown cannot race with TX interrupt
On 2019-11-06 12:28, Salil Mehta wrote:
> Hi Marc,
>
>> On Tue, 5 Nov 2019 18:41:11 +0000
>> Salil Mehta <salil.mehta@...wei.com> wrote:
>>
>> Hi Salil,
>>
>> > Hi Marc,
>> > I tested with the patch on D05 with the lockdep enabled kernel
>> with below options
>> > and I could not reproduce the deadlock. I do not argue the issue
>> being mentioned
>> > as this looks to be a clear bug which should hit while TX
>> data-path is running
>> > and we try to disable the interface.
>>
>> Lockdep screaming at you doesn't mean the deadly scenario happens in
>> practice, and I've never seen the machine hanging in these
>> conditions.
>> But I've also never tried to trigger it in anger.
>>
>> > Could you please help me know the exact set of steps you used to
>> get into this
>> > problem. Also, are you able to re-create it easily/frequently?
>>
>> I just need to issue "reboot" (which calls "ip link ... down") for
>> this
>> to trigger. Here's a full splat[1], as well as my full config[2]. It
>> is
>> 100% repeatable.
>>
>> > # Kernel Config options:
>> > CONFIG_LOCKDEP_SUPPORT=y
>> > CONFIG_LOCKDEP=y
>>
>> You'll need at least
>>
>> CONFIG_PROVE_LOCKING=y
>> CONFIG_NET_POLL_CONTROLLER=y
>
>
> Few points:
> 1. To me netpoll causing spinlock deadlock with IRQ leg of TX and ip
> util is
> highly unlikely since netpoll runs with both RX/TX interrupts
> disabled.
> It runs in polling mode to facilitate parallel path to features
> like
> Netconsole, netdump etc. hence, deadlock because of the netpoll
> should
> be highly unlikely. Therefore, smells of some other problem
> here...
> 2. Also, I remember patch[s1][s2] from Eric Dumazet to disable
> netpoll on many
> NICs way back in 4.19 kernel on the basis of Song Liu's findings.
>
> Problem:
> Aah, I see the problem now, it is because of the stray code related
> to the
> NET_POLL_CONTROLLER in hns driver which actually should have got
> remove within
> the patch[s1], and that also explains why it does not get hit while
> NET POLL
> is disabled.
>
>
> /* netif_tx_lock will turn down the performance, set only when
> necessary */
> #ifdef CONFIG_NET_POLL_CONTROLLER
> #define NETIF_TX_LOCK(ring) spin_lock(&(ring)->lock)
> #define NETIF_TX_UNLOCK(ring) spin_unlock(&(ring)->lock)
> #else
> #define NETIF_TX_LOCK(ring)
> #define NETIF_TX_UNLOCK(ring)
> #endif
>
>
> Once you define CONFIG_NET_POLL_CONTROLLER in the latest code these
> macros
> Kick-in even for the normal NAPI path. Which can cause deadlock and
> that perhaps
> is what you are seeing?
Yes, that's the problem.
> Now, the question is do we require these locks in normal NAPI poll? I
> do not
> see that we need them anymore as Tasklets are serialized to
> themselves and
> configuration path like "ip <intf> down" cannot conflict with NAPI
> poll path
> as the later is always disabled prior performing interface down
> operation.
> Hence, no conflict there.
My preference would indeed be to drop these per-queue locks if they
aren't
required. I couldn't figure out from a cursory look at the code whether
two CPUs could serve the same TX queue. If that cannot happen by
construction,
then these locks are perfectly useless and should be removed.
>
> As a side analysis, I could figure out some contentions in the
> configuration
> path not related to this though. :)
>
>
> Suggested Solution:
> Since we do not have support of NET_POLL_CONTROLLER macros
> NETIF_TX_[UN]LOCK
> We should remove these NET_POLL_CONTROLLER macros altogether for now.
>
> Though, I still have not looked comprehensively how other are able to
> use
> Debugging utils like netconsole etc without having
> NET_POLL_CONTROLLER.
> Maybe @Eric Dumazet might give us some insight on this?
>
>
> If you agree with this then I can send a patch to remove these from
> hns
> driver. This should solve your problem as well?
Sure, as long as you can guarantee that these locks are never used for
anything
useful.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
Powered by blists - more mailing lists