Message-ID: <DB7PR04MB46204E4A3EBD5DD665F492D38BAA0@DB7PR04MB4620.eurprd04.prod.outlook.com>
Date: Wed, 21 Aug 2019 07:37:25 +0000
From: Vakul Garg <vakul.garg@....com>
To: Florian Westphal <fw@...len.de>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: Help needed - Kernel lockup while running ipsec
> -----Original Message-----
> From: Vakul Garg
> Sent: Tuesday, August 20, 2019 4:08 PM
> To: Florian Westphal <fw@...len.de>
> Cc: netdev@...r.kernel.org
> Subject: RE: Help needed - Kernel lockup while running ipsec
>
>
>
> >
> > > -----Original Message-----
> > > From: Florian Westphal <fw@...len.de>
> > > Sent: Tuesday, August 20, 2019 3:08 PM
> > > To: Vakul Garg <vakul.garg@....com>
> > > Cc: Florian Westphal <fw@...len.de>; netdev@...r.kernel.org
> > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > >
> > > Vakul Garg <vakul.garg@....com> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Florian Westphal <fw@...len.de>
> > > > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > > > To: Vakul Garg <vakul.garg@....com>
> > > > > Cc: Florian Westphal <fw@...len.de>; netdev@...r.kernel.org
> > > > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > > > >
> > > > > Vakul Garg <vakul.garg@....com> wrote:
> > > > > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > > > > running a single static ipsec tunnel.
> > > > > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > > > > ipsec encap test (on my dual core arm board).
> > > > > > > > >
> > > > > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > > > > policy in variable 'ret' shows refcnt=0 under problem situation.
> > > > > > > > > This creates an infinite loop in xfrm_policy_lookup_bytype()
> > > > > > > > > and hence the lockup.
> > > > > > > > >
> > > > > > > > > Can somebody please provide me pointers about 'refcnt'?
> > > > > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > > > > condition can it become '0'?
> > > > > > >
> > > > > > > Yes, when policy is destroyed and the last user calls
> > > > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > > > >
> > > > > > > It seems that policy reference count never gets decremented during
> > > > > > > packet ipsec encap.
> > > > > > It is getting incremented for every frame that hits the policy.
> > > > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > > > >
> > > > > > That's a bug. Does this affect 4.14 only or does this happen on
> > > > > current tree as well?
> > > >
> > > > I am yet to try it on 4.19.
> > > > > Can you help me with the right fix? Which part of code should it get
> > > > > decremented?
> > > > I am not conversant with xfrm code.
> > >
> > > > Normally policy reference counts get decremented when the skb is
> > > > free'd, via dst destruction (xfrm_dst_destroy()).
> > >
> > > Do you see a dst leak as well?
> >
> > Can you please guide me how to detect it?
> >
> > (I am checking refcount on recent kernel and will let you know.)
>
> Policy refcount is decreasing properly on 4.19.
> Same should be on the latest kernel too.
On kernel 4.14, I find that dst_release() is called through xfrm_output_one().
However, since dst->__refcnt is only decremented to '1' there,
call_rcu(&dst->rcu_head, dst_destroy_rcu) is not invoked.
On kernel 4.19, dst->__refcnt is decremented to '0', so
dst_destroy_rcu() eventually executes.
Any further help/pointers for kernel 4.14 would be deeply appreciated.