[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJE94zTJNKmNivHcy9zK_hyXNnXf2OYH1+HxVNo1m0T=Q@mail.gmail.com>
Date: Tue, 7 Dec 2021 10:34:33 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Xin Long <lucien.xin@...il.com>
Cc: network dev <netdev@...r.kernel.org>, davem <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
Davide Caratti <dcaratti@...hat.com>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next 0/5] net: add refcnt tracking for some common objects
On Tue, Dec 7, 2021 at 10:17 AM Xin Long <lucien.xin@...il.com> wrote:
>
> On Mon, Dec 6, 2021 at 11:41 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Mon, Dec 6, 2021 at 8:02 PM Xin Long <lucien.xin@...il.com> wrote:
> > >
> > > This patchset provides a simple lib(obj_cnt) to count the operatings on any
> > > objects, and saves them into a gobal hashtable. Each node in this hashtable
> > > can be identified with a calltrace and an object pointer. A calltrace could
> > > be a function called from somewhere, like dev_hold() called by:
> > >
> > > inetdev_init+0xff/0x1c0
> > > inetdev_event+0x4b7/0x600
> > > raw_notifier_call_chain+0x41/0x50
> > > register_netdevice+0x481/0x580
> > >
> > > and an object pointer would be the dev that this function is accessing:
> > >
> > > dev_hold(dev).
> > >
> > > When this call comes to this object, a node including calltrace + object +
> > > counter will be created if it doesn't exist, and the counter in this node
> > > will increment if it already exists. Pretty simple.
> > >
> > > So naturally this lib can be used to track the refcnt of any objects, all
> > > it has to do is put obj_cnt_track() to the place where this object is
> > > held or put. It will count how many times this call has operated this
> > > object after checking if this object and this type(hold/put) accessing
> > > are being tracked.
> > >
> > > After the 1st lib patch, the other patches add the refcnt tracking for
> > > netdev, dst, in6_dev and xfrm_state, and each has example how to use
> > > in the changelog. The common use is:
> > >
> > > # sysctl -w obj_cnt.control="clear" # clear the old result
> > >
> > > # sysctl -w obj_cnt.type=0x1 # track type 0x1 operating
> > > # sysctl -w obj_cnt.name=test # match name == test or
> > > # sysctl -w obj_cnt.index=1 # match index == 1
> > > # sysctl -w obj_cnt.nr_entries=4 # save 4 frames' calltrace
> > >
> > > ... (reproduce the issue)
> > >
> > > # sysctl -w obj_cnt.control="scan" # print the new result
> > >
> > > Note that after seeing Eric's another patchset for refcnt tracking I
> > > decided to post this patchset. As in this implemenation, it has some
> > > benefits which I think worth sharing:
> > >
> >
> > How can your code coexist with ref_tracker ?
> Hi, Eric, Thanks for your checking
>
> It won't affect ref_tracker, one can even use both at the same time.
>
> >
> > > - it runs fast:
> > > 1. it doesn't create nodes for the repeatitive calls to the same
> > > objects, and it saves memory and time.
> > > 2. the depth of the calltrace to record is configurable, at most
> > > time small calltrace also saves memory and time, but will not
> > > affect the analysis.
> > > 3. kmem_cache used also contributes to the performance.
> >
> > Points 2/3 can be implemented right away in the ref_tracker infra,
> > please send patches.
> >
> > Quite frankly using a global hash table seems wrong, stack_depot
> > already has this logic, why reimplement it ?
> > stack_depot is damn fast (no spinlock in fast path)
> What this patchset is trying to add is a calltrace+object counter.
> I was looking at stack_depot after seeing you patch, stack_depot saves
> calltrace only, no object(I guess this is okay, I can save object to
> to entries[0] if I want to use it), but also it's not a counter.
>
> I'm not sure if it's allowed to do some change and add a counter to
> the node of stack_depot, like when it's found in saving, the counter
> increments. That will be perfect for this patchset.
>
> This global spinlock will eventually be used only to protect the new
> node's insertion. For the fast path (lookup), rcu_read_lock() will take
> care of it. I haven't got time to add it. but this won't be a problem.
>
> >
> > Seeing that your patches add chunks in lib/obj_cnt.c, I do not see how
> > you can claim this is generic code.
> I planned it as a obj operating counter, it can be used for counting any
> operatings, not just for the refcnt tracker which is only _put and _hold
> operatings.
>
> >
> > I don't know, it seems very strange to send this patch series now I
> > have done about 60 patches on these issues.
> This patch is not to do exactly the same things as your patchset, I think your
> patch saves more information into the objects in the kernel memory, it will
> be useful for vmcore analysis.
>
> This patchset is working in a different way, it's going to target a
> specific object with index or name or pointer matched and some types
> of function calls to it, we have to plan in advance after we know
> which object (like it's name, index or string to match) is leaked.
>
> >
> > And by doing this work, I found already two bugs in our stack.
> Great effects!
> I can see that you must go over all networking stack for dev operations.
>
> >
> > You can be sure syzbot will send us many reports, most syzbot repros
> > use a very limited number of objects.
> >
> > About performance : You use a single spinlock to protect your hash table.
> > In my implementation, there is a spinlock per 'directory (eg one
> > spinlock per struct net_device, one spinlock per struct net), it is
> > more scalable.
> I used per net spinlock at first, but I want to make the code more generic,
> and not only for the network, then I decided to make it not related to net.
>
> After using rcu_lock in the fast path, I think this single spinlock won't
> affect much, besides, this single lock can be replaced by a per hlist lock
> on each hlist_head, it will also save some.
>
> >
> > My tests have not shown a significant cost of the ref_tracker
> > (the major cost comes from stack_trace_save() which you also use)
> I added "run fast" in cover, mostly because it won't create many nodes
> if dev_hold/put are called many times, it only increments the count if it's
> the same call to the same object already existing in the hashtable.
>
> dev could be fine, thinking about tracking dst, when sending packets, dst
> can be hold/put too many times, creating nodes for each call is not a good
> idea, especially for some leak only occurs once for few months which I've
> seen quite a few times in our customer envs.
That is why I chose to not automatically track all dev_hold()/dev_put()
For the ones that are contained in a short function, with clear
contract, there is
no need for adding tracking.
But before taking this decision, I am looking at each case, very carefully.
>
>
> Other things are:
> most net_dev leaks I've met are actually dst leak, some are in6_dev leak,
> only tracking net_dev is not enough to address it, do you have plans to
> add dst track for dst too? That may be a lot of changes too?
Yes, the plan is to add trackers when we think there is a high
probability of having leaks.
I think you missed the one major point of the exercise :
By carefully doing the pairing, we can spot bugs already, even without
turning the CONFIG_ options.
Once this (patient) work done, it is done, the infra will catch future
bugs right away.
>
> I think adding new members into core structures only for debugging
> may not be a good choice, as it will bring troubles to downstream for
> the backport because of kABI.
The main point of this stuff, is to allow syzbot to simply turn on a
few CONFIG options and let
it discover past/new issues. No special sysctls magics.
I see you added introspection, to list all active references, I think
this could be added on a per object
basis, to help developers have ways to better understand the kernel behavior.
Again, I think there is no reason your work can not complement mine.
They serve different purposes.
Powered by blists - more mailing lists