Message-ID: <CAM_iQpWf9s_FA6GDjpandwhmnjDbd48xSiNmA8JSP1Tt1Ap9Xw@mail.gmail.com>
Date: Tue, 23 Jun 2020 10:56:13 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: "Zhang,Qiang" <qiang.zhang@...driver.com>
Cc: Roman Gushchin <guro@...com>, Cong Wang <xiyou.wangcong@...il.com>,
Peter Geis <pgwipeout@...il.com>,
Li Zefan <lizefan@...wei.com>,
Cameron Berkenpas <cam@...-zeon.de>,
Daniël Sonck <dsonck92@...il.com>,
Lu Fengqi <lufq.fnst@...fujitsu.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Tejun Heo <tj@...nel.org>
Subject: Re: [Patch net] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
On Tue, Jun 23, 2020 at 1:45 AM Zhang,Qiang <qiang.zhang@...driver.com> wrote:
>
> There are some messages from kernel v5.4; I don't know if they will help.
>
> dmesg:
>
> cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or
> net_cls activation
...
> -----------[ cut here ]-----------
> percpu ref (cgroup_bpf_release_fn) <= 0 (-12) after switching to atomic
> WARNING: CPU: 1 PID: 0 at lib/percpu-refcount.c:161
> percpu_ref_switch_to_atomic_rcu+0x12a/0x140
Yes, this confirms the refcnt bug that my patch tries to fix.
The negative refcnt is a direct consequence of that bug: without
my patch, we put the refcnt without having taken it first.
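To illustrate how a put without a matching get drives a counter negative, here is a minimal userspace sketch. It is not the kernel's percpu_ref code: `struct toy_ref`, its helpers, and `run_scenario()` are hypothetical names made up for this example, and the "clone" loop only loosely stands in for sockets created via sk_clone_lock().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a reference-counted object.
 * This is NOT the kernel's percpu_ref; it only models the arithmetic. */
struct toy_ref {
	long count;
};

static void toy_ref_init(struct toy_ref *r)
{
	assert(r != NULL);
	r->count = 1;		/* the owner's initial reference */
}

static void toy_ref_get(struct toy_ref *r)
{
	assert(r != NULL);
	r->count++;
}

static void toy_ref_put(struct toy_ref *r)
{
	assert(r != NULL);
	r->count--;
}

/*
 * Model one owner plus nclones clones that all share the same object.
 * If take_ref_on_clone is 0, clones skip the get (the buggy pattern),
 * yet every clone still does a put when it is destroyed.
 * Returns the final value of the counter.
 */
static long run_scenario(int nclones, int take_ref_on_clone)
{
	struct toy_ref r;
	int i;

	toy_ref_init(&r);

	for (i = 0; i < nclones; i++) {
		if (take_ref_on_clone)
			toy_ref_get(&r);	/* balanced: each clone holds a ref */
	}

	for (i = 0; i < nclones; i++)
		toy_ref_put(&r);		/* each clone drops a ref on destroy */

	toy_ref_put(&r);			/* the owner drops its ref last */

	return r.count;
}
```

With three clones, run_scenario(3, 0) ends at -3 (every clone's put is unbalanced), while run_scenario(3, 1) ends at 0.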
If you can reproduce it, please test my patch:
https://patchwork.ozlabs.org/project/netdev/patch/20200616180352.18602-1-xiyou.wangcong@gmail.com/
However, so far I still don't have a good explanation for the NULL
pointer deref. I think that one is an older bug, and we need to check
for NULL even after we fix the refcnt bug, but I do not know why it was
only exposed recently by Zefan's patch. I am still trying to find an
explanation.
Thanks!