Message-ID: <CAM_iQpW6dW0avuhKhifuvHYYzsoC7Na6JA4UPzjnPGqSDgzE3w@mail.gmail.com>
Date: Thu, 9 Jul 2020 12:19:40 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Cameron Berkenpas <cam@...-zeon.de>,
Peter Geis <pgwipeout@...il.com>,
Lu Fengqi <lufq.fnst@...fujitsu.com>,
Daniël Sonck <dsonck92@...il.com>,
Zhang Qiang <qiang.zhang@...driver.com>,
Thomas Lamprecht <t.lamprecht@...xmox.com>,
Daniel Borkmann <daniel@...earbox.net>,
Zefan Li <lizefan@...wei.com>, Tejun Heo <tj@...nel.org>,
Roman Gushchin <guro@...com>
Subject: Re: [Patch net v2] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
On Thu, Jul 9, 2020 at 12:13 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 7/9/20 11:51 AM, Cong Wang wrote:
> > On Thu, Jul 9, 2020 at 10:10 AM Guenter Roeck <linux@...ck-us.net> wrote:
> >>
> >> Something seems fishy with the use of skcd->val on big endian systems.
> >>
> >> Some debug output:
> >>
> >> [ 22.643703] sock: ##### sk_alloc(sk=000000001be28100): Calling cgroup_sk_alloc(000000001be28550)
> >> [ 22.643807] cgroup: ##### cgroup_sk_alloc(skcd=000000001be28550): cgroup_sk_alloc_disabled=0, in_interrupt: 0
> >> [ 22.643886] cgroup: #### cgroup_sk_alloc(skcd=000000001be28550): cset->dfl_cgrp=0000000001224040, skcd->val=0x1224040
> >> [ 22.643957] cgroup: ###### cgroup_bpf_get(cgrp=0000000001224040)
> >> [ 22.646451] sock: ##### sk_prot_free(sk=000000001be28100): Calling cgroup_sk_free(000000001be28550)
> >> [ 22.646607] cgroup: #### sock_cgroup_ptr(skcd=000000001be28550) -> 0000000000014040 [v=14040, skcd->val=14040]
> >> [ 22.646632] cgroup: ####### cgroup_sk_free(): skcd=000000001be28550, cgrp=0000000000014040
> >> [ 22.646739] cgroup: ####### cgroup_sk_free(): skcd->no_refcnt=0
> >> [ 22.646814] cgroup: ####### cgroup_sk_free(): Calling cgroup_bpf_put(cgrp=0000000000014040)
> >> [ 22.646886] cgroup: ###### cgroup_bpf_put(cgrp=0000000000014040)
> >
> > Excellent debugging! I thought it was a double put, but it seems to
> > be an endian issue. I didn't realize the big endian machine actually
> > packs bitfields in a big endian way too...
> >
> > Does the attached patch address this?
> >
>
> Partially. I don't see the crash anymore, but something is still odd - some of my
> tests require a retry with this patch applied, which previously never happened.
> I don't know if this is another problem with this patch, or a different problem.
> Unfortunately, I'll be unable to debug this further until next Tuesday.
Make sure you test the second patch I sent, not the first one. The first one
is still incomplete and ugly too. The two flag bits must be the last two
(the lowest two bits of skcd->val), so correcting the if test alone is not
sufficient; we have to fix the whole bitfield packing.
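
To illustrate what fixing the packing means, here is a minimal userspace
sketch, not the actual patch. The union name skcd_demo, the helper names and
the exact field layout are assumptions made for illustration; only val and
no_refcnt appear in the debug output above. It assumes GCC or Clang, which
fill bitfields from the least significant bit on little endian targets and
from the most significant bit on big endian ones, so the field order has to
be mirrored for the two flags to land in the low two bits of val on both.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sock_cgroup_data; the layout is an assumption. */
union skcd_demo {
	struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		/* Byte 0 is the least significant byte of val. */
		uint8_t  is_data   : 1;
		uint8_t  no_refcnt : 1;
		uint8_t  unused    : 6;
		uint8_t  padding;
		uint16_t prioidx;
		uint32_t classid;
#else
		/* Byte 7 is the least significant byte of val, and bitfields
		 * are filled from the top bit, so the flags come last. */
		uint32_t classid;
		uint16_t prioidx;
		uint8_t  padding;
		uint8_t  unused    : 6;
		uint8_t  no_refcnt : 1;
		uint8_t  is_data   : 1;
#endif
	} __attribute__((packed));
	uint64_t val;
};

/* Return the bits of val that the two flags occupy. */
static inline uint64_t skcd_flag_mask(void)
{
	union skcd_demo d;

	d.val = 0;
	d.is_data = 1;
	d.no_refcnt = 1;
	return d.val;
}

/* Recover the pointer-sized payload by masking off the two flag bits. */
static inline uint64_t skcd_ptr_bits(const union skcd_demo *d)
{
	return d->val & ~(uint64_t)0x3;
}

/* Sanity check: the flags must sit in the low two bits on any endianness. */
static inline void skcd_self_check(void)
{
	assert(skcd_flag_mask() == 0x3);
}
```

With this layout, storing 0x1224040 in val and then setting no_refcnt leaves
the pointer bits recoverable through skcd_ptr_bits() on both little and big
endian builds, which is the property the flag test relies on.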
Thanks!