netdev - Re: [Patch net v2] cgroup: fix cgroup_sk_alloc() for sk_clone

Message-ID: <CAM_iQpW6dW0avuhKhifuvHYYzsoC7Na6JA4UPzjnPGqSDgzE3w@mail.gmail.com>
Date:   Thu, 9 Jul 2020 12:19:40 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Cameron Berkenpas <cam@...-zeon.de>,
        Peter Geis <pgwipeout@...il.com>,
        Lu Fengqi <lufq.fnst@...fujitsu.com>,
        Daniël Sonck <dsonck92@...il.com>,
        Zhang Qiang <qiang.zhang@...driver.com>,
        Thomas Lamprecht <t.lamprecht@...xmox.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Zefan Li <lizefan@...wei.com>, Tejun Heo <tj@...nel.org>,
        Roman Gushchin <guro@...com>
Subject: Re: [Patch net v2] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()

On Thu, Jul 9, 2020 at 12:13 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 7/9/20 11:51 AM, Cong Wang wrote:
> > On Thu, Jul 9, 2020 at 10:10 AM Guenter Roeck <linux@...ck-us.net> wrote:
> >>
> >> Something seems fishy with the use of skcd->val on big endian systems.
> >>
> >> Some debug output:
> >>
> >> [   22.643703] sock: ##### sk_alloc(sk=000000001be28100): Calling cgroup_sk_alloc(000000001be28550)
> >> [   22.643807] cgroup: ##### cgroup_sk_alloc(skcd=000000001be28550): cgroup_sk_alloc_disabled=0, in_interrupt: 0
> >> [   22.643886] cgroup:  #### cgroup_sk_alloc(skcd=000000001be28550): cset->dfl_cgrp=0000000001224040, skcd->val=0x1224040
> >> [   22.643957] cgroup: ###### cgroup_bpf_get(cgrp=0000000001224040)
> >> [   22.646451] sock: ##### sk_prot_free(sk=000000001be28100): Calling cgroup_sk_free(000000001be28550)
> >> [   22.646607] cgroup:  #### sock_cgroup_ptr(skcd=000000001be28550) -> 0000000000014040 [v=14040, skcd->val=14040]
> >> [   22.646632] cgroup: ####### cgroup_sk_free(): skcd=000000001be28550, cgrp=0000000000014040
> >> [   22.646739] cgroup: ####### cgroup_sk_free(): skcd->no_refcnt=0
> >> [   22.646814] cgroup: ####### cgroup_sk_free(): Calling cgroup_bpf_put(cgrp=0000000000014040)
> >> [   22.646886] cgroup: ###### cgroup_bpf_put(cgrp=0000000000014040)
> >
> > Excellent debugging! I thought it was a double put, but it seems to
> > be an endian issue. I didn't realize that a big endian machine actually
> > packs bitfields in a big endian way too...
> >
> > Does the attached patch address this?
> >
>
> Partially. I don't see the crash anymore, but something is still odd - some of my
> tests require a retry with this patch applied, which previously never happened.
> I don't know if this is another problem with this patch, or a different problem.
> Unfortunately, I'll be unable to debug this further until next Tuesday.

Make sure you test the second patch I sent, not the first one. The first
one is still incomplete, and ugly too. The two bits must be the last two,
so correcting the if test is not sufficient; we have to fix the whole
bitfield packing.
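
To illustrate why the two flag bits have to be the lowest two bits of the
64-bit value, here is a minimal user-space sketch. It is not the kernel
code and not either patch: every demo_* name, the DEMO_* masks, and the
union layout are hypothetical stand-ins. It assumes the pointer stored in
val is at least 4-byte aligned, so its two low bits are free for flags and
can be masked off again. demo_bitfield_mask() reports where the compiler
actually places the two bit-fields inside the overlaid 64-bit value; that
placement is implementation-defined and generally differs between little
endian and big endian targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag masks: the two lowest bits of the 64-bit value. */
#define DEMO_IS_DATA   UINT64_C(0x1)
#define DEMO_NO_REFCNT UINT64_C(0x2)
#define DEMO_FLAG_MASK (DEMO_IS_DATA | DEMO_NO_REFCNT)

/* Store flags in the low bits of an aligned pointer value. */
static uint64_t demo_pack(uint64_t ptr, uint64_t flags)
{
	/* Assumption: ptr is at least 4-byte aligned. */
	assert((ptr & DEMO_FLAG_MASK) == 0);
	return ptr | (flags & DEMO_FLAG_MASK);
}

/* Recover the pointer by clearing the two flag bits. */
static uint64_t demo_ptr(uint64_t val)
{
	return val & ~DEMO_FLAG_MASK;
}

/*
 * Hypothetical overlay, loosely modelled on a bit-field struct sharing
 * storage with a 64-bit value. Not the real sock_cgroup_data.
 */
union demo_overlay {
	struct {
		unsigned int is_data   : 1;
		unsigned int no_refcnt : 1;
		unsigned int unused    : 30;
		uint32_t     rest;
	} bits;
	uint64_t val;
};

/* Which bits of val do the two bit-fields occupy on this compiler? */
static uint64_t demo_bitfield_mask(void)
{
	union demo_overlay d;

	memset(&d, 0, sizeof(d));
	d.bits.is_data = 1;
	d.bits.no_refcnt = 1;
	return d.val;
}

/* Nonzero only if the bit-fields land exactly on the two low bits. */
static int demo_layout_matches(void)
{
	return demo_bitfield_mask() == DEMO_FLAG_MASK;
}
```

If demo_layout_matches() returns 0 on some target, then setting a flag
through the bit-field there modifies bits that demo_ptr() does not clear,
so the recovered pointer comes out wrong. Fixing only the test that reads
the flag would not cure that; the declaration order of the fields has to
be chosen per endianness so that they end up on the low bits.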

Thanks!