netdev - Re: [Patch net v2] cgroup: fix cgroup_sk_alloc() for sk_clone


Message-ID: <e0387ee2-56b8-c780-8d33-c477a875e2df@roeck-us.net>
Date:   Thu, 9 Jul 2020 12:13:06 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Cameron Berkenpas <cam@...-zeon.de>,
        Peter Geis <pgwipeout@...il.com>,
        Lu Fengqi <lufq.fnst@...fujitsu.com>,
        Daniël Sonck <dsonck92@...il.com>,
        Zhang Qiang <qiang.zhang@...driver.com>,
        Thomas Lamprecht <t.lamprecht@...xmox.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Zefan Li <lizefan@...wei.com>, Tejun Heo <tj@...nel.org>,
        Roman Gushchin <guro@...com>
Subject: Re: [Patch net v2] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()

On 7/9/20 11:51 AM, Cong Wang wrote:
> On Thu, Jul 9, 2020 at 10:10 AM Guenter Roeck <linux@...ck-us.net> wrote:
>>
>> Something seems fishy with the use of skcd->val on big endian systems.
>>
>> Some debug output:
>>
>> [   22.643703] sock: ##### sk_alloc(sk=000000001be28100): Calling cgroup_sk_alloc(000000001be28550)
>> [   22.643807] cgroup: ##### cgroup_sk_alloc(skcd=000000001be28550): cgroup_sk_alloc_disabled=0, in_interrupt: 0
>> [   22.643886] cgroup:  #### cgroup_sk_alloc(skcd=000000001be28550): cset->dfl_cgrp=0000000001224040, skcd->val=0x1224040
>> [   22.643957] cgroup: ###### cgroup_bpf_get(cgrp=0000000001224040)
>> [   22.646451] sock: ##### sk_prot_free(sk=000000001be28100): Calling cgroup_sk_free(000000001be28550)
>> [   22.646607] cgroup:  #### sock_cgroup_ptr(skcd=000000001be28550) -> 0000000000014040 [v=14040, skcd->val=14040]
>> [   22.646632] cgroup: ####### cgroup_sk_free(): skcd=000000001be28550, cgrp=0000000000014040
>> [   22.646739] cgroup: ####### cgroup_sk_free(): skcd->no_refcnt=0
>> [   22.646814] cgroup: ####### cgroup_sk_free(): Calling cgroup_bpf_put(cgrp=0000000000014040)
>> [   22.646886] cgroup: ###### cgroup_bpf_put(cgrp=0000000000014040)
> 
> Excellent debugging! I thought it was a double put, but it seems to
> be an endian issue. I didn't realize the big endian machine actually
> packs bitfields in a big endian way too...
> 
> Does the attached patch address this?
> 

Partially. I don't see the crash anymore, but something is still odd - some of my
tests require a retry with this patch applied, which previously never happened.
I don't know if this is another problem with this patch, or a different problem.
Unfortunately, I'll be unable to debug this further until next Tuesday.

Guenter
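
For readers following the thread, here is a minimal sketch of the general hazard Cong describes: flag bitfields overlaid on a 64-bit value end up at different bit positions of that value depending on the ABI. The union below is a hypothetical stand-in and is NOT the kernel's struct sock_cgroup_data; all demo_* names and host_is_little_endian() are invented for illustration. The sample value 0x1224040 is taken from the debug output above. uint8_t bitfields are implementation-defined in ISO C, although GCC and Clang accept them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only (not the kernel's struct):
 * two flag bits declared first, overlaid on a 64-bit value that also
 * holds a pointer-sized number. */
union demo_skcd {
    struct {
        uint8_t is_data   : 1;
        uint8_t no_refcnt : 1;
        uint8_t unused    : 6;
        uint8_t pad[7];
    } f;
    uint64_t val;
};

/* Returns 1 if the least significant byte of an integer is stored first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Shows which bit of val the is_data bitfield actually occupies. */
uint64_t demo_is_data_mask(void)
{
    union demo_skcd u;

    u.val = 0;
    u.f.is_data = 1;
    return u.val;
}

/* Stores a pointer-like value, sets a flag through the bitfield, then
 * strips the two low bits, assuming the flags live there. The value
 * round-trips only if that assumption matches the ABI's bitfield layout. */
uint64_t demo_set_flag_then_strip(uint64_t ptr_val)
{
    union demo_skcd u;

    u.val = ptr_val;
    u.f.no_refcnt = 1;
    return u.val & ~(uint64_t)0x3;
}
```

On a little endian build with GCC or Clang, demo_is_data_mask() returns 0x1 and demo_set_flag_then_strip(0x1224040) gives back 0x1224040. On a big endian ABI I would expect the flag to land in the most significant byte of val instead, so the low-bit mask would no longer remove it; I have not verified that here. This only illustrates the layout dependence and makes no claim about the exact cause of the 0x1224040 -> 0x14040 change in the log.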