Message-ID: <20200708153327.GA193647@roeck-us.net>
Date: Wed, 8 Jul 2020 08:33:27 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: netdev@...r.kernel.org, Cameron Berkenpas <cam@...-zeon.de>,
Peter Geis <pgwipeout@...il.com>,
Lu Fengqi <lufq.fnst@...fujitsu.com>,
Daniël Sonck <dsonck92@...il.com>,
Zhang Qiang <qiang.zhang@...driver.com>,
Thomas Lamprecht <t.lamprecht@...xmox.com>,
Daniel Borkmann <daniel@...earbox.net>,
Zefan Li <lizefan@...wei.com>, Tejun Heo <tj@...nel.org>,
Roman Gushchin <guro@...com>
Subject: Re: [Patch net v2] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
Hi,
On Thu, Jul 02, 2020 at 11:52:56AM -0700, Cong Wang wrote:
> When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
> copied, so the cgroup refcnt must be taken too. And, unlike the
> sk_alloc() path, sock_update_netprioidx() is not called here.
> Therefore, it is safe and necessary to grab the cgroup refcnt
> even when cgroup_sk_alloc is disabled.
>
> sk_clone_lock() runs in BH context anyway, so the in_interrupt() check
> would terminate this function if it were called there. And for sk_alloc(),
> skcd->val is always zero. So it is safe to factor out the code
> to make it more readable.
>
> The global variable 'cgroup_sk_alloc_disabled' is used to determine
> whether to take these reference counts. It is impossible to make
> the reference counting correct unless we save this bit of information
> in skcd->val. So, add a new bit there to record whether the socket
> has already taken the reference counts. This obviously relies on
> kmalloc() to align cgroup pointers to at least 4 bytes;
> ARCH_KMALLOC_MINALIGN is certainly larger than that.
>
> This bug seems to have existed since the beginning. Commit
> d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
> tried to fix it, but not completely. It did not seem easy to trigger until
> the recent commit 090e28b229af
> ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
>
This patch causes all my s390 boot tests to crash. Reverting it fixes
the problem. Please see the bisect results and crash log below.
Guenter
---
bisect results (from the pending-fixes branch in the -next repository):
# bad: [1432f824c2db44ef35b26caa9f81dd05211a75fc] Merge remote-tracking branch 'drm-misc-fixes/for-linux-next-fixes'
# good: [dcb7fd82c75ee2d6e6f9d8cc71c52519ed52e258] Linux 5.8-rc4
git bisect start 'HEAD' 'v5.8-rc4'
# bad: [fe12f8184e7265e2d24e5ed5b255275dfe4c1c04] Merge remote-tracking branch 'net/master'
git bisect bad fe12f8184e7265e2d24e5ed5b255275dfe4c1c04
# good: [474112d57c70520ebd81a5ca578fee1d93fafd07] Documentation: networking: ipvs-sysctl: drop doubled word
git bisect good 474112d57c70520ebd81a5ca578fee1d93fafd07
# good: [6d12075ddeedc38d25c5b74e929e686158da728c] Merge tag 'mtd/fixes-for-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
git bisect good 6d12075ddeedc38d25c5b74e929e686158da728c
# good: [74478ea4ded519db35cb1f059948b1e713bb4abf] net: ipa: fix QMI structure definition bugs
git bisect good 74478ea4ded519db35cb1f059948b1e713bb4abf
# bad: [9c29e36152748fd623fcff6cc8f538550f9eeafc] mptcp: fix DSS map generation on fin retransmission
git bisect bad 9c29e36152748fd623fcff6cc8f538550f9eeafc
# good: [aea23c323d89836bcdcee67e49def997ffca043b] ipv6: Fix use of anycast address with loopback
git bisect good aea23c323d89836bcdcee67e49def997ffca043b
# bad: [28b18e4eb515af7c6661c3995c6e3c34412c2874] net: sky2: initialize return of gm_phy_read
git bisect bad 28b18e4eb515af7c6661c3995c6e3c34412c2874
# bad: [ad0f75e5f57ccbceec13274e1e242f2b5a6397ed] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
git bisect bad ad0f75e5f57ccbceec13274e1e242f2b5a6397ed
# first bad commit: [ad0f75e5f57ccbceec13274e1e242f2b5a6397ed] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
---
Crash log:
[ 22.390674] Run /sbin/init as init process
[ 22.497551] Unable to handle kernel pointer dereference in virtual kernel address space
[ 22.497738] Failing address: 5010f0b45fa93000 TEID: 5010f0b45fa93803
[ 22.497813] Fault in home space mode while using kernel ASCE.
[ 22.497958] AS:0000000001774007 R3:0000000000000024
[ 22.498300] Oops: 0038 ilc:3 [#1] SMP
[ 22.498405] Modules linked in:
[ 22.499027] CPU: 0 PID: 153 Comm: init Not tainted 5.8.0-rc4-00328-g1432f824c2db4 #1
[ 22.499112] Hardware name: QEMU 2964 QEMU (KVM/Linux)
[ 22.499261] Krnl PSW : 0704e00180000000 0000000000259be0 (cgroup_sk_free+0xa8/0x1e8)
[ 22.499405] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 22.499506] Krnl GPRS: 0000000048a38585 5010f0b45fa93094 0000000000000002 000000001c228bd8
[ 22.499585] 000000001c228bb0 0000000000000000 0000000000000000 00000000011c2eda
[ 22.499665] fffffffffffff000 00000000011c1f72 00000000011deef0 0000000000014040
[ 22.499744] 000000001c228100 0000000000e76bf0 0000000000259c82 000003e0002c3c00
[ 22.500270] Krnl Code: 0000000000259bd2: a72a0001 ahi %r2,1
[ 22.500270] 0000000000259bd6: 502003a8 st %r2,936
[ 22.500270] #0000000000259bda: e31003b80008 ag %r1,952
[ 22.500270] >0000000000259be0: e32010000004 lg %r2,0(%r1)
[ 22.500270] 0000000000259be6: a7f40004 brc 15,0000000000259bee
[ 22.500270] 0000000000259bea: b9040023 lgr %r2,%r3
[ 22.500270] 0000000000259bee: b9040032 lgr %r3,%r2
[ 22.500270] 0000000000259bf2: b9040042 lgr %r4,%r2
[ 22.500635] Call Trace:
[ 22.500748] [<0000000000259be0>] cgroup_sk_free+0xa8/0x1e8
[ 22.500835] ([<0000000000259bb4>] cgroup_sk_free+0x7c/0x1e8)
[ 22.500914] [<0000000000b24e16>] __sk_destruct+0x196/0x260
[ 22.500999] [<0000000000cadc18>] unix_release_sock+0x358/0x460
[ 22.501073] [<0000000000cadd5a>] unix_release+0x3a/0x60
[ 22.501149] [<0000000000b1a63a>] __sock_release+0x62/0xf8
[ 22.501223] [<0000000000b1a6f8>] sock_close+0x28/0x38
[ 22.501299] [<000000000045101e>] __fput+0x126/0x2a8
[ 22.501374] [<000000000017e088>] task_work_run+0x78/0xc8
[ 22.501449] [<000000000010a596>] do_notify_resume+0x9e/0xa8
[ 22.501526] [<0000000000de555a>] system_call+0xe6/0x2d4
[ 22.501657] INFO: lockdep is turned off.
[ 22.501736] Last Breaking-Event-Address:
[ 22.501814] [<0000000000259c86>] cgroup_sk_free+0x14e/0x1e8
[ 22.502169] Kernel panic - not syncing: Fatal exception: panic_on_oops