[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190507213752.GA24308@tower.DHCP.thefacebook.com>
Date: Tue, 7 May 2019 21:37:58 +0000
From: Roman Gushchin <guro@...com>
To: Qian Cai <cai@....pw>
CC: "oleg@...hat.com" <oleg@...hat.com>,
"tj@...nel.org" <tj@...nel.org>,
"lizefan@...wei.com" <lizefan@...wei.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: ptrace warning due to "cgroup: get rid of
cgroup_freezer_frozen_exit()"
On Tue, May 07, 2019 at 04:04:22PM -0400, Qian Cai wrote:
> LTP ptrace01 test case triggers a warning below. Looks at ptrace_stop() calls
> cgroup_enter_frozen() there in the cgroup v2 freezer.
>
> [ 8373.336330] WARNING: CPU: 56 PID: 67026 at kernel/cgroup/cgroup.c:6008
> cgroup_exit+0x2a9/0x2f0
> [ 8373.345001] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop
> nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure dax_pmem irqbypass
> dax_pmem_core efivars ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas
> tg3 mlx5_core libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
> efivarfs [last unloaded: dummy_del_mod]
> [ 8373.375561] CPU: 56 PID: 67026 Comm: ptrace01 Tainted:
> G O 5.1.0-next-20190507+ #25
> [ 8373.384579] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10,
> BIOS A40 01/25/2019
> [ 8373.393164] RIP: 0010:cgroup_exit+0x2a9/0x2f0
> [ 8373.397556] Code: 0d ff ff ff 4c 89 f7 e8 75 4b 1b 00 4c 8b ab 20 0f 00 00 49
> 8d 7d 50 e8 65 4b 1b 00 49 8b 7d 50 e8 4c 56 00 00 e9 db fe ff ff <0f> 0b e9 3a
> fe ff ff 48 01 f1 0f 82 3b ff ff ff 48 c7 c7 40 83 5b
> [ 8373.416443] RSP: 0018:ffff888bdc9ef9b8 EFLAGS: 00010002
> [ 8373.421709] RAX: 0000000000000000 RBX: ffff888e5cfcc040 RCX: ffffffffab3a8e7d
> [ 8373.428897] RDX: 1ffff111cb9f9875 RSI: dffffc0000000000 RDI: ffff888e5cfcc3a8
> [ 8373.436080] RBP: ffff888bdc9efa50 R08: ffffed117b93df25 R09: ffffed117b93df24
> [ 8373.443266] R10: ffffed117b93df24 R11: 0000000000000003 R12: ffff888bdc9efa28
> [ 8373.450451] R13: ffff888f4c2346c8 R14: ffff888e5cfccf60 R15: ffff888e5cfccf68
> [ 8373.457637] FS: 00007ff1e2e3d5c0(0000) GS:ffff88902f800000(0000)
> knlGS:0000000000000000
> [ 8373.465781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8373.471569] CR2: 00007ff1e286fe8a CR3: 000000092c412000 CR4: 00000000001406a0
> [ 8373.478750] Call Trace:
> [ 8373.481219] ? cgroup_post_fork+0x350/0x350
> [ 8373.485435] ? fpu__drop+0x5e/0x230
> [ 8373.488951] ? exit_thread+0x10c/0x160
> [ 8373.492736] do_exit+0x5cb/0x1740
> [ 8373.496083] ? check_chain_key+0x142/0x200
> [ 8373.500210] ? mm_update_next_owner+0x360/0x360
> [ 8373.504775] ? map_id_up+0x14c/0x1f0
> [ 8373.508380] ? check_chain_key+0x142/0x200
> [ 8373.512512] ? get_signal+0x5f1/0xde0
> [ 8373.516206] ? lock_downgrade+0x300/0x300
> [ 8373.520246] ? lock_downgrade+0x300/0x300
> [ 8373.524287] do_group_exit+0x78/0x160
> [ 8373.527978] get_signal+0x1e8/0xde0
> [ 8373.531498] do_signal+0x9c/0x9d0
> [ 8373.534841] ? check_chain_key+0x142/0x200
> [ 8373.538970] ? setup_sigcontext+0x280/0x280
> [ 8373.543185] ? lock_downgrade+0x300/0x300
> [ 8373.547228] ? kill_pid_info+0x2e/0xd0
> [ 8373.551006] ? kill_pid_info+0xa4/0xd0
> [ 8373.554788] ? __x64_sys_kill+0x262/0x350
> [ 8373.558830] exit_to_usermode_loop+0x9d/0xc0
> [ 8373.563131] do_syscall_64+0x470/0x5d8
> [ 8373.566910] ? syscall_return_slowpath+0xf0/0xf0
> [ 8373.571565] ? __do_page_fault+0x44d/0x5b0
> [ 8373.575698] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 8373.580789] RIP: 0033:0x7ff1e2893c3b
> [ 8373.584402] Code: Bad RIP value.
> [ 8373.587653] RSP: 002b:00007ffd8e5efe78 EFLAGS: 00000206 ORIG_RAX:
> 000000000000003e
> [ 8373.595276] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007ff1e2893c3b
> [ 8373.602461] RDX: 0000000000000000 RSI: 000000000000000c RDI: 00000000000105d2
> [ 8373.609642] RBP: 0000000000000000 R08: 00000000ffffffff R09: 00007ff1e2e3d5c0
> [ 8373.616824] R10: fffffffffffff768 R11: 0000000000000206 R12: 00007ffd8e5efe98
> [ 8373.624005] R13: 00007ffd8e5f00c0 R14: 0000000000000000 R15: 0000000000000000
> [ 8373.631190] ---[ end trace a7169f3366f1d100 ]---
Hello, Qian!
Thank you for the report!
Can you, please, try if the following patch fixes the problem?
Thank you!
Roman
--
diff --git a/kernel/signal.c b/kernel/signal.c
index 16b72f4f14df..bf2f268f1386 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2484,9 +2484,6 @@ bool get_signal(struct ksignal *ksig)
sigdelset(¤t->pending.signal, SIGKILL);
recalc_sigpending();
current->jobctl &= ~JOBCTL_TRAP_FREEZE;
- spin_unlock_irq(&sighand->siglock);
- if (unlikely(cgroup_task_frozen(current)))
- cgroup_leave_frozen(true);
goto fatal;
}
@@ -2608,8 +2605,10 @@ bool get_signal(struct ksignal *ksig)
continue;
}
- spin_unlock_irq(&sighand->siglock);
fatal:
+ spin_unlock_irq(&sighand->siglock);
+ if (unlikely(cgroup_task_frozen(current)))
+ cgroup_leave_frozen(true);
/*
* Anything else is fatal, maybe with a core dump.
Powered by blists - more mailing lists