lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211208180501.11969-1-mkoutny@suse.com>
Date:   Wed,  8 Dec 2021 19:05:01 +0100
From:   Michal Koutný <mkoutny@...e.com>
To:     linux-kernel@...r.kernel.org
Cc:     "Eric W . Biederman" <ebiederm@...ssion.com>,
        Jens Axboe <axboe@...nel.dk>,
        Kees Cook <keescook@...omium.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jim Newsome <jnewsome@...project.org>,
        Alexey Gladkov <legion@...nel.org>, Tejun Heo <tj@...nel.org>
Subject: [PATCH] exit: Retain nsproxy for exit_task_work() work entries

The reported issue is an attempted write in a cgroup file, by a zombie
who has/had acct(2) enabled. Such a write needs cgroup_ns for access
checking. Ordinary acct_process() would not be affected by this as it is
called well before exit_task_namespaces(). However, the reported NULL
dereference is a different acct data writer:

	Call Trace:
	 <TASK>
	 kernfs_fop_write_iter+0x3b6/0x510 fs/kernfs/file.c:296
	 __kernel_write+0x5d1/0xaf0 fs/read_write.c:535
	 do_acct_process+0x112a/0x17b0 kernel/acct.c:518
	 acct_pin_kill+0x27/0x130 kernel/acct.c:173
	 pin_kill+0x2a6/0x940 fs/fs_pin.c:44
	 mnt_pin_kill+0xc1/0x170 fs/fs_pin.c:81
	 cleanup_mnt+0x4bc/0x510 fs/namespace.c:1130
	 task_work_run+0x146/0x1c0 kernel/task_work.c:164
	 exit_task_work include/linux/task_work.h:32 [inline]
	 do_exit+0x705/0x24f0 kernel/exit.c:832
	 do_group_exit+0x168/0x2d0 kernel/exit.c:929
	 get_signal+0x16b0/0x2090 kernel/signal.c:2820
	 arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
	 handle_signal_work kernel/entry/common.c:148 [inline]
	 exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
	 exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
	 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
	 syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
	 do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

i.e. called as one of task_work_run() entries.

The historical commit 8aac62706ada ("move exit_task_namespaces() outside
of exit_notify()") argues that exit_task_namespaces() must come before
exit_task_work() because ipc_ns cleanup calls fput/task_work_add.

There is accompanying commit e7b2c4069252 ("fput: task_work_add() can
fail if the caller has passed exit_task_work()") in the original series
that makes fput() robust in situations when task_work_add() cannot be
used anymore.

So in order to ensure that task_work_run() entries of the exiting task
have the nsproxy still available, swap the order of
exit_task_namespaces() and exit_task_work().

This change may appear like a partial revert of 8aac62706ada but this
particular ordering change shouldn't matter with the fix from
e7b2c4069252 and the other reason for 8aac62706ada (keeping exit_notify
simpler) still holds.

Reported-by: syzbot+50f5cf33a284ce738b62@...kaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Cc: Oleg Nesterov <oleg@...hat.com>
Fixes: 5136f6365ce3 ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Michal Koutný <mkoutny@...e.com>
---
 kernel/exit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I wasn't able to reproduce the syzbot's crash manually so the effectiveness of
the fix is only based on the reasoning.

diff --git a/kernel/exit.c b/kernel/exit.c
index f702a6a63686..0c2abdebb87c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -828,8 +828,8 @@ void __noreturn do_exit(long code)
 	exit_fs(tsk);
 	if (group_dead)
 		disassociate_ctty(1);
-	exit_task_namespaces(tsk);
 	exit_task_work(tsk);
+	exit_task_namespaces(tsk);
 	exit_thread(tsk);
 
 	/*
-- 
2.33.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ