linux-kernel - Re: [syzbot] WARNING: locking bug in umh

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <9d9b9652-c1ac-58e9-2eab-9256c17b1da2@I-love.SAKURA.ne.jp>
Date:   Fri, 3 Feb 2023 19:22:43 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Hillf Danton <hdanton@...a.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
Cc:     linux-kernel@...r.kernel.org, mcgrof@...nel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        syzkaller-bugs@...glegroups.com,
        syzbot <syzbot+6cd18e123583550cf469@...kaller.appspotmail.com>
Subject: Re: [syzbot] WARNING: locking bug in umh_complete

On 2023/01/27 10:41, Hillf Danton wrote:
> On Thu, 26 Jan 2023 14:27:51 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit:    71ab9c3e2253 net: fix UaF in netns ops registration error ..
>> git tree:       net
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10f86a56480000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=899d86a7610a0ea0
>> dashboard link: https://syzkaller.appspot.com/bug?extid=6cd18e123583550cf469
>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/54c953096fdb/disk-71ab9c3e.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/644d72265810/vmlinux-71ab9c3e.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/16e994579ca5/bzImage-71ab9c3e.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+6cd18e123583550cf469@...kaller.appspotmail.com
>>
>> ------------[ cut here ]------------
>> DEBUG_LOCKS_WARN_ON(1)
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 hlock_class kernel/locking/lockdep.c:231 [inline]
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 hlock_class kernel/locking/lockdep.c:220 [inline]
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 check_wait_context kernel/locking/lockdep.c:4729 [inline]
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 __lock_acquire+0x13ae/0x56d0 kernel/locking/lockdep.c:5005
>> Modules linked in:
>> CPU: 0 PID: 46 Comm: kworker/u4:3 Not tainted 6.2.0-rc4-syzkaller-00191-g71ab9c3e2253 #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
>> Workqueue: events_unbound call_usermodehelper_exec_work
>> RIP: 0010:hlock_class kernel/locking/lockdep.c:231 [inline]
>> RIP: 0010:hlock_class kernel/locking/lockdep.c:220 [inline]
>> RIP: 0010:check_wait_context kernel/locking/lockdep.c:4729 [inline]
>> RIP: 0010:__lock_acquire+0x13ae/0x56d0 kernel/locking/lockdep.c:5005
>> Code: 08 84 d2 0f 85 56 41 00 00 8b 15 55 7b 0f 0d 85 d2 0f 85 d5 fd ff ff 48 c7 c6 c0 51 4c 8a 48 c7 c7 20 4b 4c 8a e8 92 e1 5b 08 <0f> 0b 31 ed e9 50 f0 ff ff 48 c7 c0 a8 2b 73 8e 48 ba 00 00 00 00
>> RSP: 0018:ffffc90000b77a30 EFLAGS: 00010082
>> RAX: 0000000000000000 RBX: ffff88801272a778 RCX: 0000000000000000
>> RDX: ffff888012729d40 RSI: ffffffff8166822c RDI: fffff5200016ef38
>> RBP: 0000000000001b2c R08: 0000000000000005 R09: 0000000000000000
>> R10: 0000000080000001 R11: 0000000000000001 R12: ffff88801272a7c8
>> R13: ffff888012729d40 R14: ffffc9000397fb28 R15: 0000000000040000
>> FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007fa5a95e81d0 CR3: 00000000284d2000 CR4: 00000000003506f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>  <TASK>
>>  lock_acquire kernel/locking/lockdep.c:5668 [inline]
>>  lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>>  _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
>>  complete+0x1d/0x1f0 kernel/sched/completion.c:32
>>  umh_complete+0x32/0x90 kernel/umh.c:59
>>  call_usermodehelper_exec_sync kernel/umh.c:144 [inline]
>>  call_usermodehelper_exec_work+0x115/0x180 kernel/umh.c:167
>>  process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
>>  worker_thread+0x669/0x1090 kernel/workqueue.c:2436
>>  kthread+0x2e8/0x3a0 kernel/kthread.c:376
>>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>>  </TASK>
> 
> This is an interesting case - given done initialized on stack, no garbage
> should have been detected by lockdep.
> 
> One explanation to the report is uaf on the waker side, and it can be
> tested with the diff below when a reproducer is available.
> 
> Hillf
> 
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -452,6 +452,12 @@ int call_usermodehelper_exec(struct subp
>  		/* umh_complete() will see NULL and free sub_info */
>  		if (xchg(&sub_info->complete, NULL))
>  			goto unlock;
> +		else {
> +			/* wait for umh_complete() to finish in a bid to avoid
> +			 * uaf because done is destructed
> +			 */
> +			wait_for_completion(&done);
> +		}
>  	}
>  
>  wait_done:
> --

Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
logic"), for that commit for unknown reason omits wait_for_completion(&done) call
when wait_for_completion_state(&done, state) returned -ERESTARTSYS.

Peter, is it safe to restore wait_for_completion(&done) call?