linux-kernel - Re: [BUG 5.3-rc5] rwsem: use after free on task

Message-ID: <fec29712-0756-26ff-d6ac-68f52048a231@redhat.com>
Date:   Mon, 19 Aug 2019 11:16:05 -0400
From:   Waiman Long <longman@...hat.com>
To:     Dave Chinner <david@...morbit.com>, linux-kernel@...r.kernel.org
Cc:     peterz@...radead.org
Subject: Re: [BUG 5.3-rc5] rwsem: use after free on task_struct if task exits
 with rwsem held

On 8/19/19 2:47 AM, Dave Chinner wrote:
> Hi folks,
>
> In trying to track down an XFS regression, I stumbled across KASAN
> warnings about use-after-free behaviour in rwsems.
>
> Essentially, the XFS regression is triggering an ASSERT, which is
> BUG()ing a kernel thread that is holding the superblock s_umount
> rwsem in write mode (it is a mount problem).
>
> Once that thread has been killed (segv), the rwsem it held now has
> no valid owner - the owning task_struct has been freed. When the
> next attempt to access that superblock occurs (because it's visible
> in the superblock list), either by attempting to do something
> through the block device (e.g. bdev_invalidate()) or by trying to
> mount the block device again, we get use-after-free warnings on
> the superblock s_umount rwsem.
>
> Need 5.3-rc5 w/ CONFIG_XFS_DEBUG=y (needed for the BUG to trigger),
> CONFIG_KASAN=y (to change the memory allocation alignment to cause
> IO failures that cause the conditions for the BUG to trigger).
>
> Access through the bdev (I was only able to reproduce this one
> through /dev/pmem0) from a third party:
>
> # while [ 1 ]; do sudo xfs_io -fd -c "pwrite -S 0x0 -b 1m 0 8g" /dev/pmem0; mkfs.xfs -f -l size=2000m /dev/pmem0; mount -o logbsize=256k /dev/pmem0 /mnt/test; umount /dev/pmem0; done
>
> On the third or fourth loop, everything gets really, really slow
> when mounting - instead of taking about 100ms to mount the filesystem,
> it takes a couple of minutes before it finally fails, triggering
> a BUG() that kills the mount process:
>
> [   59.316335] XFS (pmem0): Mounting V5 Filesystem
> [   59.322858] XFS (pmem0): Ending clean mount
> [   59.368816] XFS (pmem0): Unmounting Filesystem
> [   63.864465] XFS (pmem0): Mounting V5 Filesystem
> [   63.880840] XFS (pmem0): Ending clean mount
> [   63.928850] XFS (pmem0): Unmounting Filesystem
> [   68.433309] XFS (pmem0): Mounting V5 Filesystem
> [   68.436485] XFS (pmem0): totally zeroed log
> [  188.034629] XFS: Assertion failed: head_blk != tail_blk, file: fs/xfs/xfs_log_recover.c, line: 5236
> [  188.040585] ------------[ cut here ]------------
> [  188.041687] kernel BUG at fs/xfs/xfs_message.c:102!
> [  188.042870] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> [  188.044129] CPU: 1 PID: 4740 Comm: mount Not tainted 5.3.0-rc5-dgc+ #1506
> .....
> <snip XFS stack trace of problem I was trying to reproduce>
>
>
> I outlined both methods of causing this issue because they are two
> different use-after-free cases - one is in the read slowpath, the
> other is in the write slow path. 
>
> I know that processes should not exit while holding a rwsem, but
> bugs do happen.  I'd much prefer that leaked rwsems just hang and we
> do not add the potential for random memory corruption into these
> situations as well - a lock hang is much easier to debug than a
> memory corruption....

From what I understand, a process acquired a write lock on a rwsem and
was then killed before releasing it. A pointer to its task structure
remains in the rwsem structure. This pointer is primarily used for
optimistic spinning on the on_cpu flag of the task structure.
Depending on the setting of the on_cpu flag, the spinning task either
continues spinning until its time quantum has expired or goes to sleep
immediately. The access is read-only; nothing is written to the task
structure. No real harm should happen unless the memory of the freed
task structure becomes inaccessible. The bigger problem is that the
tasks that try to acquire the lock will hang waiting for the lock to
be freed. This use-after-free problem is the lesser of the 2 evils, IMHO.
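
As a rough illustration of the spinning decision described above, here
is a minimal userspace sketch in C. It is not the kernel's code: the
types task_model and rwsem_model, the function spin_on_owner(), and the
max_spins parameter (a stand-in for "until the time quantum has
expired") are all hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task structure; only on_cpu is modelled. */
struct task_model {
    atomic_int on_cpu;                    /* nonzero while running on a CPU */
};

/* Hypothetical stand-in for a rwsem; only the owner pointer is modelled. */
struct rwsem_model {
    _Atomic(struct task_model *) owner;   /* NULL when nobody holds the lock */
};

enum spin_result {
    SPIN_LOCK_FREE,        /* owner went away: try to take the lock */
    SPIN_OWNER_SLEEPING,   /* owner is off-CPU: stop spinning, go to sleep */
    SPIN_QUANTUM_EXPIRED   /* owner kept running for our whole spin budget */
};

/*
 * Spin while the recorded owner appears to be running.
 * max_spins is an assumed substitute for the spinner's time quantum.
 */
static enum spin_result spin_on_owner(struct rwsem_model *sem, int max_spins)
{
    assert(sem != NULL);

    for (int i = 0; i < max_spins; i++) {
        struct task_model *owner = atomic_load(&sem->owner);

        if (owner == NULL)
            return SPIN_LOCK_FREE;

        /*
         * Read-only dereference of the owner. If the owner exited
         * without unlocking, this pointer is stale, and this load is
         * the kind of access a use-after-free checker would flag.
         */
        if (!atomic_load(&owner->on_cpu))
            return SPIN_OWNER_SLEEPING;
    }
    return SPIN_QUANTUM_EXPIRED;
}
```

Nothing in the sketch ever writes through the owner pointer, which
matches the read-only argument above. Nothing in it validates the
pointer before the dereference either, so a stale owner is read as if
it were still a live task.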

The optimistic spinning mechanism is there for both rwsem and mutex. So
the same problem will happen if the killed task holds a mutex instead of
a rwsem. There is currently no code to detect whether the task structure
pointed to by the owner field is legit or not.

-Longman