Message-ID: <2d65449b-5f8a-7a29-e879-9c27bd1d4537@oracle.com>
Date:   Fri, 16 Dec 2016 15:25:27 +0100
From:   Vegard Nossum <vegard.nossum@...cle.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Rik van Riel <riel@...hat.com>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Ingo Molnar <mingo@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: crash during oom reaper

On 12/16/2016 03:00 PM, Michal Hocko wrote:
> On Fri 16-12-16 14:14:17, Vegard Nossum wrote:
> [...]
>> Out of memory: Kill process 1650 (trinity-main) score 90 or sacrifice child
>> Killed process 1724 (trinity-c14) total-vm:37280kB, anon-rss:236kB,
>> file-rss:112kB, shmem-rss:112kB
>> BUG: unable to handle kernel NULL pointer dereference at 00000000000001e8
>> IP: [<ffffffff8126b1c0>] copy_process.part.41+0x2150/0x5580
>> PGD c001067 PUD c000067
>> PMD 0
>> Oops: 0002 [#1] PREEMPT SMP KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> CPU: 28 PID: 1650 Comm: trinity-main Not tainted 4.9.0-rc6+ #317
>
> Hmm, so this was the oom victim initially but we have decided to kill
> its child 1724 instead.
>
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>> task: ffff88000f9bc440 task.stack: ffff88000c778000
>> RIP: 0010:[<ffffffff8126b1c0>]  [<ffffffff8126b1c0>]
>> copy_process.part.41+0x2150/0x5580
>
> Could you match this to the kernel source please?

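That is kernel/fork.c:629, in dup_mmap().

The faulting statement is atomic_dec(&inode->i_writecount), and the fault
address matches file_inode(file) == NULL, since i_writecount sits at
offset 0x1e8 in struct inode: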

(gdb) p &((struct inode *)0)->i_writecount
$1 = (atomic_t *) 0x1e8 <irq_stack_union+488>
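
The gdb expression above is effectively offsetof(): taking the address of
a member through a NULL base pointer yields the member's offset, which is
also the address the CPU faults on when the base really is NULL. A minimal
userspace sketch of that check follows. struct demo_inode is a hypothetical
stand-in, not the real struct inode; its padding is chosen only so that the
member lands at 0x1e8, the offset gdb printed for this particular build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; NOT the kernel's struct inode. */
struct demo_inode {
    char pad[0x1e8];     /* filler so the next member lands at 0x1e8 */
    int  i_writecount;   /* stand-in for atomic_t i_writecount */
};

/* Address that an access to ->i_writecount through a NULL base touches. */
static uintptr_t null_deref_addr_writecount(void)
{
    return (uintptr_t)offsetof(struct demo_inode, i_writecount);
}

/* True if a reported fault address is consistent with base == NULL. */
static int fault_matches_writecount(uintptr_t fault_addr)
{
    return fault_addr == null_deref_addr_writecount();
}

/* Small self-check against the address reported in the oops. */
static void demo_check(void)
{
    assert(fault_matches_writecount(0x1e8));
}
```

With the real structure, the same comparison is what ties the oops address
00000000000001e8 to a NULL inode rather than to some other bad pointer.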

>> Killed process 1775 (trinity-c21) total-vm:37404kB, anon-rss:232kB,
>> file-rss:420kB, shmem-rss:116kB
>> oom_reaper: reaped process 1775 (trinity-c21), now anon-rss:0kB,
>> file-rss:0kB, shmem-rss:116kB
>> ==================================================================
>> BUG: KASAN: use-after-free in p9_client_read+0x8f0/0x960 at addr
>> ffff880010284d00
>> Read of size 8 by task trinity-main/1649
>> CPU: 3 PID: 1649 Comm: trinity-main Not tainted 4.9.0+ #318
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>>  ffff8800068a7770 ffffffff82012301 ffff88001100f600 ffff880010284d00
>>  ffff880010284d60 ffff880010284d00 ffff8800068a7798 ffffffff8165872c
>>  ffff8800068a7828 ffff880010284d00 ffff88001100f600 ffff8800068a7818
>> Call Trace:
>>  [<ffffffff82012301>] dump_stack+0x83/0xb2
>>  [<ffffffff8165872c>] kasan_object_err+0x1c/0x70
>>  [<ffffffff816589c5>] kasan_report_error+0x1f5/0x4e0
>>  [<ffffffff81657d92>] ? kasan_slab_alloc+0x12/0x20
>>  [<ffffffff82079357>] ? check_preemption_disabled+0x37/0x1e0
>>  [<ffffffff81658e4e>] __asan_report_load8_noabort+0x3e/0x40
>>  [<ffffffff82079300>] ? assoc_array_gc+0x1310/0x1330
>>  [<ffffffff83b84c30>] ? p9_client_read+0x8f0/0x960
>>  [<ffffffff83b84c30>] p9_client_read+0x8f0/0x960
>
> No idea how we would end up with a use-after-free here. Even if I
> unmapped the page, the read code should be able to cope with that.
> This smells like a p9 issue to me.

This is the fid->clnt dereference at the top of p9_client_read().

Ah, yes, this is the one coming from a page fault:

p9_client_read
v9fs_fid_readpage
v9fs_vfs_readpage
handle_mm_fault
__do_page_fault

The bad fid pointer is filp->private_data.

Hm, so I guess the file itself was NOT freed prematurely (as otherwise
we'd probably have seen a KASAN report for the filp->private_data
dereference), but the ->private_data itself was.
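
The reasoning above can be sketched as a toy model: the access chain is
filp -> private_data -> clnt, and a checker reports at the first load
that reads from a freed object. All names below are hypothetical, and the
"freed" flag is a stand-in for KASAN's shadow state; nothing is actually
freed, so the sketch has no undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy objects; they only loosely mirror the kernel structures. */
struct toy_client { int id; };
struct toy_fid    { bool freed; struct toy_client *clnt; };
struct toy_file   { bool freed; struct toy_fid *private_data; };

enum toy_report { TOY_OK, TOY_UAF_FILE, TOY_UAF_FID };

/* Walk file -> private_data -> clnt; report the first freed object read. */
static enum toy_report toy_read(const struct toy_file *filp)
{
    if (filp->freed)
        return TOY_UAF_FILE;  /* report would fire on filp->private_data */

    const struct toy_fid *fid = filp->private_data;
    if (fid->freed)
        return TOY_UAF_FID;   /* report fires on fid->clnt instead */

    (void)fid->clnt;          /* both objects live: the load is fine */
    return TOY_OK;
}
```

In this model, a report at the fid->clnt load (TOY_UAF_FID) is only
reachable when the file object itself is still live, which is the
inference drawn from the KASAN output.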

Maybe the whole thing is fundamentally a 9p bug and the OOM killer just
happens to trigger it.


Vegard
