linux-kernel - Re: [PATCH] kernel: fix data race in put

Message-ID: <CACT4Y+Zrhz5ZN2j5v+JcSOq=1ZHX4YHYwNx1eVc+vPBZDRFSHQ@mail.gmail.com>
Date:	Thu, 17 Sep 2015 20:38:15 +0200
From:	Dmitry Vyukov <dvyukov@...gle.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	ebiederm@...ssion.com, Al Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>, mhocko@...e.cz,
	LKML <linux-kernel@...r.kernel.org>, ktsan@...glegroups.com,
	Kostya Serebryany <kcc@...gle.com>,
	Andrey Konovalov <andreyknvl@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Hans Boehm <hboehm@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] kernel: fix data race in put_pid

On Thu, Sep 17, 2015 at 8:09 PM, Oleg Nesterov <oleg@...hat.com> wrote:
> On 09/17, Dmitry Vyukov wrote:
>>
>> I can update the patch description, but let me explain it here first.
>
> Yes thanks.
>
>> Here is the essence of what happens:
>
> Aha, so you really meant that 2 put_pid's can race with each other,
>
>> // thread 1
>> 1: pid->foo = 1; // foo is the first word of pid object
>> // then it does put_pid
>> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
>> //    returns false, so the function returns
>>
>> // thread 2
>> // executes put_pid
>> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
>> // then kmem_cache_free does:
>> 4: *(void**)pid = head->freelist;
>> 5: head->freelist = (void*)pid;
>>
>> This can be executed as:
>>
>> 4: *(void**)pid = head->freelist;
>> 1: pid->foo = 1; // foo is the first word of pid object
>> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
>> //    returns false, so the function returns
>> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
>> 5: head->freelist = (void*)pid;
>
> Unless I am totally confused, everything is simpler. We can forget
> about the hoisting, freelist, etc.
>
> Thread 2 can see the result of atomic_dec_and_test(), but not the
> result of "pid->foo = 1". In this case it can free the object which
> can be re-allocated _before_ STORE(pid->foo) completes. Of course,
> this would be really bad.


Yes, that's what I mean.
A missing memory barrier can break things in lots of magical, complex
ways, so I generally prefer not to think about concrete scenarios.
Passing an object to kfree without first acquiring it (that is, without
an acquire that pairs with the other threads' releases) is a data race.
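
To make that rule concrete, here is a minimal userspace sketch in C11
atomics. It is not the kernel code and not the proposed patch: struct obj,
obj_alloc() and obj_put() are hypothetical stand-ins for struct pid and
put_pid(), and free() stands in for kmem_cache_free(). It keeps the
"count == 1" fast path but makes that load an acquire, so that writes done
by other threads before their (release) decrement happen-before the free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pid: a payload word and a refcount. */
struct obj {
	int foo;
	atomic_int count;
};

/* Allocate an object holding `refs` references; returns NULL on failure. */
static struct obj *obj_alloc(int refs)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->foo = 0;
	atomic_init(&o->count, refs);
	return o;
}

/* Drop one reference. Returns true if this call freed the object. */
static bool obj_put(struct obj *o)
{
	assert(o != NULL);
	/*
	 * Fast path: if we hold the last reference, skip the read-modify-write.
	 * The load is an acquire so that it pairs with the release half of the
	 * decrement performed by whichever thread dropped the count to 1.
	 * Slow path: the decrement is acq_rel, so it both publishes our earlier
	 * writes and, if we are last, acquires everyone else's.
	 */
	if (atomic_load_explicit(&o->count, memory_order_acquire) == 1 ||
	    atomic_fetch_sub_explicit(&o->count, 1, memory_order_acq_rel) == 1) {
		free(o);
		return true;
	}
	return false;
}
```

With a relaxed load on the fast path instead, nothing in the C11 model
orders another thread's earlier "o->foo = 1" before the free(), which is
the situation described above.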


> I need to recheck, but afaics this is not possible. This optimization
> is fine, but probably needs a comment. We rely on delayed_put_pid()
> called by RCU. And note that nobody can write to this pid after it
> is removed from the rcu-protected list.
>
> So I think this is false alarm, but I'll try to recheck tomorrow, it
> is too late for me today.

Well, if that were true, then put_pid would not contain any atomic
operations.
Here is the report from KTSAN that I observe:

    ThreadSanitizer: data-race in kt_memblock_free

    Write of size 8 by thread T107 (K630):
     [<ffffffff812499ef>] kt_memblock_free+0xdf/0x150 mm/ktsan/memblock.c:90
     [<ffffffff81249704>] ktsan_memblock_free+0xc4/0xf0 mm/ktsan/ktsan.c:251 (discriminator 6)
     [<     inlined    >] kmem_cache_free+0x99/0x610 __cache_free mm/slab.c:3383
     [<ffffffff81239149>] kmem_cache_free+0x99/0x610 mm/slab.c:3561
     [<ffffffff810b4095>] put_pid+0x85/0xa0 kernel/pid.c:247
     [<ffffffff810b40ce>] delayed_put_pid+0x1e/0x30 kernel/pid.c:256
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 __rcu_reclaim kernel/rcu/rcu.h:118
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 rcu_do_batch kernel/rcu/tree.c:2669
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 invoke_rcu_callbacks kernel/rcu/tree.c:2937
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 __rcu_process_callbacks kernel/rcu/tree.c:2904
     [<ffffffff811044d0>] rcu_process_callbacks+0x410/0xa70 kernel/rcu/tree.c:2921
     [<ffffffff8108f18d>] __do_softirq+0xad/0x2d0 kernel/softirq.c:273
     [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
     [<ffffffff8108f528>] irq_exit+0x98/0xa0 kernel/softirq.c:391
     [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq ./arch/x86/include/asm/apic.h:655
     [<ffffffff8105cd33>] smp_apic_timer_interrupt+0x63/0x80 arch/x86/kernel/apic/apic.c:915
     [<ffffffff81e9661b>] apic_timer_interrupt+0x6b/0x70 arch/x86/entry/entry_64.S:782
     [<     inlined    >] complete+0x41/0x50 spin_unlock_irqrestore include/linux/spinlock.h:372
     [<ffffffff810d90b1>] complete+0x41/0x50 kernel/sched/completion.c:36
     [<ffffffff810b7f11>] kthread+0x131/0x180 kernel/kthread.c:200
     [<ffffffff81e95bdf>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
    DBG: cpu = ffff88063fc1fe68
    DBG: cpu id = 0

    Previous read of size 8 by thread T28 (K25):
     [<ffffffff810b4043>] put_pid+0x33/0xa0 kernel/pid.c:244
     [<ffffffff810874a8>] _do_fork+0x1a8/0x550 kernel/fork.c:1746
     [<ffffffff8108788c>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
     [<ffffffff810a7e1c>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
     [<ffffffff810aebed>] process_one_work+0x2ad/0x6f0 kernel/workqueue.c:2036
     [<ffffffff810af769>] worker_thread+0xb9/0x730 kernel/workqueue.c:2170
     [<ffffffff810b7f41>] kthread+0x161/0x180 kernel/kthread.c:207
     [<ffffffff81e95bdf>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526


One put_pid indeed comes from rcu_process_callbacks, but the other
comes from _do_fork, and it is not in an RCU read-side critical section.


-- 
Dmitry Vyukov, Software Engineer, dvyukov@...gle.com