Message-ID: <fcf5ee96-178d-061a-ccbe-9be925b1b708@amd.com>
Date:   Tue, 31 Jan 2023 13:55:35 +0530
From:   Ravi Bangoria <ravi.bangoria@....com>
To:     James Clark <james.clark@....com>,
        linux-perf-users@...r.kernel.org, peterz@...radead.org
Cc:     syzbot+697196bc0265049822bd@...kaller.appspotmail.com,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-kernel@...r.kernel.org,
        Thomas Richter <tmricht@...ux.ibm.com>,
        Ravi Bangoria <ravi.bangoria@....com>
Subject: Re: [PATCH 1/1] perf: Fix warning from concurrent read/write of
 perf_event_pmu_context

On 30-Jan-23 11:19 AM, Ravi Bangoria wrote:
> Hi James,
> 
> On 27-Jan-23 8:01 PM, James Clark wrote:
>> When running two Perf sessions, the following warning can appear:
>>
>>   WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278
>>   Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bridge stp llc coresight_stm stm_core coresight_etm4x coresight_tmc coresight_replicator coresight_funnel coresight_tpiu coresight arm_spe_pmu ip_tables x_tables ipv6 xhci_pci xhci_pci_renesas r8169
>>   CPU: 1 PID: 2245 Comm: perf Not tainted 6.2.0-rc4+ #1
>>   pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>   pc : put_pmu_ctx+0x1f0/0x278
>>   lr : put_pmu_ctx+0x1b4/0x278
>>   sp : ffff80000dfcbc20
>>   x29: ffff80000dfcbca0 x28: ffff008004f00000 x27: ffff00800763a928
>>   x26: ffff00800763a928 x25: 00000000000000c0 x24: 0000000000000000
>>   x23: 00000000000a0003 x22: ffff00837df74088 x21: ffff80000dfcbd18
>>   x20: 0000000000000000 x19: ffff00800763a6c0 x18: 0000000000000000
>>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>   x14: 0000000000000000 x13: ffff80000dfc8000 x12: ffff80000dfcc000
>>   x11: be58ab6d2939e700 x10: be58ab6d2939e700 x9 : 0000000000000000
>>   x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000
>>   x5 : ffff00800093c9c0 x4 : 0000000000000000 x3 : ffff80000dfcbca0
>>   x2 : ffff008004f00000 x1 : ffff8000082403c4 x0 : 0000000000000000
>>   Call trace:
>>    put_pmu_ctx+0x1f0/0x278
>>    _free_event+0x2bc/0x3d0
>>    perf_event_release_kernel+0x444/0x4bc
>>    perf_release+0x20/0x30
>>    __fput+0xe4/0x25c
>>    ____fput+0x1c/0x28
>>    task_work_run+0xc4/0xe8
>>    do_notify_resume+0x10c/0x164
>>    el0_svc+0xb4/0xdc
>>    el0t_64_sync_handler+0x84/0xf0
>>    el0t_64_sync+0x190/0x194
>>
>> This is because there is no locking around the access of "if
>> (!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
>> put_pmu_ctx().
>>
>> The decrement of the reference count in put_pmu_ctx() also happens
>> outside of the spinlock, which allows the following order of events,
>> where the context is cleared in put_pmu_ctx() after its refcount has
>> become non-zero again:
>>
>>  CPU0                                   CPU1
>>  find_get_pmu_context()
>>    if (!epc->ctx) == false
>>                                         put_pmu_ctx()
>>                                         atomic_dec_and_test(&epc->refcount) == true
>>                                         epc->refcount == 0
>>      atomic_inc(&epc->refcount);
>>      epc->refcount == 1
>>                                         list_del_init(&epc->pmu_ctx_entry);
>>                                         epc->ctx = NULL;
>>
>> Another issue is that WARN_ON for no active PMU events in put_pmu_ctx()
>> is outside of the lock. If the perf_event_pmu_context is an embedded
>> one, even after clearing it, it won't be deleted and can be re-used. So
>> the warning can trigger. For this reason it also needs to be moved
>> inside the lock.
>>
>> The above warning is very quick to trigger on Arm by running these two
>> commands at the same time:
>>
>>   while true; do perf record -- ls; done
>>   while true; do perf record -- ls; done
> 
> These do not trigger the WARN_ON on my x86 machine; however, the C reproducer
> provided by syzbot[1] does trigger it.
> 
> [1]: https://syzkaller.appspot.com/text?tag=ReproC&x=17beacbc480000

Unless I'm missing some subtle scenario, the patch looks fine to me.
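
For readers following the race in the quoted description, here is a minimal
userspace sketch of the ordering it argues for: the check of epc->ctx, the
refcount increment, the final decrement and the clearing of epc->ctx all
happen under the same lock. This is not the patch itself. The names
ctx_model, epc_model, epc_get() and epc_put() are hypothetical stand-ins,
and a pthread mutex is assumed in place of the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for perf_event_context: only the lock is modelled. */
struct ctx_model {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for perf_event_pmu_context. */
struct epc_model {
	atomic_int refcount;
	struct ctx_model *ctx;	/* NULL while detached */
};

/*
 * Take a reference. The "is it attached?" check and the increment are
 * done under ctx->lock, so they cannot interleave with a final put.
 */
void epc_get(struct epc_model *epc, struct ctx_model *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	if (!epc->ctx)
		epc->ctx = ctx;		/* (re)attach an embedded, reusable epc */
	atomic_fetch_add(&epc->refcount, 1);
	pthread_mutex_unlock(&ctx->lock);
}

/*
 * Drop a reference. The decrement happens under the lock, so when it
 * reaches zero no other thread can have raised it again before
 * epc->ctx is cleared. Returns true if this call detached the epc.
 */
bool epc_put(struct epc_model *epc)
{
	/* The caller still holds a reference, so ctx is attached and stable. */
	struct ctx_model *ctx = epc->ctx;
	bool detached = false;

	assert(ctx != NULL);

	pthread_mutex_lock(&ctx->lock);
	if (atomic_fetch_sub(&epc->refcount, 1) == 1) {
		/* Any "nothing is active" sanity check would also go here. */
		epc->ctx = NULL;
		detached = true;
	}
	pthread_mutex_unlock(&ctx->lock);

	return detached;
}
```

With two epc_get() calls followed by two epc_put() calls, only the second
put detaches the model. A concurrent epc_get() either runs before the final
put, which is then not final, or after it, and re-attaches cleanly.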

Reviewed-by: Ravi Bangoria <ravi.bangoria@....com>

Thanks,
Ravi
