linux-kernel - Re: [PATCH] eventfd: Enlarge recursion limit to allow vhost to work


Message-ID: <cbecaad6-48fc-3c52-d764-747ea91dc3fa@windriver.com>
Date:   Fri, 3 Jul 2020 19:11:23 +0800
From:   He Zhe <zhe.he@...driver.com>
To:     Juri Lelli <juri.lelli@...hat.com>
Cc:     viro@...iv.linux.org.uk, axboe@...nel.dk,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] eventfd: Enlarge recursion limit to allow vhost to work



On 7/3/20 4:12 PM, Juri Lelli wrote:
> Hi,
>
> On 10/04/20 19:47, zhe.he@...driver.com wrote:
>> From: He Zhe <zhe.he@...driver.com>
>>
>> commit b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth")
>> introduces a percpu counter that tracks the percpu recursion depth and
>> warns if it is greater than zero, to avoid potential deadlock and stack
>> overflow.
>>
>> However sometimes different eventfds may be used in parallel. Specifically,
>> when heavy network load goes through kvm and vhost, working as below, it
>> would trigger the following call trace.
>>
>> -  100.00%
>>    - 66.51%
>>         ret_from_fork
>>         kthread
>>       - vhost_worker
>>          - 33.47% handle_tx_kick
>>               handle_tx
>>               handle_tx_copy
>>               vhost_tx_batch.isra.0
>>               vhost_add_used_and_signal_n
>>               eventfd_signal
>>          - 33.05% handle_rx_net
>>               handle_rx
>>               vhost_add_used_and_signal_n
>>               eventfd_signal
>>    - 33.49%
>>         ioctl
>>         entry_SYSCALL_64_after_hwframe
>>         do_syscall_64
>>         __x64_sys_ioctl
>>         ksys_ioctl
>>         do_vfs_ioctl
>>         kvm_vcpu_ioctl
>>         kvm_arch_vcpu_ioctl_run
>>         vmx_handle_exit
>>         handle_ept_misconfig
>>         kvm_io_bus_write
>>         __kvm_io_bus_write
>>         eventfd_signal
>>
>> 001: WARNING: CPU: 1 PID: 1503 at fs/eventfd.c:73 eventfd_signal+0x85/0xa0
>> ---- snip ----
>> 001: Call Trace:
>> 001:  vhost_signal+0x15e/0x1b0 [vhost]
>> 001:  vhost_add_used_and_signal_n+0x2b/0x40 [vhost]
>> 001:  handle_rx+0xb9/0x900 [vhost_net]
>> 001:  handle_rx_net+0x15/0x20 [vhost_net]
>> 001:  vhost_worker+0xbe/0x120 [vhost]
>> 001:  kthread+0x106/0x140
>> 001:  ? log_used.part.0+0x20/0x20 [vhost]
>> 001:  ? kthread_park+0x90/0x90
>> 001:  ret_from_fork+0x35/0x40
>> 001: ---[ end trace 0000000000000003 ]---
>>
>> This patch enlarges the limit to 1 which is the maximum recursion depth we
>> have found so far.
>>
>> Signed-off-by: He Zhe <zhe.he@...driver.com>
>> ---
> Not sure if this approach can fly, but I also encountered the same
> warning (which further caused hangs during VM install) and this change
> addresses that.
>
> I'd be interested in understanding what is the status of this problem/fix.

This is actually v2 of the patch, and it has not received any reply yet. For reference, here is v1:
https://lore.kernel.org/lkml/1586257192-58369-1-git-send-email-zhe.he@windriver.com/
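
For context, the check described in the quoted commit message can be modeled
roughly as below. This is only a userspace sketch and not the kernel code: the
names recursion_model_signal, wake_depth, limit and nest are hypothetical, and
a thread-local counter stands in for the percpu counter. It assumes the check
is "refuse when the current depth exceeds the limit", so a limit of 0
corresponds to the behaviour before this patch and a limit of 1 to the
behaviour after it.

```c
#include <stdbool.h>

/* Hypothetical model: one depth counter per thread stands in for the
 * percpu counter mentioned in the commit message. */
static _Thread_local int wake_depth;

/*
 * Model of a signal call that, while waking its waiters, re-enters
 * itself `nest` more times. Returns true only if every nested level
 * was allowed under `limit`; false models the path where the kernel
 * would warn and bail out.
 */
static bool recursion_model_signal(int limit, int nest)
{
    bool ok = true;

    if (wake_depth > limit)
        return false;

    wake_depth++;                       /* entering the wakeup section */
    if (nest > 0)
        ok = recursion_model_signal(limit, nest - 1);
    wake_depth--;                       /* leaving the wakeup section */

    return ok;
}
```

In this model, recursion_model_signal(0, 1) fails while
recursion_model_signal(1, 1) succeeds, which is the single level of nesting
that the call trace above shows.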

> On a side note, by looking at the code, I noticed that (apart from
> samples) all callers don't actually check eventfd_signal() return value
> and I'm wondering why that is the case and whether it is safe to do so.

Checking the return value right after sending the signal tells us whether the
event counter has just overflowed, that is, whether the addition would have
exceeded ULLONG_MAX. I guess the authors of the callers listed in the commit
log just don't worry about that, since they only ever add one to a dedicated
eventfd.
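
To illustrate what a caller could learn from the return value, here is a small
userspace model of a counter that saturates at its maximum. It is a sketch
under the assumption that the signal call returns the amount actually added;
struct counter_model and counter_model_signal are hypothetical names and not
the kernel implementation, and UINT64_MAX stands in for ULLONG_MAX.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit event counter. */
struct counter_model {
    uint64_t count;
};

/*
 * Adds up to n to the counter without wrapping and returns the amount
 * actually added. A return value smaller than n means the counter hit
 * its maximum.
 */
static uint64_t counter_model_signal(struct counter_model *c, uint64_t n)
{
    if (UINT64_MAX - c->count < n)
        n = UINT64_MAX - c->count;      /* clamp instead of wrapping */

    c->count += n;
    return n;
}
```

A caller that adds one to a counter that is regularly drained by a reader would
need roughly 2^64 unread signals before the return value differs from one.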

Zhe

>
> Thanks,
>
> Juri
>