linux-kernel - Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing events to uring


Message-ID: <20190531165144.GE2606@hirez.programming.kicks-ass.net>
Date:   Fri, 31 May 2019 18:51:44 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Roman Penyaev <rpenyaev@...e.de>
Cc:     azat@...event.org, akpm@...ux-foundation.org,
        viro@...iv.linux.org.uk, torvalds@...ux-foundation.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing
 events to uring

On Fri, May 31, 2019 at 04:21:30PM +0200, Roman Penyaev wrote:

> The ep_add_event_to_uring() is lockless, thus I can't increase tail after,
> I need to reserve the index slot, where to write to.  I can use shadow tail,
> which is not seen by userspace, but I have to guarantee that tail is updated
> with shadow tail *after* all callers of ep_add_event_to_uring() have left.
> That is possible, please see the code below, but it adds more complexity:
> 
> (code was tested on user side, thus has c11 atomics)
> 
> static inline void add_event__kernel(struct ring *ring, unsigned bit)
> {
>         unsigned i, cntr, commit_cntr, *item_idx, tail, old;
> 
>         i = __atomic_fetch_add(&ring->cntr, 1, __ATOMIC_ACQUIRE);
>         item_idx = &ring->user_itemsindex[i % ring->nr];
> 
>         /* Update data */
>         *item_idx = bit;
> 
>         commit_cntr = __atomic_add_fetch(&ring->commit_cntr, 1,
>                                          __ATOMIC_RELEASE);
> 
>         tail = ring->user_header->tail;
>         rmb();
>         do {
>                 cntr = ring->cntr;
>                 if (cntr != commit_cntr)
>                         /* Someone else will advance tail */
>                         break;
> 
>                 old = tail;
> 
>         } while ((tail = __sync_val_compare_and_swap(&ring->user_header->tail,
>                                                      old, cntr)) != old);
> }

Yes, I'm well aware of that particular problem (see
kernel/events/ring_buffer.c:perf_output_put_handle for instance). But
like you show, it can be done. It also makes the thing wait-free, as
opposed to merely lockless.
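
For readers following the thread, the shadow-tail scheme quoted above can be
restated as a small userspace sketch in C11 <stdatomic.h>. This is an
illustration under assumptions, not the patch's code: the names RING_NR,
ring_init(), ring_add() and the items[] array are hypothetical, the published
tail lives in the same struct rather than in a separate user header, and the
default sequentially consistent atomics are used instead of the
acquire/release orderings in the quoted version. The consumer side and
overflow handling are not shown.

```c
#include <stdatomic.h>

#define RING_NR 8u /* hypothetical ring size, chosen for illustration */

struct ring {
	atomic_uint cntr;        /* shadow tail: slots reserved by producers */
	atomic_uint commit_cntr; /* slots whose data has been written */
	atomic_uint tail;        /* tail as published to the consumer */
	unsigned items[RING_NR];
};

void ring_init(struct ring *r)
{
	atomic_init(&r->cntr, 0);
	atomic_init(&r->commit_cntr, 0);
	atomic_init(&r->tail, 0);
}

void ring_add(struct ring *r, unsigned bit)
{
	unsigned i, commit, cntr, tail;

	/* Reserve a slot by advancing the shadow tail. */
	i = atomic_fetch_add(&r->cntr, 1);

	/* Write the data into the reserved slot. */
	r->items[i % RING_NR] = bit;

	/* Mark the slot as committed; commit is the new commit count. */
	commit = atomic_fetch_add(&r->commit_cntr, 1) + 1;

	tail = atomic_load(&r->tail);
	do {
		cntr = atomic_load(&r->cntr);
		if (cntr != commit)
			/* Another producer holds a later slot and will publish. */
			break;
		/* On failure the CAS reloads tail and the loop re-checks cntr. */
	} while (!atomic_compare_exchange_weak(&r->tail, &tail, cntr));
}
```

The point of the design is that only a producer that observes
cntr == commit_cntr after its own commit moves the visible tail, so the
consumer never sees a tail that covers a reserved but still unwritten slot.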