Message-ID: <20230717130831.0f18381a.alex.williamson@redhat.com>
Date: Mon, 17 Jul 2023 13:08:31 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Grzegorz Jaszczyk <jaz@...ihalf.com>
Cc: Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org,
linux-aio@...ck.org, linux-usb@...r.kernel.org, Matthew Rosato
<mjrosato@...ux.ibm.com>, Paul Durrant <paul@....org>, Tom Rix
<trix@...hat.com>, Jason Wang <jasowang@...hat.com>,
dri-devel@...ts.freedesktop.org, Michal Hocko <mhocko@...nel.org>,
linux-mm@...ck.org, Kirti Wankhede <kwankhede@...dia.com>, Paolo Bonzini
<pbonzini@...hat.com>, Jens Axboe <axboe@...nel.dk>, Vineeth Vijayan
<vneethv@...ux.ibm.com>, Diana Craciun <diana.craciun@....nxp.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>, Xuan Zhuo
<xuanzhuo@...ux.alibaba.com>, Shakeel Butt <shakeelb@...gle.com>, Vasily
Gorbik <gor@...ux.ibm.com>, Leon Romanovsky <leon@...nel.org>, Harald
Freudenberger <freude@...ux.ibm.com>, Fei Li <fei1.li@...el.com>,
x86@...nel.org, Roman Gushchin <roman.gushchin@...ux.dev>, Halil Pasic
<pasic@...ux.ibm.com>, Jason Gunthorpe <jgg@...pe.ca>, Ingo Molnar
<mingo@...hat.com>, intel-gfx@...ts.freedesktop.org, Christian Borntraeger
<borntraeger@...ux.ibm.com>, linux-fpga@...r.kernel.org, Zhi Wang
<zhi.a.wang@...el.com>, Wu Hao <hao.wu@...el.com>, Jason Herne
<jjherne@...ux.ibm.com>, Eric Farman <farman@...ux.ibm.com>, Dave Hansen
<dave.hansen@...ux.intel.com>, Andrew Donnellan <ajd@...ux.ibm.com>, Arnd
Bergmann <arnd@...db.de>, linux-s390@...r.kernel.org, Heiko Carstens
<hca@...ux.ibm.com>, Johannes Weiner <hannes@...xchg.org>,
linuxppc-dev@...ts.ozlabs.org, Eric Auger <eric.auger@...hat.com>, Borislav
Petkov <bp@...en8.de>, kvm@...r.kernel.org, Rodrigo Vivi
<rodrigo.vivi@...el.com>, cgroups@...r.kernel.org, Thomas Gleixner
<tglx@...utronix.de>, virtualization@...ts.linux-foundation.org,
intel-gvt-dev@...ts.freedesktop.org, io-uring@...r.kernel.org,
netdev@...r.kernel.org, Tony Krowiak <akrowiak@...ux.ibm.com>, Tvrtko
Ursulin <tvrtko.ursulin@...ux.intel.com>, Pavel Begunkov
<asml.silence@...il.com>, Sean Christopherson <seanjc@...gle.com>, Oded
Gabbay <ogabbay@...nel.org>, Muchun Song <muchun.song@...ux.dev>, Peter
Oberparleiter <oberpar@...ux.ibm.com>, linux-kernel@...r.kernel.org,
linux-rdma@...r.kernel.org, Benjamin LaHaise <bcrl@...ck.org>, "Michael S.
Tsirkin" <mst@...hat.com>, Sven Schnelle <svens@...ux.ibm.com>, Greg
Kroah-Hartman <gregkh@...uxfoundation.org>, Frederic Barrat
<fbarrat@...ux.ibm.com>, Moritz Fischer <mdf@...nel.org>, Vitaly Kuznetsov
<vkuznets@...hat.com>, David Woodhouse <dwmw2@...radead.org>, Xu Yilun
<yilun.xu@...el.com>, Dominik Behr <dbehr@...omium.org>, Marcin Wojtas
<mw@...ihalf.com>
Subject: Re: [PATCH 0/2] eventfd: simplify signal helpers
On Mon, 17 Jul 2023 10:29:34 +0200
Grzegorz Jaszczyk <jaz@...ihalf.com> wrote:
> On Fri, 14 Jul 2023 at 09:05, Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Thu, Jul 13, 2023 at 11:10:54AM -0600, Alex Williamson wrote:
> > > On Thu, 13 Jul 2023 12:05:36 +0200
> > > Christian Brauner <brauner@...nel.org> wrote:
> > >
> > > > Hey everyone,
> > > >
> > > > This simplifies the eventfd_signal() and eventfd_signal_mask() helpers
> > > > by removing the count argument which is effectively unused.
> > >
> > > We have a patch under review which does in fact make use of the
> > > signaling value:
> > >
> > > https://lore.kernel.org/all/20230630155936.3015595-1-jaz@semihalf.com/
> >
> > Huh, thanks for the link.
> >
> > Quoting from
> > https://patchwork.kernel.org/project/kvm/patch/20230307220553.631069-1-jaz@semihalf.com/#25266856
> >
> > > Reading an eventfd returns an 8-byte value, we generally only use it
> > > as a counter, but it's been discussed previously and IIRC, it's possible
> > > to use that value as a notification value.
> >
> > So the goal is to pipe a specific value through eventfd? But it is
> > explicitly a counter. The whole thing is written around a counter and
> > each write and signal adds to the counter.
> >
> > The consequences are pretty well described in the cover letter of
> > v6 https://lore.kernel.org/all/20230630155936.3015595-1-jaz@semihalf.com/
> >
> > > Since the eventfd counter is used as ACPI notification value
> > > placeholder, the eventfd signaling needs to be serialized in order to
> > > not end up with notification values being coalesced. Therefore ACPI
> > > notification values are buffered and signalized one by one, when the
> > > previous notification value has been consumed.
> >
> > But isn't this a good indication that you really don't want an eventfd
> > but something that's explicitly designed to associate specific data with
> > a notification? Using eventfd in that manner requires serialization,
> > buffering, and enforces ordering.

What would that mechanism be? We've been iterating on getting the
serialization and buffering correct, but I don't know of another means
that combines the notification with a value, so we'd likely end up with
an eventfd only for notification and a separate ring buffer for
notification values.

As this series demonstrates, the current in-kernel users only increment
the counter and most userspace likely discards the counter value, which
makes the counter largely a waste. While perhaps unconventional,
there's no requirement that the counter may only be incremented by one,
nor any restriction that I see in how userspace must interpret the
counter value.

As I understand the ACPI notification proposal that Grzegorz links
below, a notification that carries an interpreted value allows for a
more direct userspace implementation when handling a series of discrete
notification-with-value events.

Thanks,
Alex

> > I have no skin in the game aside from having to drop this conversion
> > which I'm fine to do if there are actually users for this but really,
> > that looks a lot like abusing an api that really wasn't designed for
> > this.
>
> https://patchwork.kernel.org/project/kvm/patch/20230307220553.631069-1-jaz@semihalf.com/
> was posted at the beginning of March and one of the main things we've
> discussed was the mechanism for propagating the ACPI notification value.
> We ended up with eventfd as the best mechanism and have actually been
> using it from v2. I really do not want to waste this effort, I think
> we are quite advanced with v6 now. Additionally we didn't actually
> modify any part of eventfd support that was in place, we only used it
> in a specific (and discussed beforehand) way.