Message-ID: <20230717130831.0f18381a.alex.williamson@redhat.com>
Date: Mon, 17 Jul 2023 13:08:31 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Grzegorz Jaszczyk <jaz@...ihalf.com>
Cc: Christian Brauner <brauner@...nel.org>,
linux-fsdevel@...r.kernel.org, linux-aio@...ck.org,
linux-usb@...r.kernel.org, Matthew Rosato <mjrosato@...ux.ibm.com>,
Paul Durrant <paul@....org>, Tom Rix <trix@...hat.com>,
Jason Wang <jasowang@...hat.com>,
dri-devel@...ts.freedesktop.org, Michal Hocko <mhocko@...nel.org>,
linux-mm@...ck.org, Kirti Wankhede <kwankhede@...dia.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Jens Axboe <axboe@...nel.dk>,
Vineeth Vijayan <vneethv@...ux.ibm.com>,
Diana Craciun <diana.craciun@....nxp.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Shakeel Butt <shakeelb@...gle.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Leon Romanovsky <leon@...nel.org>,
Harald Freudenberger <freude@...ux.ibm.com>,
Fei Li <fei1.li@...el.com>, x86@...nel.org,
Roman Gushchin <roman.gushchin@...ux.dev>,
Halil Pasic <pasic@...ux.ibm.com>,
Jason Gunthorpe <jgg@...pe.ca>, Ingo Molnar <mingo@...hat.com>,
intel-gfx@...ts.freedesktop.org,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
linux-fpga@...r.kernel.org, Zhi Wang <zhi.a.wang@...el.com>,
Wu Hao <hao.wu@...el.com>, Jason Herne <jjherne@...ux.ibm.com>,
Eric Farman <farman@...ux.ibm.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andrew Donnellan <ajd@...ux.ibm.com>,
Arnd Bergmann <arnd@...db.de>, linux-s390@...r.kernel.org,
Heiko Carstens <hca@...ux.ibm.com>,
Johannes Weiner <hannes@...xchg.org>,
linuxppc-dev@...ts.ozlabs.org, Eric Auger <eric.auger@...hat.com>,
Borislav Petkov <bp@...en8.de>, kvm@...r.kernel.org,
Rodrigo Vivi <rodrigo.vivi@...el.com>, cgroups@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
virtualization@...ts.linux-foundation.org,
intel-gvt-dev@...ts.freedesktop.org, io-uring@...r.kernel.org,
netdev@...r.kernel.org, Tony Krowiak <akrowiak@...ux.ibm.com>,
Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>,
Pavel Begunkov <asml.silence@...il.com>,
Sean Christopherson <seanjc@...gle.com>,
Oded Gabbay <ogabbay@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Peter Oberparleiter <oberpar@...ux.ibm.com>,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
Benjamin LaHaise <bcrl@...ck.org>,
"Michael S. Tsirkin" <mst@...hat.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Frederic Barrat <fbarrat@...ux.ibm.com>,
Moritz Fischer <mdf@...nel.org>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
David Woodhouse <dwmw2@...radead.org>,
Xu Yilun <yilun.xu@...el.com>,
Dominik Behr <dbehr@...omium.org>,
Marcin Wojtas <mw@...ihalf.com>
Subject: Re: [PATCH 0/2] eventfd: simplify signal helpers
On Mon, 17 Jul 2023 10:29:34 +0200
Grzegorz Jaszczyk <jaz@...ihalf.com> wrote:
> On Fri, 14 Jul 2023 at 09:05, Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Thu, Jul 13, 2023 at 11:10:54AM -0600, Alex Williamson wrote:
> > > On Thu, 13 Jul 2023 12:05:36 +0200
> > > Christian Brauner <brauner@...nel.org> wrote:
> > >
> > > > Hey everyone,
> > > >
> > > > This simplifies the eventfd_signal() and eventfd_signal_mask() helpers
> > > > by removing the count argument which is effectively unused.
> > >
> > > We have a patch under review which does in fact make use of the
> > > signaling value:
> > >
> > > https://lore.kernel.org/all/20230630155936.3015595-1-jaz@semihalf.com/
> >
> > Huh, thanks for the link.
> >
> > Quoting from
> > https://patchwork.kernel.org/project/kvm/patch/20230307220553.631069-1-jaz@semihalf.com/#25266856
> >
> > > Reading an eventfd returns an 8-byte value, we generally only use it
> > > as a counter, but it's been discussed previously and IIRC, it's possible
> > > to use that value as a notification value.
> >
> > So the goal is to pipe a specific value through eventfd? But it is
> > explicitly a counter. The whole thing is written around a counter and
> > each write and signal adds to the counter.
> >
> > The consequences are pretty well described in the cover letter of
> > v6 https://lore.kernel.org/all/20230630155936.3015595-1-jaz@semihalf.com/
> >
> > > Since the eventfd counter is used as ACPI notification value
> > > placeholder, the eventfd signaling needs to be serialized in order to
> > > not end up with notification values being coalesced. Therefore ACPI
> > > notification values are buffered and signaled one by one, when the
> > > previous notification value has been consumed.
> >
> > But isn't this a good indication that you really don't want an eventfd
> > but something that's explicitly designed to associate specific data with
> > a notification? Using eventfd in that manner requires serialization,
> > buffering, and enforces ordering.
What would that mechanism be? We've been iterating on getting the
serialization and buffering correct, but I don't know of another means
that combines the notification with a value, so we'd likely end up with
an eventfd only for notification and a separate ring buffer for
notification values.
As this series demonstrates, the current in-kernel users only increment
the counter and most userspace likely discards the counter value, which
makes the counter largely a waste. While perhaps unconventional,
there's no requirement that the counter may only be incremented by one,
nor any restriction that I see in how userspace must interpret the
counter value.
As I understand the ACPI notification proposal that Grzegorz links
below, a notification with an interpreted value allows for a more
direct userspace implementation when dealing with a series of discrete
notification-with-value events. Thanks,
Alex
> > I have no skin in the game aside from having to drop this conversion,
> > which I'm fine to do if there are actually users for this, but really,
> > that looks a lot like abusing an API that really wasn't designed for
> > this.
>
> https://patchwork.kernel.org/project/kvm/patch/20230307220553.631069-1-jaz@semihalf.com/
> was posted at the beginning of March, and one of the main things we've
> discussed was the mechanism for propagating the ACPI notification value.
> We've ended up with eventfd as the best mechanism and have actually been
> using it since v2. I really do not want to waste this effort; I think
> we are quite advanced with v6 now. Additionally, we didn't actually
> modify any part of the existing eventfd support; we only used it
> in a specific (and discussed beforehand) way.