lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACGkMEusLdoKA7pBWnKCLR5yEJ4JN-ivEJquEgKjh3pfKSta8A@mail.gmail.com>
Date: Tue, 26 Mar 2024 14:05:10 +0800
From: Jason Wang <jasowang@...hat.com>
To: Heng Qi <hengqi@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, virtualization@...ts.linux.dev, 
	"Michael S. Tsirkin" <mst@...hat.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Eric Dumazet <edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>, 
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: Re: [PATCH 2/2] virtio-net: reduce the CPU consumption of dim worker

On Tue, Mar 26, 2024 at 1:57 PM Heng Qi <hengqi@...ux.alibaba.com> wrote:
>
>
>
> 在 2024/3/26 下午12:08, Jason Wang 写道:
> > On Tue, Mar 26, 2024 at 10:46 AM Heng Qi <hengqi@...ux.alibaba.com> wrote:
> >>
> >>
> >> 在 2024/3/25 下午4:42, Jason Wang 写道:
> >>> On Mon, Mar 25, 2024 at 4:22 PM Heng Qi <hengqi@...ux.alibaba.com> wrote:
> >>>>
> >>>> 在 2024/3/25 下午3:56, Jason Wang 写道:
> >>>>> On Mon, Mar 25, 2024 at 3:18 PM Heng Qi <hengqi@...ux.alibaba.com> wrote:
> >>>>>> 在 2024/3/25 下午1:57, Jason Wang 写道:
> >>>>>>> On Mon, Mar 25, 2024 at 10:21 AM Heng Qi <hengqi@...ux.alibaba.com> wrote:
> >>>>>>>> 在 2024/3/22 下午1:19, Jason Wang 写道:
> >>>>>>>>> On Thu, Mar 21, 2024 at 7:46 PM Heng Qi <hengqi@...ux.alibaba.com> wrote:
> >>>>>>>>>> Currently, ctrlq processes commands in a synchronous manner,
> >>>>>>>>>> which increases the delay of dim commands when configuring
> >>>>>>>>>> multi-queue VMs, which in turn causes the CPU utilization to
> >>>>>>>>>> increase and interferes with the performance of dim.
> >>>>>>>>>>
> >>>>>>>>>> Therefore we asynchronously process ctlq's dim commands.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Heng Qi <hengqi@...ux.alibaba.com>
> >>>>>>>>> I may miss some previous discussions.
> >>>>>>>>>
> >>>>>>>>> But at least the changelog needs to explain why you don't use interrupt.
> >>>>>>>> Will add, but reply here first.
> >>>>>>>>
> >>>>>>>> When upgrading the driver's ctrlq to use interrupt, problems may occur
> >>>>>>>> with some existing devices.
> >>>>>>>> For example, when existing devices are replaced with new drivers, they
> >>>>>>>> may not work.
> >>>>>>>> Or, if the guest OS supported by the new device is replaced by an old
> >>>>>>>> downstream OS product, it will not be usable.
> >>>>>>>>
> >>>>>>>> Although, ctrlq has the same capabilities as IOq in the virtio spec,
> >>>>>>>> this does have historical baggage.
> >>>>>>> I don't think the upstream Linux drivers need to workaround buggy
> >>>>>>> devices. Or it is a good excuse to block configure interrupts.
> >>>>>> Of course I agree. Our DPU devices support ctrlq irq natively, as long
> >>>>>> as the guest os opens irq to ctrlq.
> >>>>>>
> >>>>>> If other products have no problem with this, I would prefer to use irq
> >>>>>> to solve this problem, which is the most essential solution.
> >>>>> Let's do that.
> >>>> Ok, will do.
> >>>>
> >>>> Do you have the link to the patch where you previously modified the
> >>>> control queue for interrupt notifications.
> >>>> I think a new patch could be made on top of it, but I can't seem to find it.
> >>> Something like this?
> >> YES. Thanks Jason.
> >>
> >>> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
> >>>
> >>> Note that
> >>>
> >>> 1) some patch has been merged
> >>> 2) we probably need to drop the timeout logic as it's another topic
> >>> 3) need to address other comments
> >> I did a quick read of your patch sets from the previous 5 version:
> >> [1]
> >> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/
> >> [2] https://lore.kernel.org/all/20221226074908.8154-1-jasowang@redhat.com/
> >> [3] https://lore.kernel.org/all/20230413064027.13267-1-jasowang@redhat.com/
> >> [4] https://lore.kernel.org/all/20230524081842.3060-1-jasowang@redhat.com/
> >> [5] https://lore.kernel.org/all/20230720083839.481487-1-jasowang@redhat.com/
> >>
> >> Regarding adding the interrupt to ctrlq, there are a few points where
> >> there is no agreement,
> >> which I summarize below.
> >>
> >> 1. Require additional interrupt vector resource
> >> https://lore.kernel.org/all/20230516165043-mutt-send-email-mst@kernel.org/
> > I don't think one more vector is a big problem. Multiqueue will
> > require much more than this.
> >
> > Even if it is, we can try to share an interrupt as Michael suggests.
> >
> > Let's start from something that is simple, just one more vector.
>
> OK, that puts my concerns to rest.
>
> >
> >> 2. Adding the interrupt for ctrlq may break some devices
> >> https://lore.kernel.org/all/f9e75ce5-e6df-d1be-201b-7d0f18c1b6e7@redhat.com/
> > These devices need to be fixed. It's hard to imagine the evolution of
> > virtio-net is blocked by buggy devices.
>
> Agree.
>
> >
> >> 3. RTNL breaks surprise removal
> >> https://lore.kernel.org/all/20230720170001-mutt-send-email-mst@kernel.org/
> > The comment is for indefinite waiting for ctrl vq which turns out to
> > be another issue.
> >
> > For the removal, we just need to do the wakeup then everything is fine.
>
> Then I will make a patch set based on irq and without timeout.

Ok.

>
> >
> >> Regarding the above, there seems to be no conclusion yet.
> >> If these problems still exist, I think this patch is good enough and we
> >> can merge it first.
> > I don't think so, poll turns out to be problematic for a lot of cases.
> >
> >> For the third point, it seems to be being solved by Daniel now [6], but
> >> spink lock is used,
> >> which I think conflicts with the way of adding interrupts to ctrlq.
> >>
> >> [6] https://lore.kernel.org/all/20240325214912.323749-1-danielj@nvidia.com/
> > I don't see how it conflicts with this.
>
> I'll just make changes on top of it. Can I?

It should be fine.

Thanks

>
> Thanks,
> Heng
>
> >
> > Thanks
> >
> >>
> >> Thanks,
> >> Heng
> >>
> >>> THanks
> >>>
> >>>
> >>>> Thanks,
> >>>> Heng
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>>> And I remember you told us your device doesn't have such an issue.
> >>>>>> YES.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Heng
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Heng
> >>>>>>>>
> >>>>>>>>> Thanks
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ