Message-ID: <CAPpodde7Bi4ewzPqPC0ZNAMdy=3LYgzUHsADZKFgGniUCRdrRg@mail.gmail.com>
Date: Sat, 7 Sep 2024 12:16:24 +0900
From: Takero Funaki <flintglass@...il.com>
To: "Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, Jason Wang <jasowang@...hat.com>,
Eugenio Pérez <eperezma@...hat.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, virtualization@...ts.linux.dev,
Si-Wei Liu <si-wei.liu@...cle.com>, Darren Kenny <darren.kenny@...cle.com>
Subject: Re: [PATCH net] virtio-net: fix overflow inside virtnet_rq_alloc
On Fri, Sep 6, 2024 at 18:55, Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
> > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > > leads to regression on VM with the sysctl value of:
> > > > > > > >
> > > > > > > > - net.core.high_order_alloc_disable=1
> > > > > > > >
> > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > > > to VM):
> > > > > > > >
> > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > > > the first buffer of the frag is affected.
> > > > > > > >
> > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@...cle.com>
> > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> > > > > > >
> > > > > > >
> > > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > > work Xuan Zhuo just did.
> > > > > >
> > > > > > I think this patch can fix it and I tested it.
> > > > > > But Darren said this patch did not work.
> > > > > > I need more info about the crash that Darren encountered.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > So what are we doing? Revert the whole pile for now?
> > > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > > for this release.
> > > >
> > > > @Jason Could you review this?
> > > >
> > > > I think this problem is clear, though I do not know why it did not work
> > > > for Darren.
> > > >
> > > > Thanks.
> > > >
> > >
> > > No regressions is a hard rule. If we can't figure out the regression
> > > now, we should revert and you can try again for the next release.
> >
> > I see. I think I fixed it.
> >
> > Hope Darren can reply before you post the revert patches.
> >
> > Thanks.
> >
>
> It's very rushed anyway. I posted the reverts, but as RFC for now.
> You should post a debugging patch for Darren to help you figure
> out what is going on.
>
>
Hello,

My issue [1], which I bisected to commit f9dac92ba908, was resolved
after applying this patch on top of v6.11-rc6.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=219154

In my case, random crashes occur when receiving large amounts of data
under heavy memory/IO load. The crash details differ from Darren's
report, but the symptom is the same: memory corruption during data
transfers.

If Darren is unable to confirm the fix, would it be possible to
consider merging this patch to close [1] instead?

Thanks.
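
For readers following the arithmetic in the quoted patch description, here
is a minimal userspace sketch of the size check it describes. It is not the
code from the patch. The names sketch_max_buf_len, SKETCH_PAGE_SIZE and
SKETCH_DMA_HDR_LEN are hypothetical, and the 4096-byte page and 16-byte
header are just the figures mentioned in the thread. It only models the
idea that a fresh frag loses 16 bytes to the DMA metadata, so a request
that would run past the end of the frag has to be shortened.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants taken from the numbers quoted in the thread,
 * not from kernel headers. */
#define SKETCH_PAGE_SIZE   4096u  /* assumed page size */
#define SKETCH_DMA_HDR_LEN 16u    /* space the DMA metadata is said to use */

/*
 * Hypothetical helper: return the largest buffer length that fits in a
 * frag of frag_size bytes when the next allocation would start at offset.
 * A new frag (offset == 0) first gives up SKETCH_DMA_HDR_LEN bytes to the
 * metadata header, so the usable room is smaller than frag_size.
 */
static size_t sketch_max_buf_len(size_t frag_size, size_t offset, size_t want)
{
    size_t start = offset;
    size_t room;

    assert(offset <= frag_size);

    /* The header sits at the beginning of a new frag. */
    if (offset == 0)
        start = SKETCH_DMA_HDR_LEN;

    room = frag_size > start ? frag_size - start : 0;

    /* Shorten the request only when it would overflow the frag. */
    return want < room ? want : room;
}
```

With a single-page frag (the net.core.high_order_alloc_disable=1 case), a
4096-byte request at offset 0 comes back as 4096 - 16 bytes. With a larger
frag the same request fits unchanged, because the header only takes space
in front of the first buffer.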