Message-ID: <CAK8fFZ74tj=u5HKfpFue1ejy_9V3xWdf-ekC0gLt8BmJs7Y5ZQ@mail.gmail.com>
Date: Mon, 8 Apr 2024 08:35:30 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jason Wang <jasowang@...hat.com>, Igor Raits <igor@...ddata.com>,
Stefan Hajnoczi <stefanha@...hat.com>, kvm@...r.kernel.org, virtualization@...ts.linux.dev,
netdev@...r.kernel.org, Stefano Garzarella <sgarzare@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: REGRESSION: RIP: 0010:skb_release_data+0xb8/0x1e0 in vhost/tun
On Thu, 4 Apr 2024 at 20:17, Jaroslav Pulchart
<jaroslav.pulchart@...ddata.com> wrote:
>
> On Thu, 4 Apr 2024 at 15:37, Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 4 Apr 2024 07:42:45 +0200 Jaroslav Pulchart wrote:
> > > We do not have much progress
> >
> > Random thought - do you have KFENCE enabled?
> > It's sufficiently low overhead to run in production and maybe it could
> > help catch the bug? You also hit some inexplicable bug in the Intel
> > driver, IIRC, there may be something odd going on.. (it's not all
> > happening on a single machine, right?)
>
> We have KFENCE enabled.
>
> The issue was observed on multiple servers. It is easy to reproduce
> anywhere we deploy the Loki service. The trigger: I click the
> "run query" (LogQL) button once or twice in the Grafana UI. Loki
> starts loading data from the minio cluster at ~2GB/s and
> it crashes almost immediately.
>
> I suspect the Intel ICE driver as well; it would not be the
> first time we have hit bugs there. Later I will try a
> test server that has a NIC from a different vendor.
I ran the setup on a server with a different network card than the E810:
a BCM57414 NetXtreme-E with the bnxt_en driver. The issue is not
reproducible there, so it appears to be tied to Intel's ice driver
for the E810 network card and to have been introduced in 6.3.