Date: Mon, 8 Apr 2024 08:35:30 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jason Wang <jasowang@...hat.com>, Igor Raits <igor@...ddata.com>, 
	Stefan Hajnoczi <stefanha@...hat.com>, kvm@...r.kernel.org, virtualization@...ts.linux.dev, 
	netdev@...r.kernel.org, Stefano Garzarella <sgarzare@...hat.com>, 
	"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: REGRESSION: RIP: 0010:skb_release_data+0xb8/0x1e0 in vhost/tun

On Thu, 4 Apr 2024 at 20:17, Jaroslav Pulchart
<jaroslav.pulchart@...ddata.com> wrote:
>
> On Thu, 4 Apr 2024 at 15:37, Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 4 Apr 2024 07:42:45 +0200 Jaroslav Pulchart wrote:
> > > We do not have much progress
> >
> > Random thought - do you have KFENCE enabled?
> > It's sufficiently low overhead to run in production and maybe it could
> > help catch the bug? You also hit some inexplicable bug in the Intel
> > driver, IIRC, there may be something odd going on.. (it's not all
> > happening on a single machine, right?)
>
> We have KFENCE enabled.
>
> The issue was observed on multiple servers. It is easy to reproduce
> anywhere we deploy the Loki service. The trigger is: I click the
> "run query" (LogQL) button in the Grafana UI once or twice. Loki
> starts loading data from the minio cluster at a speed of ~2GB/s and
> crashes almost immediately.
>
> I suspect the Intel ICE driver as well; it would not be the first
> time we have hit bugs there. I will later try a test server that
> has a NIC from a different vendor.

I ran the setup on a server with a different network card than the E810:
a BCM57414 NetXtreme-E with the bnxt_en driver. The issue is not
reproducible there. So it looks to be connected with Intel's ice
driver for the E810 network card, and to have been introduced in 6.3.
