lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <Zo64cpho2cFQiOeE@LQ3V64L9R2>
Date: Wed, 10 Jul 2024 09:36:02 -0700
From: Joe Damato <jdamato@...tly.com>
To: me@...ehuey.com
Cc: acme@...nel.org, andrii.nakryiko@...il.com, bpf@...r.kernel.org,
	elver@...gle.com, jolsa@...nel.org, khuey@...ehuey.com,
	linux-kernel@...r.kernel.org, mingo@...nel.org, namhyung@...nel.org,
	peterz@...radead.org, robert@...llahan.org, yonghong.song@...ux.dev,
	mkarsten@...terloo.ca
Subject: possible bpf overflow/output bug introduced in 6.10?

Greetings:

While testing some unrelated networking code with Martin Karsten (cc'd on
this email) we discovered what appears to be some sort of overflow bug in
bpf.

git bisect suggests that commit f11f10bfa1ca ("perf/bpf: Call BPF handler
directly, not through overflow machinery") is the first commit where the
(I assume) buggy behavior appears.

Running the following on my machine as of the commit mentioned above:

  bpftrace -e 'tracepoint:napi:napi_poll { @[args->work] = count(); }'

while simultaneously transferring data to the target machine (in my case, I
scp'd a 100MiB file of zeros in a loop) results in very strange output
(snipped):

  @[11]: 5
  @[18]: 5
  @[-30590]: 6
  @[10]: 7
  @[14]: 9

It does not seem that the driver I am using on my test system (mlx5) would
ever return a negative value from its napi poll function and likewise for
the driver Martin is using (mlx4).

As such, I don't think it is possible for args->work to ever be a large
negative number, but perhaps I am misunderstanding something?

I would like to note that commit 14e40a9578b7 ("perf/bpf: Remove #ifdef
CONFIG_BPF_SYSCALL from struct perf_event members") does not exhibit this
behavior and the output seems reasonable on my test system. Martin confirms
the same for both commits on his test system, which uses different hardware
than mine.

Is this an expected side effect of this change? I would expect it is not
and that the output is a bug of some sort. My apologies in that I am not
particularly familiar with the bpf code and cannot suggest what the root
cause might be.

If it is not a bug, can anyone suggest what this output might mean or
how the script run above should be modified?

Thanks,
Joe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ