Message-ID: <20220705110724.GB711@willie-the-truck>
Date: Tue, 5 Jul 2022 12:07:25 +0100
From: Will Deacon <will@...nel.org>
To: Kajetan Puchalski <kajetan.puchalski@....com>
Cc: Florian Westphal <fw@...len.de>, Pablo Neira Ayuso <pablo@...filter.org>,
	Jozsef Kadlecsik <kadlec@...filter.org>, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>, Mel Gorman <mgorman@...e.de>,
	lukasz.luba@....com, dietmar.eggemann@....com, mark.rutland@....com,
	mark.brown@....com, netfilter-devel@...r.kernel.org, coreteam@...filter.org,
	netdev@...r.kernel.org, stable@...r.kernel.org, regressions@...ts.linux.dev,
	linux-kernel@...r.kernel.org, peterz@...radead.org
Subject: Re: [Regression] stress-ng udp-flood causes kernel panic on Ampere Altra

On Tue, Jul 05, 2022 at 11:57:49AM +0100, Will Deacon wrote:
> On Tue, Jul 05, 2022 at 11:53:22AM +0100, Kajetan Puchalski wrote:
> > On Mon, Jul 04, 2022 at 10:22:24AM +0100, Kajetan Puchalski wrote:
> > > On Sat, Jul 02, 2022 at 10:56:51PM +0200, Florian Westphal wrote:
> > > > > That would make sense; from further experiments I ran, it somehow seems
> > > > > to be related to the number of workers being spawned by stress-ng along
> > > > > with the CPUs/cores involved.
> > > > >
> > > > > For instance, running the test with <=25 workers (--udp-flood 25 etc.)
> > > > > results in the test running fine for at least 15 minutes.
> > > >
> > > > Ok. I will let it run for longer on the machines I have access to.
> > > >
> > > > In the meantime, you could test the attached patch; it's a simple s/refcount_/atomic_/
> > > > in nf_conntrack.
> > > >
> > > > If mainline (patch vs. HEAD 69cb6c6556ad89620547318439) crashes for you
> > > > but works with the attached patch, someone who understands aarch64 memory ordering
> > > > would have to look more closely at the refcount_XXX functions to see where they
> > > > might differ from the atomic_ ones.
> > >
> > > I can confirm that the patch seems to solve the issue.
> > > With it applied on top of the 5.19-rc5 tag, the test runs fine for at
> > > least 15 minutes, which was not the case before, so it looks like it is
> > > that aarch64 memory ordering problem.
> >
> > I'm CCing some people who should be able to help with aarch64 memory
> > ordering; maybe they could take a look.
> >
> > (re-sending due to a typo in CC, sorry for duplicate emails!)
>
> Sorry, but I have absolutely no context here. We have a handy document
> describing the differences between atomic_t and refcount_t:
>
>   Documentation/core-api/refcount-vs-atomic.rst
>
> What else do you need to know?

Hmm, and I see a tonne of *_inc_not_zero() conversions in 719774377622
("netfilter: conntrack: convert to refcount_t api") which mean that you
no longer have ordering to subsequent reads in the absence of an address
dependency.

Will
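For concreteness, the ordering change Will is pointing at can be sketched roughly as
follows. This is a hypothetical illustration, not code from the thread or from
nf_conntrack itself: the struct, field, and function names are invented, and the
barrier shown is just one possible way to restore the old ordering.

#include <linux/atomic.h>
#include <linux/refcount.h>

struct obj {
	refcount_t	ref;	/* was atomic_t before the refcount_t conversion */
	unsigned long	status;	/* field read after taking a reference */
};

static bool obj_get_and_check(struct obj *o)
{
	/*
	 * atomic_inc_not_zero() is a fully ordered RMW: on success, the
	 * later load of o->status cannot be observed before the increment.
	 *
	 * refcount_inc_not_zero() only provides a control dependency on
	 * success, which orders subsequent stores but not subsequent loads.
	 */
	if (!refcount_inc_not_zero(&o->ref))
		return false;

	/*
	 * On a weakly ordered CPU such as arm64, the load below is not
	 * ordered against the refcount update by the control dependency
	 * alone; an explicit barrier (or an address dependency, as Will
	 * notes) is needed to get back the ordering the atomic_t version
	 * provided.
	 */
	smp_acquire__after_ctrl_dep();

	return READ_ONCE(o->status) == 0;
}

Per Documentation/core-api/refcount-vs-atomic.rst, atomic_inc_not_zero() is fully
ordered while refcount_inc_not_zero() gives only a control dependency on success,
so reads performed after a successful increment need an address dependency or an
explicit acquire/rmb to be ordered on arm64.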