Message-ID: <20250301064836.3285906-1-leo.lilong@huawei.com>
Date: Sat, 1 Mar 2025 14:48:34 +0800
From: Long Li <leo.lilong@...wei.com>
To: <chuck.lever@...cle.com>, <jlayton@...nel.org>, <neilb@...e.de>,
	<okorniev@...hat.com>, <Dai.Ngo@...cle.com>, <tom@...pey.com>,
	<trondmy@...nel.org>, <anna@...nel.org>, <davem@...emloft.net>,
	<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<horms@...nel.org>
CC: <linux-nfs@...r.kernel.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <yi.zhang@...wei.com>,
	<leo.lilong@...wei.com>, <yangerkun@...wei.com>, <lonuxli.64@...il.com>
Subject: [PATCH 0/2] sunrpc: Fix issues with cache_detail nextcheck updates

During memory fault injection testing with nfsd restart, I encountered an
issue where NFS client threads would hang for around 1800 seconds. Analysis
showed that nfsd threads were blocked for approximately 1800 seconds with
the following stack trace:

  PID: 3941444  TASK: ffff0000cf170040  CPU: 0    COMMAND: "nfsd"
   #0 [ffff80008d387120] __switch_to at ffffc4ef3c7a6af0
   #1 [ffff80008d387170] __schedule at ffffc4ef3c7a73a4
   #2 [ffff80008d3872c0] schedule at ffffc4ef3c7a8074
   #3 [ffff80008d387300] schedule_timeout at ffffc4ef3c7b7b60
   #4 [ffff80008d387470] wait_for_common at ffffc4ef3c7a944c
   #5 [ffff80008d387560] wait_for_completion_interruptible_timeout at ffffc4ef3c7a9630
   #6 [ffff80008d387570] cache_wait_req at ffffc4ef3c6804dc
   #7 [ffff80008d3876f0] cache_check at ffffc4ef3c680740
   #8 [ffff80008d3877d0] exp_find_key at ffffc4ef3b6e293c
   #9 [ffff80008d387910] exp_find at ffffc4ef3b6e2ccc
  #10 [ffff80008d387980] rqst_exp_find at ffffc4ef3b6e445c
  #11 [ffff80008d3879e0] exp_pseudoroot at ffffc4ef3b6e4984
  #12 [ffff80008d387a90] nfsd4_putrootfh at ffffc4ef3b6f8720
  #13 [ffff80008d387ab0] nfsd4_proc_compound at ffffc4ef3b6fe4cc
  #14 [ffff80008d387b70] nfsd_dispatch at ffffc4ef3b6cf428
  #15 [ffff80008d387c30] svc_process_common at ffffc4ef3c66235c
  #16 [ffff80008d387d20] svc_process at ffffc4ef3c6652f8
  #17 [ffff80008d387d90] svc_recv at ffffc4ef3c68c5d0
  #18 [ffff80008d387e10] nfsd at ffffc4ef3b6cb968
  #19 [ffff80008d387e60] kthread at ffffc4ef3ad4aca4
  
An nfsd thread sent an upcall and set the cache to CACHE_PENDING state,
waiting for the downcall to complete. However, due to memory fault
injection, this downcall failed and the userspace daemon did not retry.
The nfsd thread could only wait for cache cleanup to clear the
CACHE_PENDING state and resend the upcall.

Under certain edge cases, the cache_detail scanning interval could be set
to a large value like 1800 seconds, causing cache cleanup to be delayed
well beyond the cache's expiry time. This behavior seems unreasonable.

This patch series fixes two issues related to the cache_detail nextcheck
time updates in the sunrpc subsystem. The first patch ensures nextcheck
time is properly updated when adding new cache entries to a cache_detail.
The second patch fixes a race condition between cache cleanup and entry
removal that can result in stale nextcheck times.
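
For illustration, the effect addressed by the first patch can be modelled
in userspace. The sketch below is not the code in net/sunrpc/cache.c: the
model_* names, the struct layout, and the rule "lower nextcheck to the new
entry's expiry" are assumptions, chosen only to show why a stale nextcheck
can delay cleanup well past an entry's expiry time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one cache; not the kernel's struct cache_detail. */
struct model_detail {
	long nextcheck;		/* earliest time the cleaner scans this cache */
};

/* Hypothetical model of one cache entry. */
struct model_entry {
	long expiry_time;	/* time at which the entry should be cleaned up */
};

/* The cleaner skips a cache until its nextcheck time has been reached. */
static bool model_cleaner_would_scan(const struct model_detail *cd, long now)
{
	assert(cd);
	return now >= cd->nextcheck;
}

/* Insert that leaves nextcheck untouched (the problematic behaviour). */
static void model_insert_stale(struct model_detail *cd,
			       const struct model_entry *e)
{
	assert(cd && e);
	/* nextcheck may still be far in the future, e.g. 1800 seconds */
}

/* Insert that pulls nextcheck forward (assumed shape of the fix). */
static void model_insert_updating(struct model_detail *cd,
				  const struct model_entry *e)
{
	assert(cd && e);
	if (e->expiry_time < cd->nextcheck)
		cd->nextcheck = e->expiry_time;
}
```

In this model, with nextcheck at 1800 and a new entry expiring at 120,
model_insert_stale() leaves the cleaner idle at time 121, whereas
model_insert_updating() lowers nextcheck to 120 so the cleaner scans.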

Long Li (2):
  sunrpc: update nextcheck time when adding new cache entries
  sunrpc: fix race in cache cleanup causing stale nextcheck time

 net/sunrpc/cache.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.39.2

