lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a0b432d-284a-45eb-991a-c7bba2c93e0b@roeck-us.net>
Date: Mon, 13 Jan 2025 11:50:36 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Breno Leitao <leitao@...ian.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Thomas Graf <tgraf@...g.ch>,
	Herbert Xu <herbert@...dor.apana.org.au>, Tejun Heo <tj@...nel.org>,
	Hao Luo <haoluo@...gle.com>, Josh Don <joshdon@...gle.com>,
	Barret Rhoden <brho@...gle.com>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rhashtable: Fix potential deadlock by moving
 schedule_work outside lock

Hi,

On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> Move the hash table growth check and work scheduling outside the
> rht lock to prevent a possible circular locking dependency.
> 
> The original implementation could trigger a lockdep warning due to
> a potential deadlock scenario involving nested locks between
> rhashtable bucket, rq lock, and dsq lock. By relocating the
> growth check and work scheduling after releasing the rth lock, we break
> this potential deadlock chain.
> 
> This change expands the flexibility of rhashtable by removing
> restrictive locking that previously limited its use in scheduler
> and workqueue contexts.
> 
> Import to say that this calls rht_grow_above_75(), which reads from
> struct rhashtable without holding the lock, if this is a problem, we can
> move the check to the lock, and schedule the workqueue after the lock.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Suggested-by: Tejun Heo <tj@...nel.org>
> Signed-off-by: Breno Leitao <leitao@...ian.org>

With this patch in linux-next, I get some unit test errors.

[    3.800185]     # Subtest: hw_breakpoint
[    3.800469]     # module: hw_breakpoint_test
[    3.800718]     1..9
[    3.810825]     # test_one_cpu: pass:1 fail:0 skip:0 total:1
[    3.810950]     ok 1 test_one_cpu
[    3.814941]     # test_many_cpus: pass:1 fail:0 skip:0 total:1
[    3.815092]     ok 2 test_many_cpus
[    3.822977]     # test_one_task_on_all_cpus: pass:1 fail:0 skip:0 total:1
[    3.823100]     ok 3 test_one_task_on_all_cpus
[    3.829071]     # test_two_tasks_on_all_cpus: pass:1 fail:0 skip:0 total:1
[    3.829199]     ok 4 test_two_tasks_on_all_cpus
[    3.830914]     # test_one_task_on_one_cpu: ASSERTION FAILED at kernel/events/hw_breakpoint_test.c:70
[    3.830914]     Expected IS_ERR(bp) to be false, but is true
[    3.832572]     # test_one_task_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.832572]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.833002]     # test_one_task_on_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.833071]     not ok 5 test_one_task_on_one_cpu
[    3.834994]     # test_one_task_mixed: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.834994]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.835583]     # test_one_task_mixed: pass:0 fail:1 skip:0 total:1
[    3.835638]     not ok 6 test_one_task_mixed
[    3.837131]     # test_two_tasks_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.837131]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.837827]     # test_two_tasks_on_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.837882]     not ok 7 test_two_tasks_on_one_cpu
[    3.839868]     # test_two_tasks_on_one_all_cpus: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.839868]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.840294]     # test_two_tasks_on_one_all_cpus: pass:0 fail:1 skip:0 total:1
[    3.840538]     not ok 8 test_two_tasks_on_one_all_cpus
[    3.843599]     # test_task_on_all_and_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.843599]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.844163]     # test_task_on_all_and_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.844215]     not ok 9 test_task_on_all_and_one_cpu
[    3.844453] # hw_breakpoint: pass:4 fail:5 skip:0 total:9
[    3.844610] # Totals: pass:4 fail:5 skip:0 total:9
[    3.844797] not ok 1 hw_breakpoint

Sometimes I also see:

[   12.579842]     # Subtest: Handshake API tests
[   12.579971]     1..11
[   12.580052]         KTAP version 1
[   12.580410]         # Subtest: req_alloc API fuzzing
[   12.582206]         ok 1 handshake_req_alloc NULL proto
[   12.583541]         ok 2 handshake_req_alloc CLASS_NONE
[   12.585419]         ok 3 handshake_req_alloc CLASS_MAX
[   12.587291]         ok 4 handshake_req_alloc no callbacks
[   12.589239]         ok 5 handshake_req_alloc no done callback
[   12.590758]         ok 6 handshake_req_alloc excessive privsize
[   12.592642]         ok 7 handshake_req_alloc all good
[   12.592802]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[   12.593185]     ok 1 req_alloc API fuzzing
[   12.597371]     # req_submit NULL req arg: pass:1 fail:0 skip:0 total:1
[   12.597501]     ok 2 req_submit NULL req arg
[   12.599208]     # req_submit NULL sock arg: pass:1 fail:0 skip:0 total:1
[   12.599338]     ok 3 req_submit NULL sock arg
[   12.601549]     # req_submit NULL sock->file: pass:1 fail:0 skip:0 total:1
[   12.601680]     ok 4 req_submit NULL sock->file
[   12.605334]     # req_lookup works: pass:1 fail:0 skip:0 total:1
[   12.605469]     ok 5 req_lookup works
[   12.609596]     # req_submit max pending: pass:1 fail:0 skip:0 total:1
[   12.609730]     ok 6 req_submit max pending
[   12.613796]     # req_submit multiple: pass:1 fail:0 skip:0 total:1
[   12.614250]     ok 7 req_submit multiple
[   12.616395]     # req_cancel before accept: ASSERTION FAILED at net/handshake/handshake-test.c:333
[   12.616395]     Expected err == 0, but
[   12.616395]         err == -16 (0xfffffffffffffff0)
[   12.618061]     # req_cancel before accept: pass:0 fail:1 skip:0 total:1
[   12.618135]     not ok 8 req_cancel before accept
[   12.619437]     # req_cancel after accept: ASSERTION FAILED at net/handshake/handshake-test.c:369
[   12.619437]     Expected err == 0, but
[   12.619437]         err == -16 (0xfffffffffffffff0)
[   12.621055]     # req_cancel after accept: pass:0 fail:1 skip:0 total:1
[   12.621119]     not ok 9 req_cancel after accept
[   12.622342]     # req_cancel after done: ASSERTION FAILED at net/handshake/handshake-test.c:411
[   12.622342]     Expected err == 0, but
[   12.622342]         err == -16 (0xfffffffffffffff0)
[   12.623547]     # req_cancel after done: pass:0 fail:1 skip:0 total:1
[   12.623608]     not ok 10 req_cancel after done
[   12.625297]     # req_destroy works: ASSERTION FAILED at net/handshake/handshake-test.c:469
[   12.625297]     Expected err == 0, but
[   12.625297]         err == -16 (0xfffffffffffffff0)
[   12.626633]     # req_destroy works: pass:0 fail:1 skip:0 total:1
[   12.626696]     not ok 11 req_destroy works
[   12.626837] # Handshake API tests: pass:7 fail:4 skip:0 total:11
[   12.627298] # Totals: pass:13 fail:4 skip:0 total:17
[   12.627446] not ok 90 Handshake API tests

The log is from a test with x86_64, but other architectures are affected
as well. Reverting this patch fixes the problem.

Bisect log is attached for reference.

Guenter

---
# bad: [37136bf5c3a6f6b686d74f41837a6406bec6b7bc] Add linux-next specific files for 20250113
# good: [9d89551994a430b50c4fffcb1e617a057fa76e20] Linux 6.13-rc6
git bisect start 'HEAD' 'v6.13-rc6'
# good: [25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038
# bad: [c6ab5ee56509953c3ee6647ac9f266a7c628f082] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
git bisect bad c6ab5ee56509953c3ee6647ac9f266a7c628f082
# good: [39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d] Merge tag 'cgroup-dmem-drm-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next
git bisect good 39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d
# bad: [0f8b2b2250abe043cef890caa378bebe5c4f5d88] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect bad 0f8b2b2250abe043cef890caa378bebe5c4f5d88
# good: [67fcb0469b17071890761d437bdf83d2e2d14575] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
git bisect good 67fcb0469b17071890761d437bdf83d2e2d14575
# bad: [7fd4a4e4c397fc315f8ebf0fb22e50526f66580a] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
git bisect bad 7fd4a4e4c397fc315f8ebf0fb22e50526f66580a
# good: [e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7] drm/amd/display: Initialize denominator defaults to 1
git bisect good e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7
# bad: [5b7981c1ca61ca7ad7162cfe95bf271d001d29ac] crypto: x86/aes-xts - use .irp when useful
git bisect bad 5b7981c1ca61ca7ad7162cfe95bf271d001d29ac
# good: [ce8fd0500b741b3669c246cc604f1f2343cdd6fd] crypto: qce - use __free() for a buffer that's always freed
git bisect good ce8fd0500b741b3669c246cc604f1f2343cdd6fd
# good: [5e252f490c1c2c989cdc2ca50744f30fbca356b4] crypto: tea - stop using cra_alignmask
git bisect good 5e252f490c1c2c989cdc2ca50744f30fbca356b4
# good: [f916e44487f56df4827069ff3a2070c0746dc511] crypto: keywrap - remove assignment of 0 to cra_alignmask
git bisect good f916e44487f56df4827069ff3a2070c0746dc511
# bad: [b9b894642fede191d50230d08608bd4f4f49f73d] crypto: lib/gf128mul - Remove some bbe deadcode
git bisect bad b9b894642fede191d50230d08608bd4f4f49f73d
# bad: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock
git bisect bad e1d3422c95f003eba241c176adfe593c33e8a8f6
# first bad commit: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ