[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a0b432d-284a-45eb-991a-c7bba2c93e0b@roeck-us.net>
Date: Mon, 13 Jan 2025 11:50:36 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Breno Leitao <leitao@...ian.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Thomas Graf <tgraf@...g.ch>,
Herbert Xu <herbert@...dor.apana.org.au>, Tejun Heo <tj@...nel.org>,
Hao Luo <haoluo@...gle.com>, Josh Don <joshdon@...gle.com>,
Barret Rhoden <brho@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rhashtable: Fix potential deadlock by moving
schedule_work outside lock
Hi,
On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> Move the hash table growth check and work scheduling outside the
> rht lock to prevent a possible circular locking dependency.
>
> The original implementation could trigger a lockdep warning due to
> a potential deadlock scenario involving nested locks between
> rhashtable bucket, rq lock, and dsq lock. By relocating the
> growth check and work scheduling after releasing the rth lock, we break
> this potential deadlock chain.
>
> This change expands the flexibility of rhashtable by removing
> restrictive locking that previously limited its use in scheduler
> and workqueue contexts.
>
> Import to say that this calls rht_grow_above_75(), which reads from
> struct rhashtable without holding the lock, if this is a problem, we can
> move the check to the lock, and schedule the workqueue after the lock.
>
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Suggested-by: Tejun Heo <tj@...nel.org>
> Signed-off-by: Breno Leitao <leitao@...ian.org>
With this patch in linux-next, I get some unit test errors.
[ 3.800185] # Subtest: hw_breakpoint
[ 3.800469] # module: hw_breakpoint_test
[ 3.800718] 1..9
[ 3.810825] # test_one_cpu: pass:1 fail:0 skip:0 total:1
[ 3.810950] ok 1 test_one_cpu
[ 3.814941] # test_many_cpus: pass:1 fail:0 skip:0 total:1
[ 3.815092] ok 2 test_many_cpus
[ 3.822977] # test_one_task_on_all_cpus: pass:1 fail:0 skip:0 total:1
[ 3.823100] ok 3 test_one_task_on_all_cpus
[ 3.829071] # test_two_tasks_on_all_cpus: pass:1 fail:0 skip:0 total:1
[ 3.829199] ok 4 test_two_tasks_on_all_cpus
[ 3.830914] # test_one_task_on_one_cpu: ASSERTION FAILED at kernel/events/hw_breakpoint_test.c:70
[ 3.830914] Expected IS_ERR(bp) to be false, but is true
[ 3.832572] # test_one_task_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[ 3.832572] Expected hw_breakpoint_is_used() to be false, but is true
[ 3.833002] # test_one_task_on_one_cpu: pass:0 fail:1 skip:0 total:1
[ 3.833071] not ok 5 test_one_task_on_one_cpu
[ 3.834994] # test_one_task_mixed: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[ 3.834994] Expected hw_breakpoint_is_used() to be false, but is true
[ 3.835583] # test_one_task_mixed: pass:0 fail:1 skip:0 total:1
[ 3.835638] not ok 6 test_one_task_mixed
[ 3.837131] # test_two_tasks_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[ 3.837131] Expected hw_breakpoint_is_used() to be false, but is true
[ 3.837827] # test_two_tasks_on_one_cpu: pass:0 fail:1 skip:0 total:1
[ 3.837882] not ok 7 test_two_tasks_on_one_cpu
[ 3.839868] # test_two_tasks_on_one_all_cpus: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[ 3.839868] Expected hw_breakpoint_is_used() to be false, but is true
[ 3.840294] # test_two_tasks_on_one_all_cpus: pass:0 fail:1 skip:0 total:1
[ 3.840538] not ok 8 test_two_tasks_on_one_all_cpus
[ 3.843599] # test_task_on_all_and_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[ 3.843599] Expected hw_breakpoint_is_used() to be false, but is true
[ 3.844163] # test_task_on_all_and_one_cpu: pass:0 fail:1 skip:0 total:1
[ 3.844215] not ok 9 test_task_on_all_and_one_cpu
[ 3.844453] # hw_breakpoint: pass:4 fail:5 skip:0 total:9
[ 3.844610] # Totals: pass:4 fail:5 skip:0 total:9
[ 3.844797] not ok 1 hw_breakpoint
Sometimes I also see:
[ 12.579842] # Subtest: Handshake API tests
[ 12.579971] 1..11
[ 12.580052] KTAP version 1
[ 12.580410] # Subtest: req_alloc API fuzzing
[ 12.582206] ok 1 handshake_req_alloc NULL proto
[ 12.583541] ok 2 handshake_req_alloc CLASS_NONE
[ 12.585419] ok 3 handshake_req_alloc CLASS_MAX
[ 12.587291] ok 4 handshake_req_alloc no callbacks
[ 12.589239] ok 5 handshake_req_alloc no done callback
[ 12.590758] ok 6 handshake_req_alloc excessive privsize
[ 12.592642] ok 7 handshake_req_alloc all good
[ 12.592802] # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[ 12.593185] ok 1 req_alloc API fuzzing
[ 12.597371] # req_submit NULL req arg: pass:1 fail:0 skip:0 total:1
[ 12.597501] ok 2 req_submit NULL req arg
[ 12.599208] # req_submit NULL sock arg: pass:1 fail:0 skip:0 total:1
[ 12.599338] ok 3 req_submit NULL sock arg
[ 12.601549] # req_submit NULL sock->file: pass:1 fail:0 skip:0 total:1
[ 12.601680] ok 4 req_submit NULL sock->file
[ 12.605334] # req_lookup works: pass:1 fail:0 skip:0 total:1
[ 12.605469] ok 5 req_lookup works
[ 12.609596] # req_submit max pending: pass:1 fail:0 skip:0 total:1
[ 12.609730] ok 6 req_submit max pending
[ 12.613796] # req_submit multiple: pass:1 fail:0 skip:0 total:1
[ 12.614250] ok 7 req_submit multiple
[ 12.616395] # req_cancel before accept: ASSERTION FAILED at net/handshake/handshake-test.c:333
[ 12.616395] Expected err == 0, but
[ 12.616395] err == -16 (0xfffffffffffffff0)
[ 12.618061] # req_cancel before accept: pass:0 fail:1 skip:0 total:1
[ 12.618135] not ok 8 req_cancel before accept
[ 12.619437] # req_cancel after accept: ASSERTION FAILED at net/handshake/handshake-test.c:369
[ 12.619437] Expected err == 0, but
[ 12.619437] err == -16 (0xfffffffffffffff0)
[ 12.621055] # req_cancel after accept: pass:0 fail:1 skip:0 total:1
[ 12.621119] not ok 9 req_cancel after accept
[ 12.622342] # req_cancel after done: ASSERTION FAILED at net/handshake/handshake-test.c:411
[ 12.622342] Expected err == 0, but
[ 12.622342] err == -16 (0xfffffffffffffff0)
[ 12.623547] # req_cancel after done: pass:0 fail:1 skip:0 total:1
[ 12.623608] not ok 10 req_cancel after done
[ 12.625297] # req_destroy works: ASSERTION FAILED at net/handshake/handshake-test.c:469
[ 12.625297] Expected err == 0, but
[ 12.625297] err == -16 (0xfffffffffffffff0)
[ 12.626633] # req_destroy works: pass:0 fail:1 skip:0 total:1
[ 12.626696] not ok 11 req_destroy works
[ 12.626837] # Handshake API tests: pass:7 fail:4 skip:0 total:11
[ 12.627298] # Totals: pass:13 fail:4 skip:0 total:17
[ 12.627446] not ok 90 Handshake API tests
The log is from a test with x86_64, but other architectures are affected
as well. Reverting this patch fixes the problem.
Bisect log is attached for reference.
Guenter
---
# bad: [37136bf5c3a6f6b686d74f41837a6406bec6b7bc] Add linux-next specific files for 20250113
# good: [9d89551994a430b50c4fffcb1e617a057fa76e20] Linux 6.13-rc6
git bisect start 'HEAD' 'v6.13-rc6'
# good: [25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038
# bad: [c6ab5ee56509953c3ee6647ac9f266a7c628f082] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
git bisect bad c6ab5ee56509953c3ee6647ac9f266a7c628f082
# good: [39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d] Merge tag 'cgroup-dmem-drm-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next
git bisect good 39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d
# bad: [0f8b2b2250abe043cef890caa378bebe5c4f5d88] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect bad 0f8b2b2250abe043cef890caa378bebe5c4f5d88
# good: [67fcb0469b17071890761d437bdf83d2e2d14575] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
git bisect good 67fcb0469b17071890761d437bdf83d2e2d14575
# bad: [7fd4a4e4c397fc315f8ebf0fb22e50526f66580a] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
git bisect bad 7fd4a4e4c397fc315f8ebf0fb22e50526f66580a
# good: [e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7] drm/amd/display: Initialize denominator defaults to 1
git bisect good e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7
# bad: [5b7981c1ca61ca7ad7162cfe95bf271d001d29ac] crypto: x86/aes-xts - use .irp when useful
git bisect bad 5b7981c1ca61ca7ad7162cfe95bf271d001d29ac
# good: [ce8fd0500b741b3669c246cc604f1f2343cdd6fd] crypto: qce - use __free() for a buffer that's always freed
git bisect good ce8fd0500b741b3669c246cc604f1f2343cdd6fd
# good: [5e252f490c1c2c989cdc2ca50744f30fbca356b4] crypto: tea - stop using cra_alignmask
git bisect good 5e252f490c1c2c989cdc2ca50744f30fbca356b4
# good: [f916e44487f56df4827069ff3a2070c0746dc511] crypto: keywrap - remove assignment of 0 to cra_alignmask
git bisect good f916e44487f56df4827069ff3a2070c0746dc511
# bad: [b9b894642fede191d50230d08608bd4f4f49f73d] crypto: lib/gf128mul - Remove some bbe deadcode
git bisect bad b9b894642fede191d50230d08608bd4f4f49f73d
# bad: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock
git bisect bad e1d3422c95f003eba241c176adfe593c33e8a8f6
# first bad commit: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock
Powered by blists - more mailing lists