Message-ID: <40bb5d4c-e21d-4eac-aec0-25b2f722be6d@orange.com>
Date: Thu, 14 Nov 2024 19:24:27 +0100
From: Alexandre Ferrieux <alexandre.ferrieux@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: edumazet@...gle.com, jhs@...atatu.com, xiyou.wangcong@...il.com,
jiri@...nulli.us, horms@...nel.org, netdev@...r.kernel.org
Subject: RFC: chasing all idr_remove() misses
Hi,
In the recent fix of u32's IDR leaks, one side remark is that the problem went
unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
this call site.
Now, a cursory grep over the whole Linux tree shows that 306 out of 386 call
sites (excluding those hidden in macros, if any) ignore the value returned by
idr_remove().
Indeed, a failed IDR removal is "mostly harmless": since IDs are not pointers,
the mismatch is detectable (and is detected, with NULL returned). However, in
racy situations you may end up killing an innocent fresh entry, which may really
break things a bit later. And in all cases, a true bug is the root cause.
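To make the "innocent fresh entry" case concrete, here is a minimal userspace sketch. It is not kernel code: toy_idr, toy_idr_alloc() and toy_idr_remove() are hypothetical stand-ins, and the toy is assumed to hand out the lowest free ID so that recycling is easy to trigger. The interleaving is replayed sequentially rather than with real threads.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IDR_SIZE 8

/* Hypothetical userspace stand-in for an IDR: slot index == ID, NULL == free. */
struct toy_idr {
    void *slot[TOY_IDR_SIZE];
};

/* Store ptr at the lowest free ID; return the ID, or -1 if the table is full. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr)
{
    assert(ptr != NULL); /* NULL is reserved to mean "free" in this toy */
    for (int id = 0; id < TOY_IDR_SIZE; id++) {
        if (idr->slot[id] == NULL) {
            idr->slot[id] = ptr;
            return id;
        }
    }
    return -1;
}

/* Remove the entry at id; return the old pointer, or NULL on a miss. */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
    if (id < 0 || id >= TOY_IDR_SIZE)
        return NULL;
    void *old = idr->slot[id];
    idr->slot[id] = NULL;
    return old;
}

/*
 * Replays the bad interleaving sequentially. Returns 1 if the stale
 * second removal destroyed the other owner's fresh entry.
 */
static int stale_remove_kills_fresh_entry(void)
{
    struct toy_idr idr = { { NULL } };
    int obj_a, obj_b;

    int id_a = toy_idr_alloc(&idr, &obj_a);
    toy_idr_remove(&idr, id_a);                /* legitimate removal */
    int id_b = toy_idr_alloc(&idr, &obj_b);    /* the ID gets recycled */
    void *victim = toy_idr_remove(&idr, id_a); /* buggy second removal */

    return id_b == id_a && victim == &obj_b && idr.slot[id_b] == NULL;
}
```

In this replay the buggy second removal returns a non-NULL pointer, so it looks like a success to the caller; only the variant without an intervening allocation yields the NULL that a check could catch.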
So, unless we have reasons to think cls_u32 was the only place where two ID
encodings might lend themselves to confusion, I'm wondering if it wouldn't make
sense to chase the issue more systematically:
- either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
(a year-long endeavor involving dozens of maintainers)
- or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
or even radix_tree_delete_item().
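As a rough illustration of the second option, the following userspace sketch puts a warn-once check inside the removal routine itself, so that no call site needs to change. It only models the idea: toy_idr, toy_idr_remove_checked() and toy_miss_warnings are hypothetical names, and the static flag plus fprintf() merely imitate what WARN_ON_ONCE would do in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_IDR_SIZE 8

/* Hypothetical userspace stand-in for an IDR: slot index == ID, NULL == free. */
struct toy_idr {
    void *slot[TOY_IDR_SIZE];
};

/* Number of warnings actually printed (exposed only so the sketch is testable). */
static int toy_miss_warnings;

/* Remove the entry at id; warn once, centrally, when nothing was there. */
static void *toy_idr_remove_checked(struct toy_idr *idr, int id)
{
    static int warned; /* imitates the "once" part of WARN_ON_ONCE */
    void *old = NULL;

    assert(idr != NULL);
    if (id >= 0 && id < TOY_IDR_SIZE) {
        old = idr->slot[id];
        idr->slot[id] = NULL;
    }
    if (old == NULL && !warned) {
        warned = 1;
        toy_miss_warnings++;
        fprintf(stderr, "toy_idr: removal of absent ID %d\n", id);
    }
    return old;
}
```

With the check in one place, every caller that ignores the return value is covered at once. Whether some existing callers legitimately remove IDs that may be absent is a question this sketch does not answer.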
Opinions?