Message-ID: <ZzwLU6JHOTmZQ4oS@pop-os.localdomain>
Date: Mon, 18 Nov 2024 19:51:47 -0800
From: Cong Wang <xiyou.wangcong@...il.com>
To: Alexandre Ferrieux <alexandre.ferrieux@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, edumazet@...gle.com, jhs@...atatu.com,
jiri@...nulli.us, horms@...nel.org, netdev@...r.kernel.org
Subject: Re: RFC: chasing all idr_remove() misses
On Thu, Nov 14, 2024 at 07:24:27PM +0100, Alexandre Ferrieux wrote:
> Hi,
>
> In the recent fix of u32's IDR leaks, one side remark is that the problem went
> unnoticed for 7 years due to the NULL result from idr_remove() being ignored at
> this call site.
I'd blame the lack of selftest coverage. :)
>
> Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites
> (excluding those hidden in macros, if any) don't bother to extract the value
> returned by idr_remove().
>
> Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so
> the mismatch is detectable (and is detected, returning NULL). However, in racy
> situations you may end up killing an innocent fresh entry, which may really
> break things a bit later. And in all cases, a true bug is the root cause.
>
> So, unless we have reasons to think cls_u32 was the only place where two ID
> encodings might lend themselves to confusion, I'm wondering if it wouldn't make
> sense to chase the issue more systematically:
>
> - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually
> (a year-long endeavor implying tens of maintainers)
>
> - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself,
> or even radix_tree_delete_item().
>
> Opinions ?
Yeah, or simply WARN_ON() in idr_destroy() when the IDR still holds
entries, which is a more common pattern.
Thanks.
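
For illustration only, here is a minimal user-space sketch of the two
checks discussed above: warning when a removal finds nothing, and warning
when entries are still present at destroy time. It is not kernel code.
struct toy_idr, toy_idr_alloc(), toy_idr_remove(), toy_idr_destroy() and
the remove_misses counter are hypothetical names made up for this example,
and fprintf(stderr, ...) stands in for WARN_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_IDR_SLOTS 8

/* Hypothetical stand-in for an IDR: a fixed table mapping ID -> pointer. */
struct toy_idr {
	void *slot[TOY_IDR_SLOTS];
	unsigned int remove_misses;	/* removals that found no entry */
};

/* Store ptr under the lowest free ID; return that ID, or -1 if full. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr)
{
	assert(ptr != NULL);	/* NULL is reserved to mean "empty slot" */
	for (int id = 0; id < TOY_IDR_SLOTS; id++) {
		if (idr->slot[id] == NULL) {
			idr->slot[id] = ptr;
			return id;
		}
	}
	return -1;
}

/*
 * Remove and return the entry stored under id.
 * On a miss, count it and warn (in place of WARN_ON), then return NULL,
 * so the miss is visible even if the caller ignores the return value.
 */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
	void *old = NULL;

	if (id >= 0 && id < TOY_IDR_SLOTS)
		old = idr->slot[id];
	if (old == NULL) {
		idr->remove_misses++;
		fprintf(stderr, "toy_idr: remove of unknown id %d\n", id);
		return NULL;
	}
	idr->slot[id] = NULL;
	return old;
}

/*
 * Empty the table and return how many entries were still present.
 * A non-zero result means some ID was never removed (a leak), which is
 * what a warning at destroy time would catch.
 */
static unsigned int toy_idr_destroy(struct toy_idr *idr)
{
	unsigned int leaked = 0;

	for (int id = 0; id < TOY_IDR_SLOTS; id++) {
		if (idr->slot[id] != NULL) {
			leaked++;
			idr->slot[id] = NULL;
		}
	}
	if (leaked)
		fprintf(stderr, "toy_idr: %u entries left at destroy\n", leaked);
	return leaked;
}
```

In this toy model, a removal issued with the wrong ID trips the first
check immediately, while the entry it failed to remove is only reported
by the second check when the table is torn down.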