Message-ID: <de8e2709-8d7f-4e51-a4a4-35bad72ba136@mojatatu.com>
Date: Thu, 13 Jun 2024 23:47:38 -0300
From: Pedro Tammela <pctammela@...atatu.com>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Network Development
<netdev@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [net/sched] Question: Locks for clearing ERR_PTR() value from
idrinfo->action_idr ?
On 13/06/2024 21:58, Tetsuo Handa wrote:
>
> Is there a possibility that tcf_idr_check_alloc() is called without holding
> rtnl_mutex?
There is, but not in the code path of this reproducer.
> If yes, adding a sleep before "goto again;" would help. But if no,
> is this a sign that some path forgot to call tcf_idr_{cleanup,insert_many}() ?

The reproducer sends a single new-action message containing 2 actions.
Actions are committed to the idr only after all of them have been processed,
so that they become visible together and only once any errors have been caught.
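
The bug happens when both actions in the message refer to the same index.
Processing the first action succeeds and leaves an ERR_PTR(-EBUSY)
placeholder at that index. The second action, which references the same
index, then finds the placeholder and loops forever: the placeholder is only
replaced when the whole message is committed, and that point is never reached.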

Since this check was changed to rely on RCU instead of the idr lock, the
hang has become more noticeable to syzbot, because the looping task now
hangs while holding a system-wide lock.