netdev - Re: [net/sched] Question: Locks for clearing ERR_PTR() value from idrinfo->action


Message-ID: <de8e2709-8d7f-4e51-a4a4-35bad72ba136@mojatatu.com>
Date: Thu, 13 Jun 2024 23:47:38 -0300
From: Pedro Tammela <pctammela@...atatu.com>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
 Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
 Jiri Pirko <jiri@...nulli.us>
Cc: "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Network Development
 <netdev@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [net/sched] Question: Locks for clearing ERR_PTR() value from
 idrinfo->action_idr ?

On 13/06/2024 21:58, Tetsuo Handa wrote:
> 
> Is there a possibility that tcf_idr_check_alloc() is called without holding
> rtnl_mutex?

There is, but not in the code path of this reproducer.

> If yes, adding a sleep before "goto again;" would help. But if no,
> is this a sign that some path forgot to call tcf_idr_{cleanup,insert_many}() ?

The reproducer sends a new-action message containing 2 actions.
Actions are committed to the idr only after all of them have been
processed, so that they become visible together and only once any
errors have been caught.

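The bug happens when the actions in the message refer to the same index.
Processing the first action succeeds and leaves an ERR_PTR(-EBUSY)
placeholder at that index. Processing the second action, which
references the same index, then loops forever, because the placeholder
is not cleared until the whole message has been processed.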
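
A rough user-space sketch of that failure mode, in case it helps. This
is not the kernel code: toy_check_alloc(), toy_process_message(), the
fixed-size table and the MAX_RETRIES bound are all made up for
illustration (the real loop has no bound, which is why it hangs).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TABLE_SIZE  8
#define MAX_RETRIES 1000 /* hypothetical bound; the loop described above has none */

/* Sentinel standing in for the ERR_PTR(-EBUSY) placeholder. */
static int busy_marker;
#define SLOT_BUSY ((void *)&busy_marker)

/* Toy stand-in for the action idr: NULL = free, SLOT_BUSY = reserved. */
static void *table[TABLE_SIZE];

/*
 * Hypothetical, simplified reserve step.
 * Returns 0 if the index was reserved, 1 if an action already exists,
 * -EBUSY if the placeholder never went away within MAX_RETRIES.
 */
static int toy_check_alloc(unsigned int idx)
{
	if (idx >= TABLE_SIZE)
		return -EINVAL;

	for (int tries = 0; tries < MAX_RETRIES; tries++) {
		void *p = table[idx];

		if (p == SLOT_BUSY)
			continue;	/* the "goto again" case */
		if (p != NULL)
			return 1;	/* committed action found */
		table[idx] = SLOT_BUSY;	/* reserve, commit comes later */
		return 0;
	}
	return -EBUSY;
}

/*
 * Process every action in a "message" first, then commit (or clean up)
 * all reservations together, as described above.
 */
static int toy_process_message(const unsigned int *indices, size_t n)
{
	static int action_object;	/* dummy object to commit */
	unsigned int reserved[TABLE_SIZE];
	size_t nreserved = 0;
	int err = 0;

	assert(n <= TABLE_SIZE);

	for (size_t i = 0; i < n; i++) {
		int ret = toy_check_alloc(indices[i]);

		if (ret < 0) {
			err = ret;
			break;
		}
		if (ret == 0)
			reserved[nreserved++] = indices[i];
	}

	/* Placeholders are only replaced or cleared here, after the loop. */
	for (size_t i = 0; i < nreserved; i++)
		table[reserved[i]] = err ? NULL : (void *)&action_object;

	return err;
}
```

With indices {1, 2} both reservations succeed and get committed. With
{3, 3} the second call keeps seeing the placeholder left by the first
one, and nothing can clear it because the commit/cleanup step only runs
after the loop; in this sketch the retry bound turns that into -EBUSY.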

After the change to rely on RCU for this check, instead of the idr lock,
the hangs became more noticeable to syzbot, since the loop now hangs
with a system-wide lock held.