netdev - Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190215113041.GA10511@splinter>
Date:   Fri, 15 Feb 2019 13:30:41 +0200
From:   Ido Schimmel <idosch@...sch.org>
To:     Vlad Buslov <vladbu@...lanox.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
        "jiri@...nulli.us" <jiri@...nulli.us>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>
Subject: Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list
 with filter_chain_lock mutex

On Fri, Feb 15, 2019 at 10:02:11AM +0000, Vlad Buslov wrote:
> 
> On Thu 14 Feb 2019 at 18:24, Ido Schimmel <idosch@...sch.org> wrote:
> > On Mon, Feb 11, 2019 at 10:55:38AM +0200, Vlad Buslov wrote:
> >> Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
> >> when accessing filter_chain list, instead of relying on rtnl lock.
> >> Dereference filter_chain with tcf_chain_dereference() lockdep macro to
> >> verify that all users of chain_list have the lock taken.
> >>
> >> Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
> >> all necessary code while holding chain lock in order to prevent
> >> invalidation of chain_info structure by potential concurrent change. This
> >> also serializes calls to tcf_chain0_head_change(), which allows head change
> >> callbacks to rely on filter_chain_lock for synchronization instead of rtnl
> >> mutex.
> >>
> >> Signed-off-by: Vlad Buslov <vladbu@...lanox.com>
> >> Acked-by: Jiri Pirko <jiri@...lanox.com>
> >
> > With this sequence [1] I get the following trace [2]. Bisected it to
> > this patch. Note that second filter is rejected by the device driver
> > (that's the intention). I guess it is not properly removed from the
> > filter chain in the error path?
> >
> > Thanks
> >
> > [1]
> > # tc qdisc add dev swp3 clsact
> > # tc filter add dev swp3 ingress pref 1 matchall skip_sw \
> > 	action mirred egress mirror dev swp7
> > # tc filter add dev swp3 ingress pref 2 matchall skip_sw \
> > 	action mirred egress mirror dev swp7
> > RTNETLINK answers: File exists
> > We have an error talking to the kernel, -1
> > # tc qdisc del dev swp3 clsact
> >
> > [2]
> > [   70.545131] kasan: GPF could be caused by NULL-ptr deref or user memory access
> > [   70.553394] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
> > [   70.560618] CPU: 2 PID: 2268 Comm: tc Not tainted 5.0.0-rc5-custom-02103-g415d39427317 #1304
> > [   70.570065] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
> > [   70.580204] RIP: 0010:mall_reoffload+0x14a/0x760
> > [   70.585382] Code: c1 0f 85 ba 05 00 00 31 c0 4d 8d 6c 24 34 b9 06 00 00 00 4c 89 ff f3 48 ab 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 14 02 4c 89 e8 83
> > e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 bd
> > [   70.606382] RSP: 0018:ffff888231faefc0 EFLAGS: 00010207
> > [   70.612235] RAX: dffffc0000000000 RBX: 1ffff110463f5dfe RCX: 0000000000000000
> > [   70.620220] RDX: 0000000000000006 RSI: 1ffff110463f5e01 RDI: ffff888231faf040
> > [   70.628206] RBP: ffff8881ef151a00 R08: 0000000000000000 R09: ffffed10463f5dfa
> > [   70.636192] R10: ffffed10463f5dfa R11: 0000000000000003 R12: 0000000000000000
> > [   70.644177] R13: 0000000000000034 R14: 0000000000000000 R15: ffff888231faf010
> > [   70.652163] FS:  00007f46b5bf0800(0000) GS:ffff888236c00000(0000) knlGS:0000000000000000
> > [   70.661218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   70.667649] CR2: 0000000001d590a8 CR3: 0000000231c3c000 CR4: 00000000001006e0
> > [   70.675633] Call Trace:
> > [   70.693046]  tcf_block_playback_offloads+0x94/0x230
> > [   70.710617]  __tcf_block_cb_unregister+0xf7/0x2d0
> > [   70.734143]  mlxsw_sp_setup_tc+0x20f/0x660
> > [   70.738739]  tcf_block_offload_unbind+0x22b/0x350
> > [   70.748898]  __tcf_block_put+0x203/0x630
> > [   70.769700]  tcf_block_put_ext+0x2f/0x40
> > [   70.774098]  clsact_destroy+0x7a/0xb0
> > [   70.782604]  qdisc_destroy+0x11a/0x5c0
> > [   70.786810]  qdisc_put+0x5a/0x70
> > [   70.790435]  notify_and_destroy+0x8e/0xa0
> > [   70.794933]  qdisc_graft+0xbb7/0xef0
> > [   70.809009]  tc_get_qdisc+0x518/0xa30
> > [   70.821530]  rtnetlink_rcv_msg+0x397/0xa20
> > [   70.835510]  netlink_rcv_skb+0x132/0x380
> > [   70.848419]  netlink_unicast+0x4c0/0x690
> > [   70.866019]  netlink_sendmsg+0x929/0xe10
> > [   70.882134]  sock_sendmsg+0xc8/0x110
> > [   70.886144]  ___sys_sendmsg+0x77a/0x8f0
> > [   70.924064]  __sys_sendmsg+0xf7/0x250
> > [   70.955727]  do_syscall_64+0x14d/0x610
> 
> Hi Ido,
> 
> Thanks for reporting this.
> 
> I looked at the code and problem seems to be matchall classifier
> specific. My implementation of unlocked cls API assumes that concurrent
> insertions are possible and checks for it when deleting "empty" tp.
> Since classifiers don't expose number of elements, the only way to test
> this is to do tp->walk() on them and assume that walk callback is called
> once per filter on every classifier. In your example new tp is created
> for second filter, filter insertion fails, number of elements on newly
> created tp is checked with tp->walk() before deleting it. However,
> matchall classifier always calls the tp->walk() callback once, even when
> it doesn't have a valid filter (in this case with NULL filter pointer).
> 
> Possible ways to fix this:
> 
> 1) Check for NULL filter pointer in tp->walk() callback and ignore it
> when counting filters on tp - will work but I don't like it because I
> don't think it is ever correct to call tp->walk() callback with NULL
> filter pointer.
> 
> 2) Fix matchall to not call tp->walk() callback with NULL filter
> pointers - my preferred simple solution.
> 
> 3) Extend tp API to have direct way to count filters by implementing
> tp->count - requires change to every classifier. Or maybe add it as
> optional API that only unlocked classifiers are required to implement
> and just delete rtnl lock dependent tp's without checking for concurrent
> insertions.
> 
> What do you think?

Since the problem is matchall-specific, then it makes sense to fix it in
matchall like you suggested in option #2.

Can you please use this opportunity and audit other classifiers to
confirm problem is indeed specific to matchall?

In any case, feel free to send me a patch and I'll test it.

Thanks