Message-ID: <20190411111342.GA29053@splinter>
Date: Thu, 11 Apr 2019 14:13:42 +0300
From: Ido Schimmel <idosch@...sch.org>
To: Vlad Buslov <vladbu@...lanox.com>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com,
jiri@...nulli.us, davem@...emloft.net, john.hurley@...ronome.com
Subject: Re: [PATCH net-next] net: sched: flower: insert filter to ht before
offloading it to hw
On Fri, Apr 05, 2019 at 08:56:26PM +0300, Vlad Buslov wrote:
> John reports:
>
> Recent refactoring of fl_change aims to use the classifier spinlock to
> avoid the need for rtnl lock. In doing so, the fl_hw_replace_filter()
> function was moved to before the lock is taken. This can create problems
> for drivers if duplicate filters are created (common in ovs tc offload
> due to filters being triggered by user-space matches).
>
> Drivers registered for such filters will now receive multiple copies of
> the same rule, each with a different cookie value. This means that the
> drivers would need to do a full match field lookup to determine
> duplicates, repeating work that will happen in flower __fl_lookup().
> Currently, drivers do not expect to receive duplicate filters.
>
> To fix this, verify that a filter with the same key is not present in
> the flower classifier hash table and insert the new filter into the
> hash table before offloading it to hardware. Implement the helper
> function fl_ht_insert_unique() to atomically verify/insert a filter.
>
> This change makes the filter visible to the fast path at the beginning
> of the fl_change() function, which means it can no longer be freed
> directly in case of error. Refactor the fl_change() error handling
> code to deallocate the filter after an RCU grace period.
>
> Fixes: 620da4860827 ("net: sched: flower: refactor fl_change")
> Reported-by: John Hurley <john.hurley@...ronome.com>
> Signed-off-by: Vlad Buslov <vladbu@...lanox.com>
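
[Editor's note: for readers following the patch, here is a minimal
sketch of the fl_ht_insert_unique() helper described in the quoted
message, assuming the cls_flower internals (fl_flow_mask, ht_node,
filter_ht_params) match net/sched/cls_flower.c upstream.
rhashtable_insert_fast() returns -EEXIST when an entry with the same
key is already present, so the verify and insert steps collapse into
one atomic operation:

static int fl_ht_insert_unique(struct cls_fl_filter *fnew,
			       struct cls_fl_filter *fold,
			       bool *in_ht)
{
	struct fl_flow_mask *mask = fnew->mask;
	int err;

	/* Fails with -EEXIST if a filter with the same key is already
	 * in this mask's hash table, so no separate lookup is needed.
	 */
	err = rhashtable_insert_fast(&mask->ht, &fnew->ht_node,
				     mask->filter_ht_params);
	if (err) {
		*in_ht = false;
		/* A filter with the same key is expected to exist when
		 * overwriting (fold != NULL), so -EEXIST is not an
		 * error in that case.
		 */
		return fold && err == -EEXIST ? 0 : err;
	}

	*in_ht = true;
	return 0;
}

Because the new filter is visible in the hash table before any later
step can fail, the error paths must defer freeing it past an RCU grace
period (e.g. via tcf_queue_work()) instead of calling kfree()
directly.]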
Vlad,
Our regression machines all hit a NULL pointer dereference [1], which I
bisected to this patch. I created this reproducer that you can use:
ip netns add ns1
ip -n ns1 link add dev dummy1 type dummy
tc -n ns1 qdisc add dev dummy1 clsact
tc -n ns1 filter add dev dummy1 ingress pref 1 proto ip \
flower skip_hw src_ip 192.0.2.1 action drop
ip netns del ns1
Can you please look into this? Thanks
[1]
[ 5.332176] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 5.334372] #PF error: [normal kernel read fault]
[ 5.335619] PGD 0 P4D 0
[ 5.336360] Oops: 0000 [#1] SMP
[ 5.337249] CPU: 0 PID: 7 Comm: kworker/u2:0 Not tainted 5.1.0-rc4-custom-01473-g526bb57a6ad6 #1374
[ 5.339232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[ 5.341982] Workqueue: netns cleanup_net
[ 5.342843] RIP: 0010:__fl_put+0x24/0xb0
[ 5.343808] Code: 84 00 00 00 00 00 3e ff 8f f0 03 00 00 0f 88 da 7b 14 00 74 01 c3 80 bf f4 03 00 00 00 0f 84 83 00 00 00 4c 8b 87 c8 01 00 00 <41> 8b 50 04 49 8d 70 04 85 d2 74 60 8d 4a 01 39 ca 7f 52 81 fa fe
[ 5.348099] RSP: 0018:ffffabe300663be0 EFLAGS: 00010202
[ 5.349223] RAX: ffff9ea4ba1aff00 RBX: ffff9ea4b99af400 RCX: ffffabe300663c67
[ 5.350572] RDX: 00000000000004c5 RSI: 0000000000000000 RDI: ffff9ea4b99af400
[ 5.351919] RBP: ffff9ea4ba28e900 R08: 0000000000000000 R09: ffffffff9d1b0075
[ 5.353272] R10: ffffeb2884e66b80 R11: ffffffff9dc4dcd8 R12: ffff9ea4b99af408
[ 5.354635] R13: ffff9ea4b99ae400 R14: ffff9ea4b9a47800 R15: ffff9ea4b99ae000
[ 5.355988] FS: 0000000000000000(0000) GS:ffff9ea4bba00000(0000) knlGS:0000000000000000
[ 5.357436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.358530] CR2: 0000000000000004 CR3: 00000001398fa004 CR4: 0000000000160ef0
[ 5.359876] Call Trace:
[ 5.360360] __fl_delete+0x223/0x3b0
[ 5.361008] fl_destroy+0xb4/0x130
[ 5.361641] tcf_proto_destroy+0x15/0x40
[ 5.362429] tcf_chain_flush+0x4e/0x60
[ 5.363125] __tcf_block_put+0xb4/0x150
[ 5.363805] clsact_destroy+0x30/0x40
[ 5.364507] qdisc_destroy+0x44/0x110
[ 5.365218] dev_shutdown+0x6e/0xa0
[ 5.365821] rollback_registered_many+0x25d/0x510
[ 5.366724] ? netdev_run_todo+0x221/0x280
[ 5.367485] unregister_netdevice_many+0x15/0xa0
[ 5.368355] default_device_exit_batch+0x13f/0x170
[ 5.369268] ? wait_woken+0x80/0x80
[ 5.369910] cleanup_net+0x19a/0x280
[ 5.370558] process_one_work+0x1f5/0x3f0
[ 5.371326] worker_thread+0x28/0x3c0
[ 5.372038] ? process_one_work+0x3f0/0x3f0
[ 5.372755] kthread+0x10d/0x130
[ 5.373358] ? __kthread_create_on_node+0x180/0x180
[ 5.374298] ret_from_fork+0x35/0x40
[ 5.374934] CR2: 0000000000000004
[ 5.375454] ---[ end trace c20e7f74127772e5 ]---
[ 5.376284] RIP: 0010:__fl_put+0x24/0xb0
[ 5.377003] Code: 84 00 00 00 00 00 3e ff 8f f0 03 00 00 0f 88 da 7b 14 00 74 01 c3 80 bf f4 03 00 00 00 0f 84 83 00 00 00 4c 8b 87 c8 01 00 00 <41> 8b 50 04 49 8d 70 04 85 d2 74 60 8d 4a 01 39 ca 7f 52 81 fa fe
[ 5.380269] RSP: 0018:ffffabe300663be0 EFLAGS: 00010202
[ 5.381237] RAX: ffff9ea4ba1aff00 RBX: ffff9ea4b99af400 RCX: ffffabe300663c67
[ 5.382551] RDX: 00000000000004c5 RSI: 0000000000000000 RDI: ffff9ea4b99af400
[ 5.383972] RBP: ffff9ea4ba28e900 R08: 0000000000000000 R09: ffffffff9d1b0075
[ 5.385314] R10: ffffeb2884e66b80 R11: ffffffff9dc4dcd8 R12: ffff9ea4b99af408
[ 5.386616] R13: ffff9ea4b99ae400 R14: ffff9ea4b9a47800 R15: ffff9ea4b99ae000
[ 5.387986] FS: 0000000000000000(0000) GS:ffff9ea4bba00000(0000) knlGS:0000000000000000
[ 5.389512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.390546] CR2: 0000000000000004 CR3: 00000001398fa004 CR4: 0000000000160ef0