lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <CAL+tcoB911=NZYiiAHV8vRv+=GdWmXqNv0YWd9mc4vLaTgjN1g@mail.gmail.com> Date: Tue, 4 Apr 2023 10:46:15 +0800 From: Jason Xing <kerneljasonxing@...il.com> To: Kuniyuki Iwashima <kuniyu@...zon.com> Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org, syzbot <syzkaller@...glegroups.com>, "Dae R . Jeong" <threeearcat@...il.com> Subject: Re: [PATCH v1 net 1/2] raw: Fix NULL deref in raw_get_next(). On Tue, Apr 4, 2023 at 4:02 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote: > > Dae R. Jeong reported a NULL deref in raw_get_next() [0]. > > It seems that the repro was running these sequences in parallel so > that one thread was iterating on a socket that was being freed in > another netns. > > unshare(0x40060200) > r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00') > socket$inet_icmp_raw(0x2, 0x3, 0x1) > pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f) > > After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we > use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW > sockets. However, we should use spinlock for slow paths to avoid > the NULL deref. > > Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object > is not reused during iteration in the grace period. In fact, the > lockless readers do not check the nulls marker with get_nulls_value(). > So, SOCK_RAW should use hlist instead of hlist_nulls. > > Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(), > let's convert hlist_nulls to hlist and use sk_for_each_rcu() for > fast paths and sk_for_each() and spinlock for /proc/net/raw. > > [0]: > general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN > KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] > CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline] > RIP: 0010:sock_net include/net/sock.h:649 [inline] > RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline] > RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline] > RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995 > Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef > RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206 > RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000 > RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338 > RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9 > R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78 > R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030 > FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225 > seq_read+0x224/0x320 fs/seq_file.c:162 > pde_read fs/proc/inode.c:316 [inline] > proc_reg_read+0x23f/0x330 fs/proc/inode.c:328 > vfs_read+0x31e/0xd30 fs/read_write.c:468 > ksys_pread64 fs/read_write.c:665 [inline] > __do_sys_pread64 fs/read_write.c:675 [inline] > __se_sys_pread64 fs/read_write.c:672 [inline] > __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672 > do_syscall_x64 arch/x86/entry/common.c:51 [inline] > do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x478d29 > Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011 > RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29 > RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003 > RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000 > R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740 > R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50 > </TASK> > Modules linked in: > ---[ end trace 0000000000000000 ]--- > RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline] > RIP: 0010:sock_net include/net/sock.h:649 [inline] > RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline] > RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline] > RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995 > Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef > RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206 > RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000 > RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338 > RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9 > R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78 > R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030 > FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU") > Reported-by: syzbot <syzkaller@...glegroups.com> > Reported-by: Dae R. Jeong <threeearcat@...il.com> > Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/ > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com> > --- > include/net/raw.h | 4 ++-- > net/ipv4/raw.c | 36 +++++++++++++++++++----------------- > net/ipv4/raw_diag.c | 10 ++++------ > net/ipv6/raw.c | 10 ++++------ > 4 files changed, 29 insertions(+), 31 deletions(-) > > diff --git a/include/net/raw.h b/include/net/raw.h > index 2c004c20ed99..3af5289fdead 100644 > --- a/include/net/raw.h > +++ b/include/net/raw.h > @@ -37,7 +37,7 @@ int raw_rcv(struct sock *, struct sk_buff *); > struct raw_hashinfo { > spinlock_t lock; > > - struct hlist_nulls_head ht[RAW_HTABLE_SIZE] ____cacheline_aligned; > + struct hlist_head ht[RAW_HTABLE_SIZE] ____cacheline_aligned; > }; > > static inline u32 raw_hashfunc(const struct net *net, u32 proto) > @@ -51,7 +51,7 @@ static inline void raw_hashinfo_init(struct raw_hashinfo *hashinfo) > > spin_lock_init(&hashinfo->lock); > for (i = 0; i < RAW_HTABLE_SIZE; i++) > - INIT_HLIST_NULLS_HEAD(&hashinfo->ht[i], i); > + INIT_HLIST_HEAD(&hashinfo->ht[i]); > } > > #ifdef CONFIG_PROC_FS > diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c > index 94df935ee0c5..8088a5011e7d 100644 > --- a/net/ipv4/raw.c > +++ b/net/ipv4/raw.c > @@ -91,12 +91,12 @@ EXPORT_SYMBOL_GPL(raw_v4_hashinfo); > int raw_hash_sk(struct sock *sk) > { > struct raw_hashinfo *h = sk->sk_prot->h.raw_hash; > - struct hlist_nulls_head *hlist; > + struct hlist_head *hlist; > > hlist = &h->ht[raw_hashfunc(sock_net(sk), inet_sk(sk)->inet_num)]; > > spin_lock(&h->lock); > - __sk_nulls_add_node_rcu(sk, hlist); > + sk_add_node_rcu(sk, hlist); > sock_set_flag(sk, SOCK_RCU_FREE); > spin_unlock(&h->lock); > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); > @@ -110,7 +110,7 @@ void raw_unhash_sk(struct sock *sk) > struct raw_hashinfo *h = sk->sk_prot->h.raw_hash; > > spin_lock(&h->lock); > - if (__sk_nulls_del_node_init_rcu(sk)) > + if (sk_del_node_init_rcu(sk)) > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); > spin_unlock(&h->lock); > } > @@ -163,16 +163,15 @@ static int icmp_filter(const struct sock *sk, const struct sk_buff *skb) > static int raw_v4_input(struct net *net, struct sk_buff *skb, > const struct iphdr *iph, int hash) > { > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > int sdif = inet_sdif(skb); > + struct hlist_head *hlist; > int dif = inet_iif(skb); > int delivered = 0; > struct sock *sk; > > hlist = &raw_v4_hashinfo.ht[hash]; > rcu_read_lock(); > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each_rcu(sk, hlist) { > if (!raw_v4_match(net, sk, iph->protocol, > iph->saddr, iph->daddr, dif, sdif)) > continue; > @@ -264,10 +263,9 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info) > void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info) > { > struct net *net = dev_net(skb->dev); > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > int dif = skb->dev->ifindex; > int sdif = inet_sdif(skb); > + struct hlist_head *hlist; > const struct iphdr *iph; > struct sock *sk; > int hash; > @@ -276,7 +274,7 @@ void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info) > hlist = &raw_v4_hashinfo.ht[hash]; > > rcu_read_lock(); > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each_rcu(sk, hlist) { > iph = (const struct iphdr *)skb->data; > if (!raw_v4_match(net, sk, iph->protocol, > iph->daddr, iph->saddr, dif, sdif)) > @@ -950,14 +948,13 @@ static struct sock *raw_get_first(struct seq_file *seq, int bucket) > { > struct raw_hashinfo *h = pde_data(file_inode(seq->file)); > struct raw_iter_state *state = raw_seq_private(seq); > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > + struct hlist_head *hlist; > struct sock *sk; > > for (state->bucket = bucket; state->bucket < RAW_HTABLE_SIZE; > ++state->bucket) { > hlist = &h->ht[state->bucket]; > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each(sk, hlist) { > if (sock_net(sk) == seq_file_net(seq)) > return sk; > } > @@ -970,7 +967,7 @@ static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk) > struct raw_iter_state *state = raw_seq_private(seq); > > do { > - sk = sk_nulls_next(sk); > + sk = sk_next(sk); > } while (sk && sock_net(sk) != seq_file_net(seq)); > > if (!sk) > @@ -989,9 +986,12 @@ static struct sock *raw_get_idx(struct seq_file *seq, loff_t pos) > } > [...] > void *raw_seq_start(struct seq_file *seq, loff_t *pos) > - __acquires(RCU) > + __acquires(&h->lock) > { > - rcu_read_lock(); > + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); > + > + spin_lock(&h->lock); I would like to ask two questions which make me confused: 1) Why would we use spin_lock to protect the socket in a raw hashtable for reader's safety under the rcu protection? Normally, if we use the RCU protection, we only make sure that we need to destroy the socket by calling call_rcu() which would prevent the READER of the socket from getting a NULL pointer. 2) Using spin lock when we're cating /proc/net/raw file may block/postpone the action of hashing socket somewhere else? Thanks, Jason > + > return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; > } > EXPORT_SYMBOL_GPL(raw_seq_start); > @@ -1010,9 +1010,11 @@ void *raw_seq_next(struct seq_file *seq, void *v, loff_t *pos) > EXPORT_SYMBOL_GPL(raw_seq_next); > > void raw_seq_stop(struct seq_file *seq, void *v) > - __releases(RCU) > + __releases(&h->lock) > { > - rcu_read_unlock(); > + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); > + > + spin_unlock(&h->lock); > } > EXPORT_SYMBOL_GPL(raw_seq_stop); > > diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c > index 999321834b94..da3591a66a16 100644 > --- a/net/ipv4/raw_diag.c > +++ b/net/ipv4/raw_diag.c > @@ -57,8 +57,7 @@ static bool raw_lookup(struct net *net, struct sock *sk, > static struct sock *raw_sock_get(struct net *net, const struct inet_diag_req_v2 *r) > { > struct raw_hashinfo *hashinfo = raw_get_hashinfo(r); > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > + struct hlist_head *hlist; > struct sock *sk; > int slot; > > @@ -68,7 +67,7 @@ static struct sock *raw_sock_get(struct net *net, const struct inet_diag_req_v2 > rcu_read_lock(); > for (slot = 0; slot < RAW_HTABLE_SIZE; slot++) { > hlist = &hashinfo->ht[slot]; > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each_rcu(sk, hlist) { > if (raw_lookup(net, sk, r)) { > /* > * Grab it and keep until we fill > @@ -142,9 +141,8 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, > struct raw_hashinfo *hashinfo = raw_get_hashinfo(r); > struct net *net = sock_net(skb->sk); > struct inet_diag_dump_data *cb_data; > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > int num, s_num, slot, s_slot; > + struct hlist_head *hlist; > struct sock *sk = NULL; > struct nlattr *bc; > > @@ -161,7 +159,7 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, > num = 0; > > hlist = &hashinfo->ht[slot]; > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each_rcu(sk, hlist) { > struct inet_sock *inet = inet_sk(sk); > > if (!net_eq(sock_net(sk), net)) > diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c > index bac9ba747bde..a327aa481df4 100644 > --- a/net/ipv6/raw.c > +++ b/net/ipv6/raw.c > @@ -141,10 +141,9 @@ EXPORT_SYMBOL(rawv6_mh_filter_unregister); > static bool ipv6_raw_deliver(struct sk_buff *skb, int nexthdr) > { > struct net *net = dev_net(skb->dev); > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > const struct in6_addr *saddr; > const struct in6_addr *daddr; > + struct hlist_head *hlist; > struct sock *sk; > bool delivered = false; > __u8 hash; > @@ -155,7 +154,7 @@ static bool ipv6_raw_deliver(struct sk_buff *skb, int nexthdr) > hash = raw_hashfunc(net, nexthdr); > hlist = &raw_v6_hashinfo.ht[hash]; > rcu_read_lock(); > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each_rcu(sk, hlist) { > int filtered; > > if (!raw_v6_match(net, sk, nexthdr, daddr, saddr, > @@ -333,15 +332,14 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr, > u8 type, u8 code, int inner_offset, __be32 info) > { > struct net *net = dev_net(skb->dev); > - struct hlist_nulls_head *hlist; > - struct hlist_nulls_node *hnode; > + struct hlist_head *hlist; > struct sock *sk; > int hash; > > hash = raw_hashfunc(net, nexthdr); > hlist = &raw_v6_hashinfo.ht[hash]; > rcu_read_lock(); > - sk_nulls_for_each(sk, hnode, hlist) { > + sk_for_each_rcu(sk, hlist) { > /* Note: ipv6_hdr(skb) != skb->data */ > const struct ipv6hdr *ip6h = (const struct ipv6hdr *)skb->data; > > -- > 2.30.2 >
Powered by blists - more mailing lists