lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 20 Mar 2023 22:19:10 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Hyunwoo Kim <v4bel@...ori.io>,
        Steffen Klassert <steffen.klassert@...unet.com>
Cc:     Taehee Yoo <ap420073@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Dmitry Kozlov <xeb@...l.ru>,
        David Ahern <dsahern@...nel.org>, tudordana@...gle.com,
        netdev@...r.kernel.org, imv4bel@...il.com
Subject: Re: [PATCH] net: Fix invalid ip_route_output_ports() call

On Mon, Mar 20, 2023 at 10:08 PM Hyunwoo Kim <v4bel@...ori.io> wrote:
>
> On Mon, Mar 20, 2023 at 08:17:15PM -0700, Eric Dumazet wrote:
> > On Mon, Mar 20, 2023 at 7:49 PM Hyunwoo Kim <v4bel@...ori.io> wrote:
> > >
> > > If you pass the address of the struct flowi4 you declared as a
> > > local variable as the fl4 argument to ip_route_output_ports(),
> > > the subsequent call to xfrm_state_find() will read the local
> > > variable by AF_INET6 rather than AF_INET as per policy,
> > > which could cause it to go out of scope on the kernel stack.
> > >
> > > Reported-by: syzbot+ada7c035554bcee65580@...kaller.appspotmail.com
> >
> > I could not find this syzbot issue, can you provide a link, and a stack trace ?
>
> This is the syzbot dashboard:
> https://syzkaller.appspot.com/bug?extid=0f526bf9663842ac2dc7
>
>
> and KASAN log:
> ```
> [  239.016529] ==================================================================
> [  239.016540] BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x2b8/0x2ae0
> [  239.016556] Read of size 4 at addr ffffc90000860ba0 by task swapper/14/0
>
> [  239.016571] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G      D     E      6.2.0+ #14
> [  239.016583] Hardware name: Gigabyte Technology Co., Ltd. B460MDS3H/B460M DS3H, BIOS F3 05/27/2020
> [  239.016593] Call Trace:
> [  239.016599]  <IRQ>
> [  239.016605]  dump_stack_lvl+0x4c/0x70
> [  239.016617]  print_report+0xcf/0x620
> [  239.016629]  ? kasan_addr_to_slab+0x11/0xb0
> [  239.016639]  ? xfrm_state_find+0x2b8/0x2ae0
> [  239.016650]  kasan_report+0xbf/0x100
> [  239.016657]  ? xfrm_state_find+0x2b8/0x2ae0
> [  239.016664]  __asan_load4+0x84/0xa0
> [  239.016670]  xfrm_state_find+0x2b8/0x2ae0
> [  239.016677]  ? __pfx_xfrm_state_find+0x10/0x10
> [  239.016684]  ? __pfx_xfrm4_get_saddr+0x10/0x10
> [  239.016690]  ? unwind_next_frame+0x27/0x40
> [  239.016697]  xfrm_tmpl_resolve+0x1f9/0x780
> [  239.016704]  ? __pfx_xfrm_tmpl_resolve+0x10/0x10
> [  239.016711]  ? kasan_save_stack+0x3e/0x50
> [  239.016716]  ? kasan_save_stack+0x2a/0x50
> [  239.016721]  ? kasan_set_track+0x29/0x40
> [  239.016726]  ? kasan_save_alloc_info+0x22/0x30
> [  239.016731]  ? __kasan_slab_alloc+0x91/0xa0
> [  239.016736]  ? kmem_cache_alloc+0x17e/0x370
> [  239.016741]  ? dst_alloc+0x5c/0x230
> [  239.016747]  ? xfrm_pol_bin_cmp+0xc8/0xe0
> [  239.016753]  xfrm_resolve_and_create_bundle+0xf1/0x10d0
> [  239.016758]  ? __pfx_xfrm_policy_inexact_lookup_rcu+0x10/0x10
> [  239.016764]  ? xfrm_policy_lookup_inexact_addr+0xa1/0xc0
> [  239.016771]  ? xfrm_policy_match+0xd6/0x110
> [  239.016776]  ? __rcu_read_unlock+0x3b/0x80
> [  239.016782]  ? xfrm_policy_lookup_bytype.constprop.0+0x52e/0xb80
> [  239.016788]  ? __pfx_xfrm_resolve_and_create_bundle+0x10/0x10
> [  239.016795]  ? __pfx_xfrm_policy_lookup_bytype.constprop.0+0x10/0x10
> [  239.016802]  ? __kasan_check_write+0x18/0x20
> [  239.016807]  ? _raw_spin_lock_bh+0x8c/0xe0
> [  239.016813]  ? __pfx__raw_spin_lock_bh+0x10/0x10
> [  239.016819]  xfrm_lookup_with_ifid+0x2f2/0xe50
> [  239.016824]  ? __local_bh_enable_ip+0x3f/0x90
> [  239.016830]  ? rcu_gp_cleanup+0x2f2/0x6c0
> [  239.016836]  ? __pfx_xfrm_lookup_with_ifid+0x10/0x10
> [  239.016842]  ? ip_route_output_key_hash_rcu+0x3da/0x1000
> [  239.016848]  xfrm_lookup_route+0x2a/0x100
> [  239.016854]  ip_route_output_flow+0x1a7/0x1c0
> [  239.016859]  ? __pfx_ip_route_output_flow+0x10/0x10
> [  239.016866]  igmpv3_newpack+0x1c1/0x5d0
> [  239.016872]  ? __pfx_igmpv3_newpack+0x10/0x10
> [  239.016878]  ? check_preempt_curr+0xd7/0x120
> [  239.016885]  add_grhead+0x111/0x130
> [  239.016891]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  239.016897]  add_grec+0x756/0x7f0
> [  239.016903]  ? __pfx_add_grec+0x10/0x10
> [  239.016909]  ? __pfx__raw_spin_lock_bh+0x10/0x10
> [  239.016914]  ? wake_up_process+0x19/0x20
> [  239.016919]  ? insert_work+0x130/0x160
> [  239.016926]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  239.016932]  igmp_ifc_timer_expire+0x2b5/0x650
> [  239.016938]  ? __kasan_check_write+0x18/0x20
> [  239.016943]  ? _raw_spin_lock+0x8c/0xe0
> [  239.016948]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  239.016954]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  239.016960]  call_timer_fn+0x2d/0x1b0
> [  239.016966]  ? __pfx_igmp_ifc_timer_expire+0x10/0x10
> [  239.016973]  __run_timers.part.0+0x447/0x530
> [  239.016979]  ? __pfx___run_timers.part.0+0x10/0x10
> [  239.016986]  ? ktime_get+0x58/0xd0
> [  239.016992]  ? lapic_next_deadline+0x30/0x40
> [  239.016997]  ? clockevents_program_event+0x118/0x1a0
> [  239.017004]  run_timer_softirq+0x69/0xf0
> [  239.017010]  __do_softirq+0x128/0x444
> [  239.017016]  __irq_exit_rcu+0xdd/0x130
> [  239.017022]  irq_exit_rcu+0x12/0x20
> [  239.017027]  sysvec_apic_timer_interrupt+0xa5/0xc0
> [  239.017033]  </IRQ>
> [  239.017036]  <TASK>
> [  239.017039]  asm_sysvec_apic_timer_interrupt+0x1f/0x30
> [  239.017045] RIP: 0010:cpuidle_enter_state+0x1d2/0x510
> [  239.017051] Code: 30 00 0f 84 61 01 00 00 41 83 ed 01 73 dd 48 83 c4 28 44 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc fb 0f 1f 44 00 00 <45> 85 f6 0f 89 18 ff ff ff 48 c7 43 18 00 00 00 00 49 83 fd 09 0f
> [  239.017061] RSP: 0018:ffffc9000028fd40 EFLAGS: 00000246
> [  239.017067] RAX: 0000000000000000 RBX: ffffe8ffffd00c40 RCX: ffffffff811eae9c
> [  239.017072] RDX: dffffc0000000000 RSI: ffffffff82e7cc00 RDI: ffff88840eb44188
> [  239.017077] RBP: ffffc9000028fd90 R08: 00000037a67e1a91 R09: ffff88840eb3f0a3
> [  239.017082] R10: ffffed1081d67e14 R11: 0000000000000001 R12: ffffffff846c43a0
> [  239.017087] R13: 0000000000000002 R14: 0000000000000002 R15: ffffffff846c4488
> [  239.017093]  ? sched_idle_set_state+0x4c/0x70
> [  239.017101]  cpuidle_enter+0x45/0x70
> [  239.017108]  call_cpuidle+0x44/0x80
> [  239.017113]  do_idle+0x2b4/0x340
> [  239.017118]  ? __pfx_do_idle+0x10/0x10
> [  239.017125]  cpu_startup_entry+0x24/0x30
> [  239.017130]  start_secondary+0x1d6/0x210
> [  239.017136]  ? __pfx_start_secondary+0x10/0x10
> [  239.017142]  ? set_bringup_idt_handler.constprop.0+0x93/0xc0
> [  239.017150]  ? start_cpu0+0xc/0xc
> [  239.017156]  secondary_startup_64_no_verify+0xe5/0xeb
> [  239.017163]  </TASK>
>
> [  239.017169] The buggy address belongs to the virtual mapping at
>                 [ffffc90000859000, ffffc90000862000) created by:
>                 irq_init_percpu_irqstack+0x1c8/0x270
>
> [  239.017182] The buggy address belongs to the physical page:
> [  239.017186] page:00000000d2ae7732 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x40eb09
> [  239.017193] flags: 0x17ffffc0001000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
> [  239.017202] raw: 0017ffffc0001000 ffffea00103ac248 ffffea00103ac248 0000000000000000
> [  239.017208] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [  239.017213] page dumped because: kasan: bad access detected
>
> [  239.017219] Memory state around the buggy address:
> [  239.017222]  ffffc90000860a80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
> [  239.017228]  ffffc90000860b00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
> [  239.017233] >ffffc90000860b80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
> [  239.017238]                                ^
> [  239.017241]  ffffc90000860c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  239.017247]  ffffc90000860c80: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f3 f3 00
> [  239.017251] ==================================================================
> ```
>
> >
> > > Signed-off-by: Hyunwoo Kim <v4bel@...ori.io>
> > > ---
> >
> > I find this patch quite strange, to be honest.
> >
> > It looks like some xfrm bug to me.
> >
> > A stack trace would be helpful.
>
> Here are the root caouses I analyzed:
>
> ```
> igmp_ifc_timer_expire() -> igmpv3_send_cr() -> add_grec() -> add_grhead() -> igmpv3_newpack()[1] -> ip_route_output_ports() ->
> ip_route_output_flow()[3] -> xfrm_lookup_route() -> xfrm_lookup() -> xfrm_lookup_with_ifid() -> xfrm_resolve_and_create_bundle() ->
> xfrm_tmpl_resolve() -> xfrm_tmpl_resolve_one() -> xfrm_state_find()[5]
> ```
>
> Here, igmpv3_newpack()[1] declares the variable `struct flowi4 fl4`[2]:
> ```
> static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
> {
>         struct sk_buff *skb;
>         struct rtable *rt;
>         struct iphdr *pip;
>         struct igmpv3_report *pig;
>         struct net *net = dev_net(dev);
>         struct flowi4 fl4;    // <===[2]
>         int hlen = LL_RESERVED_SPACE(dev);
>         int tlen = dev->needed_tailroom;
>         unsigned int size = mtu;
>
>         while (1) {
>                 skb = alloc_skb(size + hlen + tlen,
>                                 GFP_ATOMIC | __GFP_NOWARN);
>                 if (skb)
>                         break;
>                 size >>= 1;
>                 if (size < 256)
>                         return NULL;
>         }
>         skb->priority = TC_PRIO_CONTROL;
>
>         rt = ip_route_output_ports(net, &fl4, NULL, IGMPV3_ALL_MCR, 0,
>                                    0, 0,
>                                    IPPROTO_IGMP, 0, dev->ifindex);
> ```
>

OK, this is IPv4 stack here. No bug I think.


>
> Then, starting with ip_route_output_flow()[3], we use flowi4_to_flowi() to convert the `struct flowi4` variable to a `struct flowi` pointer[4]:
> ```
> struct flowi {
>         union {
>                 struct flowi_common     __fl_common;
>                 struct flowi4           ip4;
>                 struct flowi6           ip6;
>         } u;
> #define flowi_oif       u.__fl_common.flowic_oif
> #define flowi_iif       u.__fl_common.flowic_iif
> #define flowi_l3mdev    u.__fl_common.flowic_l3mdev
> #define flowi_mark      u.__fl_common.flowic_mark
> #define flowi_tos       u.__fl_common.flowic_tos
> #define flowi_scope     u.__fl_common.flowic_scope
> #define flowi_proto     u.__fl_common.flowic_proto
> #define flowi_flags     u.__fl_common.flowic_flags
> #define flowi_secid     u.__fl_common.flowic_secid
> #define flowi_tun_key   u.__fl_common.flowic_tun_key
> #define flowi_uid       u.__fl_common.flowic_uid
> } __attribute__((__aligned__(BITS_PER_LONG/8)));
>
>
> static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
> {
>         return container_of(fl4, struct flowi, u.ip4);
> }
>
>
> struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
>                                     const struct sock *sk)
> {
>         struct rtable *rt = __ip_route_output_key(net, flp4);
>
>         if (IS_ERR(rt))
>                 return rt;
>
>         if (flp4->flowi4_proto) {
>                 flp4->flowi4_oif = rt->dst.dev->ifindex;
>                 rt = (struct rtable *)xfrm_lookup_route(net, &rt->dst,
>                                                         flowi4_to_flowi(flp4),  // <===[4]
>                                                         sk, 0);
>         }
>
>         return rt;
> }
> EXPORT_SYMBOL_GPL(ip_route_output_flow);
> ```
> This is the cause of the stack OOB. Because we calculated the struct flowi pointer address based on struct flowi4 declared as a stack variable,
> if we accessed a member of flowi that exceeds the size of flowi4, we would get an OOB.
>
>
> Finally, xfrm_state_find()[5] uses daddr, which is a pointer to `&fl->u.ip4.saddr`.
> Here, the encap_family variable can be entered by the user using the netlink socket.
> If the user chose AF_INET6 instead of AF_INET, the xfrm_dst_hash() function would be called on an AF_INET6 basis[6],
> which could cause an OOB in the `struct flowi4 fl4` variable of igmpv3_newpack()[2].
> ```

This is probably the real bug. (in XFRM layer)

CC Steffen Klassert <steffen.klassert@...unet.com> for comments.

> struct xfrm_state *
> xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
>                 const struct flowi *fl, struct xfrm_tmpl *tmpl,
>                 struct xfrm_policy *pol, int *err,
>                 unsigned short family, u32 if_id)
> {
>         static xfrm_address_t saddr_wildcard = { };
>         struct net *net = xp_net(pol);
>         unsigned int h, h_wildcard;
>         struct xfrm_state *x, *x0, *to_put;
>         int acquire_in_progress = 0;
>         int error = 0;
>         struct xfrm_state *best = NULL;
>         u32 mark = pol->mark.v & pol->mark.m;
>         unsigned short encap_family = tmpl->encap_family;
>         unsigned int sequence;
>         struct km_event c;
>
>         to_put = NULL;
>
>         sequence = read_seqcount_begin(&net->xfrm.xfrm_state_hash_generation);
>
>         rcu_read_lock();
>         h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family);  // <===[6]
> ```
>
> Of course, it's possible that the patch I wrote is not appropriate.
>
>

Thanks for the report.

> Regards,
> Hyunwoo Kim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ