Date:   Fri, 20 Jan 2023 10:46:34 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc:     brouer@...hat.com, netdev@...r.kernel.org,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>, pabeni@...hat.com,
        syzbot+c8a2e66e37eee553c4fd@...kaller.appspotmail.com
Subject: Re: [PATCH net-next] net: fix kfree_skb_list use of skb_mark_not_on_list

On Fri, Jan 20, 2023 at 10:30 AM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
> On 20/01/2023 10.09, Jesper Dangaard Brouer wrote:
> >
> > On 19/01/2023 19.04, Eric Dumazet wrote:
> >> On Thu, Jan 19, 2023 at 6:50 PM Jesper Dangaard Brouer
> >> <brouer@...hat.com> wrote:
> >>>
> >>> A bug was introduced by commit eedade12f4cb ("net: kfree_skb_list use
> >>> kmem_cache_free_bulk"). It unconditionally unlinked the SKB list by
> >>> invoking skb_mark_not_on_list().
> >>>
> >>> skb_mark_not_on_list() should only be called if __kfree_skb_reason()
> >>> returns true, meaning the SKB is ready to be freed, as it calls/checks
> >>> skb_unref().
> >>>
> >>> This is needed as kfree_skb_list() is also invoked on the skb_shared_info
> >>> frag_list. A frag_list can contain SKBs with an elevated refcnt due to
> >>> cloning via skb_clone_fraglist(), which takes a reference on all SKBs in
> >>> the list. This implies the invariant that all SKBs in the list must have
> >>> the same refcnt when kfree_skb_list() is used.
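
(For context, skb_clone_fraglist() in net/core/skbuff.c boils down to
taking a reference on every SKB hanging off the frag_list; roughly, as
a sketch from memory:

	static void skb_clone_fraglist(struct sk_buff *skb)
	{
		struct sk_buff *list;

		skb_walk_frags(skb, list)
			skb_get(list);
	}

so a later kfree_skb_list() on that frag_list can observe refcnt > 1
and must not touch skb->next unless skb_unref() says the SKB is really
going away.)
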
> >>
> >> Yeah, or more precisely skb_drop_fraglist() calling kfree_skb_list()
> >>
> >>>
> >>> Reported-by: syzbot+c8a2e66e37eee553c4fd@...kaller.appspotmail.com
> >>> Reported-and-tested-by: syzbot+c8a2e66e37eee553c4fd@...kaller.appspotmail.com
> >>> Fixes: eedade12f4cb ("net: kfree_skb_list use kmem_cache_free_bulk")
> >>> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> >>> ---
> >>>   net/core/skbuff.c |    6 +++---
> >>>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>> index 4e73ab3482b8..1bffbcbe6087 100644
> >>> --- a/net/core/skbuff.c
> >>> +++ b/net/core/skbuff.c
> >>> @@ -999,10 +999,10 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
> >>>          while (segs) {
> >>>                  struct sk_buff *next = segs->next;
> >>>
> >>> -               skb_mark_not_on_list(segs);
> >>> -
> >>> -               if (__kfree_skb_reason(segs, reason))
> >>> +               if (__kfree_skb_reason(segs, reason)) {
> >>> +                       skb_mark_not_on_list(segs);
> >>
> >> Real question is: why do we need to set/change/dirty skb->next at all?
> >>
> >> I would remove this completely, and avoid dirtying extra cache lines.
> >
> > First of all, we just read this cacheline via reading segs->next, so
> > this cacheline must at minimum be in Shared (S) state.
> >
> > Secondly, SLUB will write into this cacheline. Thus, we actually know
> > that this cacheline needs to go into Modified (M) or Exclusive (E)
> > state, so writing into it here should be okay.  We could even replace
> > the write with a prefetchw() to help SLUB get the cacheline into
> > Exclusive (E) cache coherency state.
>
> I looked it up, and SLUB no longer uses the first cacheline of objects
> to store its freelist pointer.  Since April 2020 (v5.7), commit
> 3202fa62fb43 ("slub: relocate freelist pointer to middle of object") by
> Kees Cook moved the freelist pointer to the middle of the object for
> security reasons.  Thus, my prefetchw idea is wrong (details: there is
> an internal prefetch_freepointer() that finds the right location).
>
> Also, it could make sense to save the potential (S) to (E) cache
> coherency state transition, as SLUB actually writes into a different
> cacheline than I first thought.

Anyway, we should not assume anything about slub/slab's way of storing
freelists. They might switch to something less expensive, especially
considering the bulk API passes an array of pointers, not a 'list'.
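
To make the array-vs-list point concrete: the bulk path from commit
eedade12f4cb batches SKB heads into a plain on-stack array and hands
the whole array to the allocator in one call. A minimal sketch from
memory (not the verbatim commit; names may differ):

	#define KFREE_SKB_BULK_SIZE	16

	struct skb_free_array {
		unsigned int skb_count;
		void *skb_array[KFREE_SKB_BULK_SIZE];
	};

	static void kfree_skb_add_bulk(struct sk_buff *skb,
				       struct skb_free_array *sa,
				       enum skb_drop_reason reason)
	{
		/* ... release skb data and refs first, then defer the head: */
		sa->skb_array[sa->skb_count++] = skb;

		/* Flush the batch to the slab allocator once it is full. */
		if (sa->skb_count == KFREE_SKB_BULK_SIZE) {
			kmem_cache_free_bulk(skbuff_head_cache,
					     KFREE_SKB_BULK_SIZE,
					     sa->skb_array);
			sa->skb_count = 0;
		}
	}

The allocator only ever sees (cache, size, array), so nothing on the
slab side inherently needs skb->next.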


>
>
> >> Before your patch, we were not calling skb_mark_not_on_list(segs),
> >> so why bother?
> >
> > To catch potential bugs.
>
> For this purpose we could discuss creating skb_poison_list(), as you
> hinted in your debugging proposal; that would likely have let us catch
> this bug faster (via a crash at the second caller).
>
> Let me know if you would prefer that we simply remove skb_mark_not_on_list()?

Yes. Setting it to NULL here might hide another bug somewhere.

skb_poison_list() could be defined for CONFIG_DEBUG_NET=y builds I guess.
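
Something along these lines, perhaps (an illustrative sketch only; the
poison constant is made up here in the style of include/linux/poison.h,
not an existing value):

	#ifdef CONFIG_DEBUG_NET
	/* Hypothetical poison value: non-NULL, points at unmapped memory. */
	#define SKB_LIST_POISON_NEXT	((void *)(0x800 + POISON_POINTER_DELTA))

	/* Poison skb->next so a second kfree_skb_list() walking the same
	 * chain crashes immediately instead of silently double-freeing.
	 */
	static inline void skb_poison_list(struct sk_buff *skb)
	{
		skb->next = SKB_LIST_POISON_NEXT;
	}
	#else
	static inline void skb_poison_list(struct sk_buff *skb)
	{
	}
	#endif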
