[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+wzgAz8Y9Ce4rw6DkcExUW37-UKKn4eL4-umWsAJ_BKQ@mail.gmail.com>
Date: Fri, 20 Jan 2023 10:46:34 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: brouer@...hat.com, netdev@...r.kernel.org,
Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>, pabeni@...hat.com,
syzbot+c8a2e66e37eee553c4fd@...kaller.appspotmail.com
Subject: Re: [PATCH net-next] net: fix kfree_skb_list use of skb_mark_not_on_list
On Fri, Jan 20, 2023 at 10:30 AM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
> On 20/01/2023 10.09, Jesper Dangaard Brouer wrote:
> >
> > On 19/01/2023 19.04, Eric Dumazet wrote:
> >> On Thu, Jan 19, 2023 at 6:50 PM Jesper Dangaard Brouer
> >> <brouer@...hat.com> wrote:
> >>>
> >>> A bug was introduced by commit eedade12f4cb ("net: kfree_skb_list use
> >>> kmem_cache_free_bulk"). It unconditionally unlinked the SKB list via
> >>> invoking skb_mark_not_on_list().
> >>>
> >>> The skb_mark_not_on_list() should only be called if __kfree_skb_reason()
> >>> returns true, meaning the SKB is ready to be free'ed, as it calls/check
> >>> skb_unref().
> >>>
> >>> This is needed as kfree_skb_list() is also invoked on skb_shared_info
> >>> frag_list. A frag_list can have SKBs with elevated refcnt due to cloning
> >>> via skb_clone_fraglist(), which takes a reference on all SKBs in the
> >>> list. This implies the invariant that all SKBs in the list must have the
> >>> same refcnt, when using kfree_skb_list().
> >>
> >> Yeah, or more precisely skb_drop_fraglist() calling kfree_skb_list()
> >>
> >>>
> >>> Reported-by: syzbot+c8a2e66e37eee553c4fd@...kaller.appspotmail.com
> >>> Reported-and-tested-by:
> >>> syzbot+c8a2e66e37eee553c4fd@...kaller.appspotmail.com
> >>> Fixes: eedade12f4cb ("net: kfree_skb_list use kmem_cache_free_bulk")
> >>> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> >>> ---
> >>> net/core/skbuff.c | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>> index 4e73ab3482b8..1bffbcbe6087 100644
> >>> --- a/net/core/skbuff.c
> >>> +++ b/net/core/skbuff.c
> >>> @@ -999,10 +999,10 @@ kfree_skb_list_reason(struct sk_buff *segs,
> >>> enum skb_drop_reason reason)
> >>> while (segs) {
> >>> struct sk_buff *next = segs->next;
> >>>
> >>> - skb_mark_not_on_list(segs);
> >>> -
> >>> - if (__kfree_skb_reason(segs, reason))
> >>> + if (__kfree_skb_reason(segs, reason)) {
> >>> + skb_mark_not_on_list(segs);
> >>
> >> Real question is : Why do we need to set/change/dirt skb->next ?
> >>
> >> I would remove this completely, and save extra cache lines dirtying.
> >
> > First of all, we just read this cacheline via reading segs->next.
> > This cacheline must as minimum be in Shared (S) state.
> >
> > Secondly SLUB will write into this cacheline. Thus, we actually know
> > that this cacheline need to go into Modified (M) or Exclusive (E).
> > Thus, writing into it here should be okay. We could replace it with a
> > prefetchw() to help SLUB get Exclusive (E) cache coherency state.
>
> I looked it up and SLUB no-longer uses the first cacheline of objects to
> store it's freelist_ptr. Since April 2020 (v5.7) in commit 3202fa62fb43
> ("slub: relocate freelist pointer to middle of object") (Author: Kees
> Cook) the freelist is moved to the middle for security concerns.
> Thus, my prefetchw idea is wrong (details: there is an internal
> prefetch_freepointer that finds the right location).
>
> Also it could make sense to save the potential (S) to (E) cache
> coherency state transition, as SLUB actually writes into another
> cacheline that I first though.
Anyway, we should not assume anything about slub/slab ways of storing
freelists. They might switch to something less expensive, especially considering
bulk API passes an array of pointers, not a 'list'.
>
>
> >> Before your patch, we were not calling skb_mark_not_on_list(segs),
> >> so why bother ?
> >
> > To catch potential bugs.
>
> For this purpose we could discuss creating skb_poison_list() as you
> hinted in your debugging proposal, this would likely have caused us to
> catch this bug faster (via crash on second caller).
>
> Let me know if you prefer that we simply remove skb_mark_not_on_list() ?
Yes. Setting NULL here might hide another bug somewhere.
skb_poison_list() could be defined for CONFIG_DEBUG_NET=y builds I guess.
Powered by blists - more mailing lists