[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+G3_0QzdOsoCMOk-Qgd+Vv7hKEtogMNJ00pUGC1wX=ow@mail.gmail.com>
Date: Mon, 2 Dec 2024 10:17:10 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Pablo Neira Ayuso <pablo@...filter.org>
Cc: netfilter-devel@...r.kernel.org, davem@...emloft.net,
netdev@...r.kernel.org, kuba@...nel.org, pabeni@...hat.com, fw@...len.de
Subject: Re: [PATCH net 4/4] netfilter: nft_inner: incorrect percpu area
handling under softirq
On Mon, Dec 2, 2024 at 2:24 AM Pablo Neira Ayuso <pablo@...filter.org> wrote:
>
> Hi Eric,
>
> On Fri, Nov 29, 2024 at 10:14:34AM +0100, Eric Dumazet wrote:
> > On Thu, Nov 28, 2024 at 1:38 PM Pablo Neira Ayuso <pablo@...filter.org> wrote:
> > >
> > > Softirq can interrupt packet from process context which walks over the
> > > percpu area.
> > >
> > > Add routines to disable bh while restoring and saving the tunnel parser
> > > context from percpu area to stack. Add a skbuff owner for this percpu
> > > area to catch softirq interference to exercise the packet tunnel parser
> > > again in such case.
> > >
> > > Reported-by: syzbot+84d0441b9860f0d63285@...kaller.appspotmail.com
> > > Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
> > > Signed-off-by: Pablo Neira Ayuso <pablo@...filter.org>
> > > ---
> > > include/net/netfilter/nf_tables_core.h | 1 +
> > > net/netfilter/nft_inner.c | 56 ++++++++++++++++++++------
> > > 2 files changed, 45 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
> > > index ff27cb2e1662..dae0e7592934 100644
> > > --- a/include/net/netfilter/nf_tables_core.h
> > > +++ b/include/net/netfilter/nf_tables_core.h
> > > @@ -161,6 +161,7 @@ enum {
> > > };
> > >
> > > struct nft_inner_tun_ctx {
> > > + struct sk_buff *skb; /* percpu area owner */
> > > u16 type;
> > > u16 inner_tunoff;
> > > u16 inner_lloff;
> > > diff --git a/net/netfilter/nft_inner.c b/net/netfilter/nft_inner.c
> > > index 928312d01eb1..fcaa126ac8da 100644
> > > --- a/net/netfilter/nft_inner.c
> > > +++ b/net/netfilter/nft_inner.c
> > > @@ -210,35 +210,65 @@ static int nft_inner_parse(const struct nft_inner *priv,
> > > struct nft_pktinfo *pkt,
> > > struct nft_inner_tun_ctx *tun_ctx)
> > > {
> > > - struct nft_inner_tun_ctx ctx = {};
> > > u32 off = pkt->inneroff;
> > >
> > > if (priv->flags & NFT_INNER_HDRSIZE &&
> > > - nft_inner_parse_tunhdr(priv, pkt, &ctx, &off) < 0)
> > > + nft_inner_parse_tunhdr(priv, pkt, tun_ctx, &off) < 0)
> > > return -1;
> > >
> > > if (priv->flags & (NFT_INNER_LL | NFT_INNER_NH)) {
> > > - if (nft_inner_parse_l2l3(priv, pkt, &ctx, off) < 0)
> > > + if (nft_inner_parse_l2l3(priv, pkt, tun_ctx, off) < 0)
> > > return -1;
> > > } else if (priv->flags & NFT_INNER_TH) {
> > > - ctx.inner_thoff = off;
> > > - ctx.flags |= NFT_PAYLOAD_CTX_INNER_TH;
> > > + tun_ctx->inner_thoff = off;
> > > + tun_ctx->flags |= NFT_PAYLOAD_CTX_INNER_TH;
> > > }
> > >
> > > - *tun_ctx = ctx;
> > > tun_ctx->type = priv->type;
> > > + tun_ctx->skb = pkt->skb;
> > > pkt->flags |= NFT_PKTINFO_INNER_FULL;
> > >
> > > return 0;
> > > }
> > >
> > > +static bool nft_inner_restore_tun_ctx(const struct nft_pktinfo *pkt,
> > > + struct nft_inner_tun_ctx *tun_ctx)
> > > +{
> > > + struct nft_inner_tun_ctx *this_cpu_tun_ctx;
> > > +
> > > + local_bh_disable();
> > > + this_cpu_tun_ctx = this_cpu_ptr(&nft_pcpu_tun_ctx);
> > > + if (this_cpu_tun_ctx->skb != pkt->skb) {
> >
> > I must say I do not understand this patch.
> >
> > If a context is used by a save/restore more than one time per packet
> > traversal, then this means we can not use per-cpu storage,
> > or risk flakes.
> >
> > Also, skb could be freed and re-allocated ?
> >
> > Perhaps describe a bit more what is going on in the changelog.
>
> There is an on-stack nft_pktinfo structure with a flags field. This
> nft_pktinfo is a wrapper for the sk_buff, containing header offsets
> and metainformation. This is initialize when entering this chain.
>
> If the NFT_PKTINFO_INNER_FULL flag is set on, then the percpu area
> could contain information on the inner header offsets (ie. packet was
> already parsed in this chain).
>
> There is a check to validate that the percpu area refers to this
> skbuff. If there is a mismatch, then this needs to parse the inner
> headers because the data in the percpu area is stale. Otherwise, if
> there is a match, the percpu area is copied on-stack.
>
> After processing (payload/meta fetching), the on-stack copy is stored
> back to the percpu area. I can make an improvement on this patch to
> check if this skbuff still owns the percpu area in the store/exit
> section of this inner evaluation routine, to avoid a unnecessary update.
>
> So, it is basically the NFT_PKTINFO_INNER_FULL flag that handles the
> skbuff reallocation scenario. This is not blindly trusting this percpu
> area per-se.
>
> One comestic change I can apply to this: I can also turn the struct
> sk_buff into unsigned long so it clear to the reader this pointer to
> skbuff is not meant to be used for being dereferenced.
>
> If the explaination above is sufficient, I can revamp to extend the
> changelog as you suggest and post a new version of this patch.
>
> Thanks.
The part I do not understand is that tun_ctx->skb is not cleared at
the end of processing (or at some point)
That would make clear that a future (tun_ctx->skb == skb) test is not
confused by skb reuse (free/alloc).
If you use it as a cookie, then we need something else than a pointer.
Powered by blists - more mailing lists