[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKErNvo1h1Lm_g=-V_=rCjzh1B-L_yqoBQo3wp73pVjSM8x96Q@mail.gmail.com>
Date: Sun, 20 Nov 2022 23:39:08 +0200
From: Максим <maxtram95@...il.com>
To: Gal Pressman <gal@...dia.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Maxim Mikityanskiy <maximmi@...dia.com>,
"David S . Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
Cong Wang <xiyou.wangcong@...il.com>, eric.dumazet@...il.com,
syzbot <syzkaller@...glegroups.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Tariq Toukan <tariqt@...dia.com>
Subject: Re: [PATCH net] net: sched: fix race condition in qdisc_graft()
On Sun, 20 Nov 2022 at 19:01, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sun, Nov 20, 2022 at 8:43 AM Gal Pressman <gal@...dia.com> wrote:
> >
> > On 20/11/2022 18:09, Eric Dumazet wrote:
> > > On Sat, Nov 19, 2022 at 11:42 PM Gal Pressman <gal@...dia.com> wrote:
> > >> On 10/11/2022 11:08, Gal Pressman wrote:
> > >>> On 06/11/2022 10:07, Gal Pressman wrote:
> > >>>> It reproduces consistently:
> > >>>> ip link set dev eth2 up
> > >>>> ip addr add 194.237.173.123/16 dev eth2
> > >>>> tc qdisc add dev eth2 clsact
> > >>>> tc qdisc add dev eth2 root handle 1: htb default 1 offload
> > >>>> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
> > >>>> 22500.0mbit burst 450000kbit cburst 450000kbit
> > >>>> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
> > >>>> 89900kbit cburst 89900kbit
> > >>>> tc qdisc delete dev eth2 clsact
> > >>>> tc qdisc delete dev eth2 root handle 1: htb default 1
> > >>>>
> > >>>> Please let me know if there's anything else you want me to check.
> > >>> Hi Eric, did you get a chance to take a look?
> > >> No response for quite a long time, Jakub, should I submit a revert?
> > > Sorry, I won't have time to look at this before maybe two weeks.
> >
> > Thanks for the response, Eric.
> >
> > > If you want to revert a patch which is correct, because some code
> > > assumes something wrong,
> >
> > I am not convinced about the "code assumes something wrong" part, and
> > not sure what are the consequences of this WARN being triggered, are you?
> >
> > > I will simply say this seems not good.
> >
> > Arguable, it is not that clear that a fix that introduces another issue
> > is a good thing, particularly when we don't understand the severity of
> > the thing that got broken.
>
> The offload part has been put while assuming a certain (clearly wrong) behavior.
>
> RCU rules are quite the first thing we need to respect in the kernel.
>
> Simply put, when KASAN detects a bug, you can be pretty damn sure it
> is a real one.
>
> >
> > Two weeks gets us to the end of -rc7, a bit too dangerous to my personal
> > taste, but I'm not the one making the calls.
>
> Agreed, please try to find someone at nvidia able to understand what Maxim
> was doing in commit ca49bfd90a9dde175d2929dc1544b54841e33804
The check for TCQ_F_BUILTIN basically means checking for noop_qdisc.
As the comment above the WARN_ON says, my code expects that before the
root HTB qdisc is destroyed (the notify_and_destroy line from Eric's
patch, in qdisc_graft), the kernel assigns &noop_qdisc to all of
dev_queue->qdisc-s (i.e. qdiscs of HTB classes). IIRC, it should
happen in dev_deactivate in qdisc_graft (and if IFF_UP is unset at
this point, that means dev_deactivate has been called before, from
__dev_close_many). That said, I don't see how Eric's patch could
affect this logic. What has changed in Eric's patch is that the old
root qdisc (HTB) is now destroyed after assigning the new qdisc to
dev->qdisc, but it has nothing to do with dev_queue->qdisc-s.
Gal, are you sure the WARN_ON only happens after Eric's patch? If yes,
could you do some tracing and find out whether all the qdiscs of the
netdev queues are noop_qdisc after the dev_deactivate line in
qdisc_graft — before and after Eric's patch?
for (i = 0; i < num_q; i++) {
struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
if (dev_queue->qdisc == &noop_qdisc)
...
}
> If something needs stronger rules than standard RCU ones, this should
> be articulated.
>
> As I said, I won't be able to work on this before ~2 weeks.
Powered by blists - more mailing lists