netdev - Re: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

Date:   Fri, 17 Apr 2020 22:23:37 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
CC:     "ecree@...arflare.com" <ecree@...arflare.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "gerlitz.or@...il.com" <gerlitz.or@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "sashal@...nel.org" <sashal@...nel.org>,
        "leon@...nel.org" <leon@...nel.org>
Subject: Re: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for
 representors

On Fri, 2020-04-17 at 10:28 +0200, gregkh@...uxfoundation.org wrote:
> On Thu, Apr 16, 2020 at 09:08:06PM +0000, Saeed Mahameed wrote:
> > On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:
> > > On Thu, Apr 16, 2020 at 07:07:13PM +0000, Saeed Mahameed wrote:
> > > > On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:
> > > > > On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky
> > > > > wrote:
> > > > > > On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed
> > > > > > wrote:
> > > > > > > On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:
> > > > > > > > On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree
> > > > > > > > wrote:
> > > > > > > > > Firstly, let me apologise: my previous email was too
> > > > > > > > > harsh
> > > > > > > > > and too
> > > > > > > > > >  assertive about things that were really more
> > > > > > > > > uncertain
> > > > > > > > > and
> > > > > > > > > unclear.
> > > > > > > > > 
> > > > > > > > > On 14/04/2020 21:57, Sasha Levin wrote:
> > > > > > > > > > I've pointed out that almost 50% of commits tagged
> > > > > > > > > > for
> > > > > > > > > > stable do
> > > > > > > > > > not
> > > > > > > > > > have a fixes tag, and yet they are fixes. You
> > > > > > > > > > really
> > > > > > > > > > deduce
> > > > > > > > > > things based
> > > > > > > > > > on coin flip probability?
> > > > > > > > > Yes, but far less than 50% of commits *not* tagged
> > > > > > > > > for
> > > > > > > > > stable
> > > > > > > > > have
> > > > > > > > > a fixes
> > > > > > > > >  tag.  It's not about hard-and-fast Aristotelian
> > > > > > > > > "deductions", like
> > > > > > > > > "this
> > > > > > > > >  doesn't have Fixes:, therefore it is not a stable
> > > > > > > > > candidate", it's
> > > > > > > > > about
> > > > > > > > >  probabilistic "induction".
> > > > > > > > > 
> > > > > > > > > > "it does increase the amount of countervailing
> > > > > > > > > > evidence
> > > > > > > > > > needed to
> > > > > > > > > > conclude a commit is a fix" - Please explain this
> > > > > > > > > > argument
> > > > > > > > > > given
> > > > > > > > > > the
> > > > > > > > > > above.
> > > > > > > > > Are you familiar with Bayesian statistics?  If not,
> > > > > > > > > I'd
> > > > > > > > > suggest
> > > > > > > > > reading
> > > > > > > > >  something like http://yudkowsky.net/rational/bayes/
> > > > > > > > > which
> > > > > > > > > explains
> > > > > > > > > it.
> > > > > > > > > There's a big difference between a coin flip and a
> > > > > > > > > _correlated_
> > > > > > > > > coin flip.
> > > > > > > > 
> > > > > > > > I'd maybe point out that the selection process is based
> > > > > > > > on
> > > > > > > > a
> > > > > > > > neural
> > > > > > > > network which knows about the existence of a Fixes tag
> > > > > > > > in a
> > > > > > > > commit.
> > > > > > > > 
> > > > > > > > It does exactly what you're describing, but also taking
> > > > > > > > a
> > > > > > > > bunch
> > > > > > > > more
> > > > > > > > > factors into its decision process ("panic"? "oops"?
> > > > > > > > "overflow"?
> > > > > > > > etc).
> > > > > > > > 
> > > > > > > 
> > > > > > > I am not against AUTOSEL in general, as long as the
> > > > > > > decision
> > > > > > > to
> > > > > > > know
> > > > > > > how far back it is allowed to take a patch is made
> > > > > > > deterministically
> > > > > > > and not statistically based on some AI hunch.
> > > > > > > 
> > > > > > > Any auto selection for a patch without a Fixes tag can
> > > > > > > be
> > > > > > > catastrophic
> > > > > > > .. imagine a patch without a Fixes Tag with a single line
> > > > > > > that is
> > > > > > > fixing some "oops", such patch can be easily applied
> > > > > > > cleanly
> > > > > > > to
> > > > > > > stable-
> > > > > > > v.x and stable-v.y .. while it fixes the issue on v.x it
> > > > > > > might
> > > > > > > have
> > > > > > > catastrophic results on v.y ..
> > > > > > 
> > > > > > I tried to imagine such flow and failed to do so. Are you
> > > > > > talking
> > > > > > about
> > > > > > anything specific or imaginary case?
> > > > > 
> > > > > It happens, rarely, but it does. However, all the cases I can
> > > > > think
> > > > > of
> > > > > happened with a stable tagged commit without a fixes where
> > > > > > its
> > > > > backport
> > > > > to an older tree caused unintended behavior (local denial of
> > > > > service
> > > > > in
> > > > > one case).
> > > > > 
> > > > > The scenario you have in mind is true for both stable and
> > > > > non-
> > > > > stable
> > > > > > tagged patches, so if you want to restrict how we deal with
> > > > > commits
> > > > > that
> > > > > don't have a fixes tag shouldn't it be true for *all*
> > > > > commits?
> > > > 
> > > > All commits? even the ones without "oops" in them ? where does
> > > > this
> > > > stop ? :)
> > > > We _must_ have a hard and deterministic cut for how far back to
> > > > take a
> > > > patch based on a human decision.. unless we are 100% positive
> > > > autoselection AI can never make a mistake.
> > > > 
> > > > Humans are allowed to make mistakes, AI is not.
> > > 
> > > > Oh I'm reviewing all patches myself after the bot does its
> > > selection,
> > > you can blame me for these screw ups.
> > > 
> > > > If a Fixes tag is wrong, then a human will be blamed, and that
> > > > is
> > > > perfectly fine, but if we have some statistical model that we
> > > > know
> > > > it
> > > > is going to be wrong 0.001% of the time.. and we still let it
> > > > run..
> > > > then something needs to be done about this.
> > > > 
> > > > > I know there are benefits to autosel, but over time, if this is
> > > > not
> > > > being audited, many pieces of the kernel will get broken
> > > > unnoticed
> > > > until some poor distro decides to upgrade their kernel version.
> > > 
> > > Quite a few distros are always running on the latest LTS
> > > releases,
> > > Android isn't that far behind either at this point.
> > > 
> > > There are actually very few non-LTS users at this point...
> > > 
> > > > > > <...>
> > > > > > > > Let me put my Microsoft employee hat on here. We have
> > > > > > > > driver/net/hyperv/
> > > > > > > > which definitely wasn't getting all the fixes it should
> > > > > > > > have
> > > > > > > > been
> > > > > > > > getting without AUTOSEL.
> > > > > > > > 
> > > > > > > 
> > > > > > > until some patch which shouldn't get backported slips
> > > > > > > through,
> > > > > > > believe
> > > > > > > me this will happen, just give it some time ..
> > > > > > 
> > > > > > Bugs are inevitable, I don't see many differences between
> > > > > > bugs
> > > > > > introduced by manually cherry-picking or automatically one.
> > > > > 
> > > > > Oh bugs slip in, that's why I track how many bugs slipped via
> > > > > stable
> > > > > tagged commits vs non-stable tagged ones, and the statistic
> > > > > may
> > > > > surprise
> > > > > you.
> > > > > 
> > > > 
> > > > Statistics do not matter here, what really matters is that
> > > > there is
> > > > a
> > > > possibility of a non-human induced error, this should be a no
> > > > no.
> > > > or at least make it an opt-in thing for those who want to take
> > > > their
> > > > chances and keep a close eye on it..
> > > 
> > > Hrm, why? Pretend that the bot is a human sitting somewhere
> > > sending
> > > mails out, how does it change anything?
> > > 
> > 
> > > If I know a bot might do something wrong, I fix it and make sure it
> > > will never do it again. For humans I just can't do that, can I ? :)
> > > so this is the difference and why we all have jobs ..
> > 
> > > > > The solution here is to beef up your testing infrastructure
> > > > > rather
> > > > > than
> > > > 
> > > > So please let me opt-in until I beef up my testing infra.
> > > 
> > > Already did :)
> > 
> > No you didn't :), I received more than 5 AUTOSEL emails only today
> > and
> > yesterday.
> > 
> > > Please don't opt mlx5 out just yet ;-), I need to do some more
> > research
> > and make up my mind..
> > 
> > > > > taking less patches; we still want to have *all* the fixes,
> > > > > right?
> > > > > 
> > > > 
> > > > if you can be sure 100% it is the right thing to do, then yes,
> > > > please
> > > > don't hesitate to take that patch, even without asking anyone
> > > > !!
> > > > 
> > > > Again, Humans are allowed to make mistakes.. AI is not.
> > > 
> > > Again, why?
> > > 
> > 
> > Because AI is not there yet.. and this is a very big philosophical
> > question.
> > 
> > Let me simplify: there is a bug in the AI, where it can choose a
> > wrong
> > patch, let's fix it.
> 
> You do realize that there are at least 2 steps in this "AI" where
> people
> are involved.  The first is when Sasha goes through the patches and
> weeds out all of the "bad ones".
> 
> The second is when you, the maintainer, are asked if you think there
> is a
> problem if the patch is to be merged.
> 
> Then there's also the third, when again, I send out emails for the
> -rc
> process with the patches involved, and you are cc:ed on it.
> 
> This isn't an unchecked process here running with no human checks at
> all
> in it, so please don't speak of it like it is.
> 

Sure I understand,

But with all due respect to Sasha, and I know he is doing a great job, he
just can't sign off on all of the patches across the whole Linux kernel
and determine by himself whether a patch is good or not.. and the
maintainer review is what actually matters here.

But the maintainer ack is an optional thing, and I bet that the vast
majority don't even look at these e-mails.

My vision is that we make this an opt-in thing, and we somehow force
all active and important kernel subsystems to opt-in, and make it the
maintainer's responsibility if something goes wrong.

I understand from your statistics that this system is working very
well, so I believe eventually every maintainer with code that matters
will come on board.

This way we don't risk it for inactive and less important
subsystems/drivers.. and we guarantee the whole thing is properly
audited with the maintainers on board..