netdev - Re: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for representors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200417082804.GB140064@kroah.com>
Date:   Fri, 17 Apr 2020 10:28:04 +0200
From:   "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
To:     Saeed Mahameed <saeedm@...lanox.com>
Cc:     "sashal@...nel.org" <sashal@...nel.org>,
        "ecree@...arflare.com" <ecree@...arflare.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "gerlitz.or@...il.com" <gerlitz.or@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "leon@...nel.org" <leon@...nel.org>
Subject: Re: [PATCH AUTOSEL 4.9 09/26] net/mlx5e: Init ethtool steering for
 representors

On Thu, Apr 16, 2020 at 09:08:06PM +0000, Saeed Mahameed wrote:
> On Thu, 2020-04-16 at 15:58 -0400, Sasha Levin wrote:
> > On Thu, Apr 16, 2020 at 07:07:13PM +0000, Saeed Mahameed wrote:
> > > On Thu, 2020-04-16 at 09:30 -0400, Sasha Levin wrote:
> > > > On Thu, Apr 16, 2020 at 08:24:09AM +0300, Leon Romanovsky wrote:
> > > > > On Thu, Apr 16, 2020 at 04:08:10AM +0000, Saeed Mahameed wrote:
> > > > > > On Wed, 2020-04-15 at 20:00 -0400, Sasha Levin wrote:
> > > > > > > On Wed, Apr 15, 2020 at 05:18:38PM +0100, Edward Cree
> > > > > > > wrote:
> > > > > > > > Firstly, let me apologise: my previous email was too
> > > > > > > > harsh
> > > > > > > > and too
> > > > > > > >  assertiveabout things that were really more uncertain
> > > > > > > > and
> > > > > > > > unclear.
> > > > > > > > 
> > > > > > > > On 14/04/2020 21:57, Sasha Levin wrote:
> > > > > > > > > I've pointed out that almost 50% of commits tagged for
> > > > > > > > > stable do
> > > > > > > > > not
> > > > > > > > > have a fixes tag, and yet they are fixes. You really
> > > > > > > > > deduce
> > > > > > > > > things based
> > > > > > > > > on coin flip probability?
> > > > > > > > Yes, but far less than 50% of commits *not* tagged for
> > > > > > > > stable
> > > > > > > > have
> > > > > > > > a fixes
> > > > > > > >  tag.  It's not about hard-and-fast Aristotelian
> > > > > > > > "deductions", like
> > > > > > > > "this
> > > > > > > >  doesn't have Fixes:, therefore it is not a stable
> > > > > > > > candidate", it's
> > > > > > > > about
> > > > > > > >  probabilistic "induction".
> > > > > > > > 
> > > > > > > > > "it does increase the amount of countervailing evidence
> > > > > > > > > needed to
> > > > > > > > > conclude a commit is a fix" - Please explain this
> > > > > > > > > argument
> > > > > > > > > given
> > > > > > > > > the
> > > > > > > > > above.
> > > > > > > > Are you familiar with Bayesian statistics?  If not, I'd
> > > > > > > > suggest
> > > > > > > > reading
> > > > > > > >  something like http://yudkowsky.net/rational/bayes/
> > > > > > > > which
> > > > > > > > explains
> > > > > > > > it.
> > > > > > > > There's a big difference between a coin flip and a
> > > > > > > > _correlated_
> > > > > > > > coin flip.
> > > > > > > 
> > > > > > > I'd maybe point out that the selection process is based on
> > > > > > > a
> > > > > > > neural
> > > > > > > network which knows about the existence of a Fixes tag in a
> > > > > > > commit.
> > > > > > > 
> > > > > > > It does exactly what you're describing, but also taking a
> > > > > > > bunch
> > > > > > > more
> > > > > > > factors into it's desicion process ("panic"? "oops"?
> > > > > > > "overflow"?
> > > > > > > etc).
> > > > > > > 
> > > > > > 
> > > > > > I am not against AUTOSEL in general, as long as the decision
> > > > > > to
> > > > > > know
> > > > > > how far back it is allowed to take a patch is made
> > > > > > deterministically
> > > > > > and not statistically based on some AI hunch.
> > > > > > 
> > > > > > Any auto selection for a patch without a Fixes tags can be
> > > > > > catastrophic
> > > > > > .. imagine a patch without a Fixes Tag with a single line
> > > > > > that is
> > > > > > fixing some "oops", such patch can be easily applied cleanly
> > > > > > to
> > > > > > stable-
> > > > > > v.x and stable-v.y .. while it fixes the issue on v.x it
> > > > > > might
> > > > > > have
> > > > > > catastrophic results on v.y ..
> > > > > 
> > > > > I tried to imagine such flow and failed to do so. Are you
> > > > > talking
> > > > > about
> > > > > anything specific or imaginary case?
> > > > 
> > > > It happens, rarely, but it does. However, all the cases I can
> > > > think
> > > > of
> > > > happened with a stable tagged commit without a fixes where it's
> > > > backport
> > > > to an older tree caused unintended behavior (local denial of
> > > > service
> > > > in
> > > > one case).
> > > > 
> > > > The scenario you have in mind is true for both stable and non-
> > > > stable
> > > > tagged patches, so it you want to restrict how we deal with
> > > > commits
> > > > that
> > > > don't have a fixes tag shouldn't it be true for *all* commits?
> > > 
> > > All commits? even the ones without "oops" in them ? where does this
> > > stop ? :)
> > > We _must_ have a hard and deterministic cut for how far back to
> > > take a
> > > patch based on a human decision.. unless we are 100% positive
> > > autoselection AI can never make a mistake.
> > > 
> > > Humans are allowed to make mistakes, AI is not.
> > 
> > Oh I'm reviewing all patches myself after the bot does it's
> > selection,
> > you can blame me for these screw ups.
> > 
> > > If a Fixes tag is wrong, then a human will be blamed, and that is
> > > perfectly fine, but if we have some statistical model that we know
> > > it
> > > is going to be wrong 0.001% of the time.. and we still let it run..
> > > then something needs to be done about this.
> > > 
> > > I know there are benefits to autosel, but overtime, if this is not
> > > being audited, many pieces of the kernel will get broken unnoticed
> > > until some poor distro decides to upgrade their kernel version.
> > 
> > Quite a few distros are always running on the latest LTS releases,
> > Android isn't that far behind either at this point.
> > 
> > There are actually very few non-LTS users at this point...
> > 
> > > > > <...>
> > > > > > > Let me put my Microsoft employee hat on here. We have
> > > > > > > driver/net/hyperv/
> > > > > > > which definitely wasn't getting all the fixes it should
> > > > > > > have
> > > > > > > been
> > > > > > > getting without AUTOSEL.
> > > > > > > 
> > > > > > 
> > > > > > until some patch which shouldn't get backported slips
> > > > > > through,
> > > > > > believe
> > > > > > me this will happen, just give it some time ..
> > > > > 
> > > > > Bugs are inevitable, I don't see many differences between bugs
> > > > > introduced by manually cherry-picking or automatically one.
> > > > 
> > > > Oh bugs slip in, that's why I track how many bugs slipped via
> > > > stable
> > > > tagged commits vs non-stable tagged ones, and the statistic may
> > > > surprise
> > > > you.
> > > > 
> > > 
> > > Statistics do not matter here, what really matters is that there is
> > > a
> > > possibility of a non-human induced error, this should be a no no.
> > > or at least make it an opt-in thing for those who want to take
> > > their
> > > chances and keep a close eye on it..
> > 
> > Hrm, why? Pretend that the bot is a human sitting somewhere sending
> > mails out, how does it change anything?
> > 
> 
> If i know a bot might do something wrong, i Fix it and make sure it
> will never do it again. For humans i just can't do that, can I ? :)
> so this is the difference and why we all have jobs .. 
> 
> > > > The solution here is to beef up your testing infrastructure
> > > > rather
> > > > than
> > > 
> > > So please let me opt-in until I beef up my testing infra.
> > 
> > Already did :)
> 
> No you didn't :), I received more than 5 AUTOSEL emails only today and
> yesterday.
> 
> Please don't opt mlx5 out just yet ;-), i need to do some more research
> and make up my mind..
> 
> > 
> > > > taking less patches; we still want to have *all* the fixes,
> > > > right?
> > > > 
> > > 
> > > if you can be sure 100% it is the right thing to do, then yes,
> > > please
> > > don't hesitate to take that patch, even without asking anyone !!
> > > 
> > > Again, Humans are allowed to make mistakes.. AI is not.
> > 
> > Again, why?
> > 
> 
> Because AI is not there yet.. and this is a very big philosophical
> question.
> 
> Let me simplify: there is a bug in the AI, where it can choose a wrong
> patch, let's fix it.

You do realize that there are at least 2 steps in this "AI" where people
are involved.  The first is when Sasha goes thorough the patches and
weeds out all of the "bad ones".

The second is when you, the maintainer, is asked if you think there is a
problem if the patch is to be merged.

Then there's also the third, when again, I send out emails for the -rc
process with the patches involved, and you are cc:ed on it.

This isn't an unchecked process here running with no human checks at all
in it, so please don't speak of it like it is.

thanks,

greg k-h