lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YVsxqsEGkV0A5lvO@shredder>
Date:   Mon, 4 Oct 2021 19:54:02 +0300
From:   Ido Schimmel <idosch@...sch.org>
To:     Leon Romanovsky <leon@...nel.org>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Ido Schimmel <idosch@...dia.com>,
        Ingo Molnar <mingo@...hat.com>, Jiri Pirko <jiri@...dia.com>,
        linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
        mlxsw@...dia.com, Moshe Shemesh <moshe@...dia.com>,
        netdev@...r.kernel.org, Saeed Mahameed <saeedm@...dia.com>,
        Salil Mehta <salil.mehta@...wei.com>,
        Shay Drory <shayd@...dia.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tariq Toukan <tariqt@...dia.com>,
        Yisen Zhuang <yisen.zhuang@...wei.com>
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable
 interface

On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@...dia.com>
> > > 
> > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > we ensure that properly supported devlink ops are not accessible before
> > > devlink_register, which is last command in the initialization sequence.
> > > 
> > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > safely deleted.
> > > 
> > > Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> > 
> > [...]
> > 
> > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > index cb6645012a30..09e48fb232a9 100644
> > > --- a/drivers/net/netdevsim/dev.c
> > > +++ b/drivers/net/netdevsim/dev.c
> > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > >  
> > >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > >  	devlink_register(devlink);
> > > -	devlink_reload_enable(devlink);
> > >  	return 0;
> > >  
> > >  err_psample_exit:
> > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > >  
> > > -	devlink_reload_disable(devlink);
> > >  	devlink_unregister(devlink);
> > > -
> > >  	nsim_dev_reload_destroy(nsim_dev);
> > >  
> > >  	nsim_bpf_dev_exit(nsim_dev);
> > 
> > I didn't remember why devlink_reload_{enable,disable}() were added in
> > the first place so it was not clear to me from the commit message why
> > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > disallow reload operation during device cleanup") with a reproducer.
> 
> It was added because devlink ops were accessible by the user space very
> early in the driver lifetime. All my latest devlink patches are the
> attempt to fix this arch/design/implementation issue.

The reproducer in the commit message executed the reload after the
device was fully initialized. IIRC, the problem there was that nothing
prevented these two tasks from racing:

devlink dev reload netdevsim/netdevsim10
echo 10 > /sys/bus/netdevsim/del_device

The title also talks about forbidding reload during device cleanup.

> 
> > 
> > Tried the reproducer with this series and I cannot reproduce the issue.
> > Wasn't quite sure why, but it does not seem to be related to "changes to
> > allow dynamically set the reload_up/_down callbacks", as this seems to
> > be specific to mlx5.
> 
> You didn't reproduce because of my series that moved
> devlink_register()/devlink_unregister() to be last/first commands in
> .probe()/.remove() flows.

Agree, that is what I wrote in the next paragraph of my reply.

> 
> Patch to allow dynamically set ops was needed because mlx5 had logic
> like this:
>  if(something)
>     devlink_reload_enable()
> 
> And I needed a way to keep this if ... condition.
> 
> > 
> > IIUC, the reason that the race described in above mentioned commit can
> > no longer happen is related to the fact that devlink_unregister() is
> > called first in the device dismantle path, after your previous patches.
> > Since both the reload operation and devlink_unregister() hold
> > 'devlink_mutex', it is not possible for the reload operation to race
> > with device dismantle.
> > 
> > Agree? If so, I think it would be good to explain this in the commit
> > message unless it's clear to everyone else.
> 
> I don't agree for very simple reason that devlink_mutex is going to be
> removed very soon and it is really not a reason why devlink reload is
> safer now when before.
> 
> The reload can't race due to:
> 1. devlink_unregister(), which works as a barrier to stop accesses
> from the user space.
> 2. reference counting that ensures that all in-flight commands are counted.
> 3. wait_for_completion that blocks till all commands are done.

So the wait_for_completion() is what prevents the race, not
'devlink_mutex' that is taken later. This needs to be explained in the
commit message to make it clear why the removal is safe.

Thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ