netdev - Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150922013627.GA15460@ketchup.mtl.sfl>
Date:	Mon, 21 Sep 2015 21:36:28 -0400
From:	Vivien Didelot <vivien.didelot@...oirfairelinux.com>
To:	Jiri Pirko <jiri@...nulli.us>
Cc:	Scott Feldman <sfeldma@...il.com>, Netdev <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Ido Schimmel <idosch@...lanox.com>, eladr@...lanox.com,
	Florian Fainelli <f.fainelli@...il.com>,
	Guenter Roeck <linux@...ck-us.net>
Subject: Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra
 and for pre-commit split

Hi Jiri,

On Sep. Monday 21 (39) 08:25 PM, Jiri Pirko wrote:
> Mon, Sep 21, 2015 at 07:13:58PM CEST, vivien.didelot@...oirfairelinux.com wrote:
> >Hi Jiri, Scott,
> >
> >On Sep. Monday 21 (39) 10:09 AM, Jiri Pirko wrote:
> >> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfeldma@...il.com wrote:
> >> >On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko <jiri@...nulli.us> wrote:
> >> >> Jiri Pirko (6):
> >> >>   switchdev: rename "trans" to "trans_ph".
> >> >>   switchdev: introduce transaction infrastructure for attr_set and
> >> >>     obj_add
> >> >>   rocker: switch to local transaction phase enum
> >> >>   switchdev: move transaction phase enum under transaction structure
> >> >>   rocker: use switchdev transaction queue for allocated memory
> >> >>   switchdev: split commit and prepare phase into two callbacks
> >> >
> >> >Patches compile, but first test bombs.  Cut-and-paste of dump at end
> >> >of this email.
> >> 
> >> Told you :)
> >> 
> >> 
> >> >
> >> >I'm not sure I'm liking this patchset because it looks like a way for
> >> >switchdev drivers to easily opt-out of the prepare-commit transaction
> >> >model by simply not implementing the *_pre op.  I would rather drivers
> >> >explicitly handle the PREPARE phase in code, even if that means
> >> >skipping it gracefully (in code) with a comment (in code) explaining
> >> >why it does not matter for this device/operation.  That's what DSA had
> >> >done, mostly because it was a retro-fit.
> >> 
> >> Each driver should handle this inside it. If it does not need prepare
> >> state, it simply does not implement it. That is the same for all cb,
> >> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
> >> separate callbacks. Implementing multiple callback in one is just ugly,
> >> sorry.
> >
> >This is true, (in DSA) we don't have to implement the prepare phase if
> >we fully support the feature in hardware.
> >
> >To give a real example, Marvell switch drivers currently implement all
> >add/del/dump calls for VLAN FDB (where VID 0 means the port itself). No
> >prepare phase needed.
> >
> >Now, I have local patches to enable strict 802.1Q mode in these switches
> >(all the logic is based on the hardware VLAN table). But it does not use
> >per-port FDB, so fdb_add with VID 0 doesn't make sense anymore. That's
> >why we need to push the feature checking down to the drivers in DSA.
> >
> >I have another pending patch to add .port_fdb_pre_add, where mv88e6xxx
> >code will return -EOPNOTSUPP if the given VID is 0.
> >
> >Another example: mv88e6xxx support tagged VLANs, so no hardware check
> >needed. But the Broadcom Starfighter 2 only supports port-based VLANs
> >(which is today wrongly implemented through "bridge_join/leave"). By
> >implementing .port_vlan_pre_add (another pending patch for DSA), the
> >driver will be able to return -EOPNOTSUPP if !BRIDGE_VLAN_INFO_PVID.
> >
> >Also, having logic in switchdev drivers to check SWITCHDEV_TRANS_NONE
> >and SWITCHDEV_TRANS_ABORT is not really nice. Having switchdev handle
> >the abort phase (calling each destructor) and getting rid of the
> >SWITCHDEV_TRANS_* flags sounds better to me.
> 
> Agree, if pre/commit is going to be in one function, we should have
> only prepare/commit enums. It can be carried around as a single bool
> value in switchdev_trans structure. Will include that in my transaction
> patchset.
> 
> 
> >
> >> >Also, the patchset removes the ABORT callback in case of a rollback
> >> >due to a failed PREPARE.  We can't make the assumption that it's just
> >> >a memory list to destroy on ABORT.  The driver, on PREPARE, may have
> >> >reserved device space or staged an operation on the device which we'll
> >> >need to undo on ABORT.
> >> 
> >> Yep, just register an item with custom destructor, there you can do
> >> whatever. Also, I believe much nicer comparing to current code.
> >> 
> >> 
> >> >
> >> >So we need ABORT back, and we need PREPARE to not be optional, so
> >> >what's left list enqueue/dequeue helpers, which I'm not seeing much
> >> >value in up-leveling as the driver can do list_add/del itself.
> >> 
> >> Why would every driver do it itself, over and over when there can be a
> >> clean infrastructure to do that. Including abort phase. Without the driver
> >> needed to be involved.
> >
> >Maybe the term ".destructor" has a too strong meaning to deallocation,
> >but you can indeed do whatever you need in this function.
> 
> It is a destructor. Don't know about a better name, suggestions?

Nope, I'm personally fine with this term.

> >
> >> >Am I missing something?  I didn't see a motivation statement for the
> >> >RFC so I'm not sure where you wanted to take this.
> >> 
> >> I want to make current code much nicer, easier to read and implement in
> >> other drivers. Look at rocker.c and how often there is == PREPARE there.
> >> It's nearly impossible to followthe code, sorry.
> >> 
> >> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is
> >> everywhere)
> >
> >I'm basically for this patchset, but I do have some concerns about the
> >general switchdev transaction design.
> >
> >I certainly don't have the perspective that you guys have, so please
> >tell me if I'm totally wrong. From my point of view (with DSA drivers),
> >the emphase should be done on asking the hardware if it can handle or
> >not a given transaction (not simply an hardware feature!).
> >
> >The driver can handle any allocation and preparation itself, and also
> >errors can actually occur *during* a commit phase.
> 
> But during commit phase, the odds that error is going to happen is a lot
> smaller. Prepare phase should resolve all common fails like memory
> allocations and resource limitation checks.

With mv88e6xxx, we do MDIO calls in the commit phase that we don't do in
the prepare phase, and they can actually fail.

switchdev assumes that no error can occur during a commit, this is
wrong. It also triggers a WARN() in that case, which is bad IMHO.

The prepare phase tries to help drivers with allocation and partial
checking, while this can be nice, it is not even switch-specific. That
can totally be done at the driver level, if it needs to.

I'm really trying to understand what is the real value behind this two
phases model, but looking at really use cases like DSA, I can't see one.

Maybe DSA is a specific case. Do you have a concrete example of
situation where a switch or driver would really need this pre-phase?

> 
> >
> >Being able to return -EOPNOTSUPP from any add/del/dump function would be
> >just perfect.
> 
> Looking at this, I agree that for switchdev_port_obj_add and
> switchdev_port_attr_set if would make more sense to return 0 in case of
> hw does not actuall support. Callers check for -EOPNOTSUP and treat
> that as 0 anyway. Feel free to send a patch for this.

I am not sure to understand. Are you suggesting to make the prepare
phase optional? Something like this:

--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -186,7 +186,7 @@ int switchdev_port_attr_set(struct net_device *dev, struct switchdev_attr *attr)
 
 	attr->trans = SWITCHDEV_TRANS_COMMIT;
 	err = __switchdev_port_attr_set(dev, attr);
-	WARN(err, "%s: Commit of attribute (id=%d) failed.\n",
+	WARN(err != -EOPNOTSUPP, "%s: Commit of attribute (id=%d) failed.\n",
 	     dev->name, attr->id);
 
 	return err;
@@ -266,7 +266,8 @@ int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj)
 
 	obj->trans = SWITCHDEV_TRANS_COMMIT;
 	err = __switchdev_port_obj_add(dev, obj);
-	WARN(err, "%s: Commit of object (id=%d) failed.\n", dev->name, obj->id);
+	WARN(err != -EOPNOTSUPP, "%s: Commit of object (id=%d) failed.\n",
+	     dev->name, obj->id);
 
 	return err;
 }


If yes, that would allow simple driver like DSA to return 0 in the
prepare phrase (to basically ignore it) and let the drivers eventually
return -EOPNOTSUPP in the commit phase, which is way more simple.

But then, is there really a need for the prepare phase again?

Thanks,
-v

> 
> 
> >
> >I ideally imagine the following implementation in switchdev drivers:
> >
> >    int foo_port_obj_add(struct net_device *dev, int id, void *obj)
> >    {
> >        struct foo *foo = netdev_priv(dev);
> >	struct switchdev_obj_port_fdb *fdb;
> >	struct switchdev_obj_port_vlan *vlan;
> >        int err;
> >
> >        switch(id) {
> >	case SWITCHDEV_OBJ_PORT_FDB:
> >	    fdb = obj;
> >	    err = foo_port_fdb_add(foo, fdb);
> >	    break;
> >        case SWITCHDEV_OBJ_PORT_VLAN:
> >	    vlan = obj;
> >	    err = foo_port_vlan_add(foo, vlan);
> >	    break;
> >	default:
> >	    err = -EOPNOTSUPP;
> >	    break;
> >	}
> >
> >	return err;
> >    }
> >
> >Where foo_port_{fdb,vlan}_add can still return -EOPNOTSUPP depending on
> >their current state.
> 
> Yep, seems correct.

Thanks,
-v
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html