netdev - Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support

Message-ID: <20201214123305.288f49bf@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Mon, 14 Dec 2020 12:33:05 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Wei Wang <weiwan@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Felix Fietkau <nbd@....name>, Hillf Danton <hdanton@...a.com>
Subject: Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll
 loop support

On Mon, 14 Dec 2020 11:45:43 -0800 Wei Wang wrote:
> > It is quite an annoying problem to address, given all relevant NAPI
> > helpers seem to return void :/ But we're pushing the problem onto the
> > user just because of internal API structure.
> >
> > This reminds me of PTP / timestamping issues some NICs had once upon
> > a time. The timing application enables HW time stamping, then later some
> > other application / orchestration changes a seemingly unrelated config,
> > and since the NIC has to reset itself it loses the timestamping config.
> > Now the time app stops getting HW time stamps, but those are best
> > effort anyway, so it just assumes the NIC couldn't stamp given frame
> > (for every frame), not that config got completely broken. The system
> > keeps running with suboptimal time for months.
> >
> > What does the deployment you're expecting to see look like? What
> > entity controls enabling the threaded mode on a system? Application?
> > Orchestration? What's the flow?
> >  
> I see your point. In our deployment, we have a system daemon which is
> responsible for setting up all the system tunings after the host boots
> up (before the application starts to run). If an operation fails, it
> prints an error message and exits with an error. For applications that
> require threaded mode, I think checking the sysfs entry at startup to
> make sure it is enabled is necessary.

That assumes no workload stacking, and no dynamic changes after the
workload has started? Or does the daemon have enough clever logic
to resolve config changes?
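
For reference, the startup check described in the quoted text could look
roughly like the sketch below. It is only an illustration: the helper name
threaded_napi_state() is made up, and the attribute path a caller would
pass (something like /sys/class/net/eth0/threaded) is an assumption, not
something this thread has settled.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a sysfs-style attribute file.
 * Returns 1 if it holds a non-zero integer, 0 if it holds zero,
 * and -1 if the file cannot be opened or parsed.
 */
int threaded_napi_state(const char *path)
{
    FILE *f = fopen(path, "r");
    int val;

    if (!f)
        return -1;              /* attribute missing or unreadable */
    if (fscanf(f, "%d", &val) != 1) {
        fclose(f);
        return -1;              /* content is not an integer */
    }
    fclose(f);
    return val != 0;
}
```

A one-shot check like this only covers the state at startup. It would not
notice the setting being lost later, which is the concern raised above.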

> > "Forgetting" config based on driver-dependent events feels very fragile.  
> I think we could add a recorded value in dev to represent the user
> setting, and try to enable threaded mode after napi_disable/enable.
> But I think the user/application still has to check the sysfs entry
> value to make sure it was enabled successfully.

In case of an error you're thinking of resetting, still, and returning
disabled from sysfs? I guess that's fine, we can leave failing the bad
reconfig operation (rather than resetting config) as a future extension.
Let's add a WARN_ON, tho, so the failures don't get missed.
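
To make the intended behaviour concrete, here is a rough userspace model
of it. This is not kernel code and not the patch itself: struct model_dev
and the model_*() functions are hypothetical stand-ins, and model_warn_on()
only imitates the kernel's WARN_ON() by printing and counting. The model
records the user's request, tries to re-apply it on re-enable, and on
failure warns, resets the recorded setting, and reports disabled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a net device, not struct net_device. */
struct model_dev {
    bool threaded_requested;  /* setting recorded from the user */
    bool threaded_active;     /* what is actually in effect */
    bool kthread_create_ok;   /* simulates thread creation succeeding */
    unsigned int warnings;    /* number of WARN_ON-style reports */
};

/* Userspace imitation of WARN_ON(): report, count, return the condition. */
static bool model_warn_on(struct model_dev *dev, bool cond, const char *what)
{
    if (cond) {
        dev->warnings++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Models the point where a driver re-enables NAPI after a reset. */
void model_napi_enable(struct model_dev *dev)
{
    if (!dev->threaded_requested) {
        dev->threaded_active = false;
        return;
    }
    /* Try to restore the recorded user setting. */
    dev->threaded_active = dev->kthread_create_ok;
    if (model_warn_on(dev, !dev->threaded_active,
                      "failed to re-enable threaded mode")) {
        /* Reset the config so a later read reports disabled. */
        dev->threaded_requested = false;
    }
}

/* Models what a read of the sysfs entry would return. */
int model_sysfs_show(const struct model_dev *dev)
{
    return dev->threaded_active ? 1 : 0;
}
```

Failing the reconfig operation itself, instead of resetting, would need
the enable path to return an error, which this model deliberately leaves
out.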