netdev - Re: [PATCH net 2/2] conntrack: enable to tune gc parameters

Message-ID: <20161014103726.GA10404@breakpoint.cc>
Date:   Fri, 14 Oct 2016 12:37:26 +0200
From:   Florian Westphal <fw@...len.de>
To:     Nicolas Dichtel <nicolas.dichtel@...nd.com>
Cc:     Florian Westphal <fw@...len.de>, davem@...emloft.net,
        pablo@...filter.org, netdev@...r.kernel.org,
        netfilter-devel@...r.kernel.org
Subject: Re: [PATCH net 2/2] conntrack: enable to tune gc parameters

Nicolas Dichtel <nicolas.dichtel@...nd.com> wrote:
> On 13/10/2016 at 22:43, Florian Westphal wrote:
> > Nicolas Dichtel <nicolas.dichtel@...nd.com> wrote:
> >> On 10/10/2016 at 16:04, Florian Westphal wrote:
> >>> Nicolas Dichtel <nicolas.dichtel@...nd.com> wrote:
> >>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> >>>> timed-out entries"), netlink conntrack deletion events may be sent with a
> >>>> huge delay. It could be useful to let users tweak the gc parameters
> >>>> depending on their use case.
> >>>
> >>> Hmm, care to elaborate?
> >>>
> >>> I am not against doing this but I'd like to hear/read your use case.
> >>>
> >>> The expectation is that in almost all cases eviction will happen from
> >>> the packet path.  The gc worker is just there for the case where a busy
> >>> system goes idle.
> >> It was precisely that case. After a period of activity, the event is sent a long
> >> time after the timeout. If the router does not manage a lot of flows, why not
> >> try to scan more entries instead of the default 1/64 of the table?
> >> In fact, I don't understand why GC_MAX_BUCKETS_DIV is used instead of always
> >> using GC_MAX_BUCKETS, whatever the size of the table.
> > 
> > I wanted to make sure that we have a known upper bound on the number of
> > buckets we process so that we do not block other pending kworker items
> > for too long.
> I don't understand. GC_MAX_BUCKETS is the upper bound and I agree that it is
> needed. But why GC_MAX_BUCKETS_DIV (i.e. 1/64)?
> In other words, why this line:
> goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
> instead of:
> goal = GC_MAX_BUCKETS;

Sure, we can do that.  But why is a fixed size better than a fraction?

E.g. with 8k buckets and a plain goal = GC_MAX_BUCKETS, we scan the entire
table on every run; currently we only scan 128.

I wanted to keep too many destroy notifications from firing at once,
but maybe I was too paranoid...

> > (Or cause too many useless scans)
> > 
> > Another idea worth trying might be to get rid of the max cap and
> > instead break early once too many jiffies have elapsed.
> > 
> > I don't want to add sysctl knobs for this unless absolutely needed; it's already
> > possible to 'force' an eviction cycle by running 'conntrack -L'.
> > 
> Sure, but this is not a "real" solution, just a workaround.
> We need to find a way to deliver conntrack deletion events within a reasonable
> delay, whatever the traffic on the machine.

Agreed, but that depends on what 'reasonable' means and how much
unneeded CPU churn we're willing to add.

We can add a sysctl for this, but we should use a low default to avoid
doing too much unneeded work.

So what about your original patch, but only add

nf_conntrack_gc_interval

(and also add an instant reschedule in case the entire budget was consumed)?