netdev - Re: [Patch net-next] net: make neigh tables per netns

Message-ID: <87vbrl8vmz.fsf@x220.int.ebiederm.org>
Date:	Fri, 27 Jun 2014 22:12:52 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Cong Wang <xiyou.wangcong@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	Patrick McHardy <kaber@...sh.net>,
	Stephen Hemminger <stephen@...workplumber.org>,
	Cong Wang <cwang@...pensource.com>,
	Stefan Bader <stefan.bader@...onical.com>,
	stephane.graber@...onical.com, chris.j.arges@...onical.com,
	Serge Hallyn <serge.hallyn@...onical.com>
Subject: Re: [Patch net-next] net: make neigh tables per netns

Cong Wang <xiyou.wangcong@...il.com> writes:

> On Thu, Jun 26, 2014 at 3:44 PM, David Miller <davem@...emloft.net> wrote:
>>
>> First of all it is clear that once you start creating containers on the
>> order of half the global neigh limit, yes you will run into problems as
>> it's easy to have 2 or more outputs in flight.
>>
>> So it would perhaps be wise to scale the limits (in some way) based
>> upon the number of namespaces, but still keep it a global limit.
>>
>> These entries consume a global resource (memory) and benefit from
>> global sharing, so I am still convinced that making the tables
>> themselves per-ns does not make any sense.
>>
>> Secondly, if there are things holding onto neighbour entries for real
>> we should find this out.  One could audit neigh_lookup*() invocations
>> to see where that might be happening.  Also neigh_create() calls with
>> 'want_ref' set to true.
>>
>
> Hmm, I did overlook the potential DoS problem. But hold on, don't
> IP fragments have the same problem? The fragment queues are per
> netns, and the threshold is per netns as well, so we will eventually
> have memory pressure there as well.

Interesting.  It does look like IP fragments are susceptible in the same way.
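
The arithmetic behind that concern can be sketched as follows.  This is
only an illustration: the function names are hypothetical, and the 4 MiB
per-netns threshold and 16 GiB of machine memory are assumed example
values, not figures taken from this thread.

```python
MIB = 2 ** 20
GIB = 2 ** 30


def aggregate_worst_case(per_netns_thresh_bytes, netns_count):
    """Upper bound on memory pinned when every netns fills its own limit."""
    return per_netns_thresh_bytes * netns_count


def namespaces_to_exhaust(per_netns_thresh_bytes, total_memory_bytes):
    """Smallest number of namespaces whose combined limits reach total memory."""
    # Ceiling division without floating point.
    return -(-total_memory_bytes // per_netns_thresh_bytes)


# Assumed example values: a 4 MiB per-netns threshold on a 16 GiB machine.
print(aggregate_worst_case(4 * MIB, 1000))       # 4194304000 bytes, about 3.9 GiB
print(namespaces_to_exhaust(4 * MIB, 16 * GIB))  # 4096
```

A per-netns threshold bounds each namespace but not the sum, so the
worst case grows linearly with the number of namespaces.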

Sorting out limits is something that is still quite rough in the
code today.

Limits serve two basic purposes.
- Basic sanity limits, so that a buggy application can be
  killed/stopped, hopefully before it takes down the entire machine.

  Think of the file descriptor limit.

- Machine hogging limits to prevent one application from interfering
  with other applications.  This is what the kernel memory limit of
  the memory cgroup tries to implement.

These purposes aren't entirely distinct.  So it is a bit of a challenge
to separate them.

Basic sanity limits are the easiest to comprehend as the reasoning is
all local.  You just have to say any application that uses more than X
amount of a resource is clearly buggy, with a sysctl/rlimit knob to
handle those rare applications that legitimately need more than X.
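
As a concrete instance of such a knob, the file descriptor limit
mentioned above can be read and raised from userspace.  A minimal
sketch using Python's standard `resource` module follows; the helper
names are hypothetical, and it assumes a Unix-like system where
RLIMIT_NOFILE is available.

```python
import resource


def fd_limit():
    """Return the (soft, hard) file descriptor limit of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_fd_limit(wanted):
    """Raise the soft limit toward `wanted`, never past the hard limit.

    Never lowers the limit.  Returns the soft limit in effect afterwards.
    """
    soft, hard = fd_limit()
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft  # already at least as permissive as requested
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return fd_limit()[0]


if __name__ == "__main__":
    print("before:", fd_limit())
    print("soft limit now:", raise_soft_fd_limit(4096))
```

The soft limit is the sanity check; the hard limit bounds how far an
unprivileged process that legitimately needs more can turn the knob.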

Machine hogging limits are very different, as they actually require
looking at how global state is used.  I would like to say that the
memory cgroup successfully tackles that problem, but last I looked it
had some nasty deadlock potential when dealing with kernel memory.

I wish I had a clear recipe I could point people at to get all of these
issues sorted correctly.  Unfortunately, all I have is a little bit of
clarity as to what the problems actually are.

Eric
