Date:	Fri, 27 Jun 2014 22:12:52 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Cong Wang <xiyou.wangcong@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	Patrick McHardy <kaber@...sh.net>,
	Stephen Hemminger <stephen@...workplumber.org>,
	Cong Wang <cwang@...pensource.com>,
	Stefan Bader <stefan.bader@...onical.com>,
	stephane.graber@...onical.com, chris.j.arges@...onical.com,
	Serge Hallyn <serge.hallyn@...onical.com>
Subject: Re: [Patch net-next] net: make neigh tables per netns

Cong Wang <xiyou.wangcong@...il.com> writes:

> On Thu, Jun 26, 2014 at 3:44 PM, David Miller <davem@...emloft.net> wrote:
>>
>> First of all it is clear that once you start creating containers on the
>> order of half the global neigh limit, yes you will run into problems as
>> it's easy to have 2 or more outputs in flight.
>>
>> So it would perhaps be wise to scale the limits (in some way) based
>> upon the number of namespaces, but still keep it a global limit.
>>
>> These entries consume a global resource (memory) and benefit from
>> global sharing, so I am still convinced that making the tables
>> themselves per-ns does not make any sense.
>>
>> Secondly, if there are things holding onto neighbour entries for real
>> we should find this out.  One could audit neigh_lookup*() invocations
>> to see where that might be happening.  Also neigh_create() calls with
>> 'want_ref' set to true.
>>
>
> Hmm, I did overlook the potential DoS problem. But hold on, don't
> IP fragments have the same problem? The fragment queues are per
> netns, and the threshold is per netns as well, so we will eventually
> have memory pressure there too.

Interesting.  It does look like IP fragments are susceptible that way.
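
The arithmetic behind that concern can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the 4 MiB threshold in the usage note is an example value, not one read from any particular kernel.

```python
def worst_case_frag_memory(num_netns, high_thresh_bytes):
    """Upper bound on fragment-queue memory if every netns fills its own threshold."""
    return num_netns * high_thresh_bytes


def netns_to_exhaust(total_memory_bytes, high_thresh_bytes):
    """Number of namespaces whose combined thresholds reach total memory (rounded up)."""
    return -(-total_memory_bytes // high_thresh_bytes)
```

With an assumed 4 MiB per-netns threshold, worst_case_frag_memory(1000, 4 * 1024 * 1024) is roughly 4 GB across 1000 namespaces, which is the sense in which a per-netns threshold does not bound global memory.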

Sorting out limits is something that is still quite rough in the
code today.

Limits serve two basic purposes.
- Basic sanity limits so that a buggy application can be
  killed/stopped, hopefully before it takes down the entire machine.

  Think of the file descriptor limit.

- Machine hogging limits to prevent one application from interfering
  with other applications.  This is what the kernel memory limit of
  the memory cgroup tries to implement.

These purposes aren't entirely distinct.  So it is a bit of a challenge
to separate them.

Basic sanity limits are the easiest to comprehend as the reasoning is
all local.  You just have to say any application that uses more than X
amount of a resource is clearly buggy, with a sysctl/rlimit knob to
handle those rare applications that legitimately need more than X.
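
A minimal userspace sketch of such a knob, using the file descriptor rlimit as the example.  It assumes Python's standard `resource` module on a Unix-like system; the helper names are hypothetical and this is not kernel code.

```python
import resource


def fd_limit():
    """Return the (soft, hard) per-process file descriptor limits."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_fd_limit(wanted):
    """Raise the soft limit toward `wanted`, never past the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft != resource.RLIM_INFINITY and wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The soft limit plays the role of X: an application that hits it is presumed buggy, while one that legitimately needs more can call something like raise_soft_fd_limit() up to the hard limit set by the administrator.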

Machine hogging limits are very different, as they actually require
looking at how global state is used.  I would like to say that the
memory cgroup successfully tackles that problem, but last I looked it
had some nasty deadlock potential when dealing with kernel memory.

I wish I had a clear recipe I could point people at to get all of these
issues sorted correctly.  Unfortunately, all I have is a little bit of
clarity as to what the problems actually are.

Eric
