lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6599ad830907030911m6176dc59id3a7d897b03d2bd@mail.gmail.com>
Date:	Fri, 3 Jul 2009 09:11:56 -0700
From:	Paul Menage <menage@...gle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Benjamin Blum <bblum@...gle.com>, lizf@...fujitzu.com,
	serue@...ibm.com, containers@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] Adds a read-only "procs" file similar to "tasks" that 
	shows only unique tgids

On Thu, Jul 2, 2009 at 11:55 PM, Andrew Morton<akpm@...ux-foundation.org> wrote:
>> The seq_file iterator for these files relies on them being sorted so
>> that it can pick up where it left off even in the event of the pid set
>> changing between reads - it does a binary search to find the first pid
>> greater than the last one that was returned, so as to guarantee that
>> we return every pid that was in the cgroup before the scan started and
>> remained in the cgroup until after the scan finished; there are no
>> guarantees about pids that enter/leave the cgroup during the scan.
>
> OIC.  Gee we made it hard for ourselves.  That tears it.

Without a guarantee like that, the file becomes much less useful - it
would just be "a random subset of the pids that were in the cgroup at
some time during the read".

>
> Was it a mistake to try to present an ordered, dupes-removed view of a
> large amount of data from the kernel?

The dupe-removal is a red-herring in this argument - that's only added
by this patch, and doesn't affect the existing "tasks" file. I think
we could even change it into a "mostly dupe-removed" view very cheaply
by simply ignoring a tgid when building the array if it's the same as
the previous one we just saw, and leaving userspace to deal with any
remaining dupes.

The ordered semantics of the tasks file comes from cpusets and I
maintained the API (which actually allocated a potentially even larger
array containing the pre-formatted data).

I'd be surprised if removing the ordering constraint broke anything in
userspace, but the more important part about the ordering is to allow
the kernel to present a consistent view in the presence of changing
pids.

Hmm, I guess we could use a combination of the IDR approach that you
suggested and the present shared-array approach:

- when opening a tasks file:
  - populate an IDR with all the pids/tgids in the cgroup
  - find any existing IDR open for the cgroup in the list keyed by
namespace and filetype ("procs"/"tasks")
  - replace (and free) the existing IDR or extend the list with a new one
  - bump the refcount

- when reading:
  - iterate from the last reported pid/tgid

- when closing:
  - drop the refcount, and free the IDR if the count reaches 0

That maintains the property of preventing userspace from consuming
arbitrary amounts of memory just by holding an fd open on a large
cgroup, while also maintaining a consistency guarantee, and we get the
ordering for free as a side-effect of the IDR, with no large memory
allocations. It's not hugely different from the current solution - all
we're doing is replacing the large array in the cgroup_pidlist
structure with an IDR, and populating/reading it appropriately.

The downsides would be a higher fixed cost, I suspect - setting up an
IDR and populating/scanning it sounds like it has to be more expensive
than filling/reading a linear buffer. But it's probably not enough
extra overhead to worry too much about it.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ