[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0706031127220.13247@alien.or.mcafeemobile.com>
Date: Sun, 3 Jun 2007 11:53:00 -0700 (PDT)
From: Davide Libenzi <davidel@...ilserver.org>
To: Ulrich Drepper <drepper@...hat.com>
cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: [patch 2/2] ufd v1 - use unsequential O(1) fdmap
On Sun, 3 Jun 2007, Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Davide Libenzi wrote:
> > #define FD_UNSEQ_BASE (1U << 28)
>
> I agree with Ingo, no need for a second magic value. Use the same value
> as FD_UNSEQ_ALLOC which will just mean this exact value should never be
> used as a file descriptor.
I explained this in my answer to Ingo...
> If it's not too expensive, I would prefer to see fdmap_newfd to return
> more or less random descriptor values. This way nobody can try to rely
> on a specific sequence. Plus it might add a tiny bit of extra security.
Random can be expensive. At the moment is FIFO. I'm missing though how
this can be a security flaw, when the legacy one is exactly predictable.
> While dup2() and fcntl() might seem like good candidates to introduce
> the new functionality I think we should jump the gun and do it right.
> There are both not the best fit, as can be seen by your description.
> The parameter is now a hint. Additionally everybody who'd use
> dup2/fcntl would have to issue a close syscall right after it. Finally,
> I'm worried about accidental use of the new functionality. Not too
> likely but it can happen. This will cause hard to debug problems.
>
> I think it's better to have a dedicated interface:
>
> int nonseqfd (int fd, int flags);
>
> The semantics would include returning the new descriptor, closing the
> old descriptor (maybe this can be overwritten with a flags bit). I
> guess I would also like to see the default of close_on_exit changed but
> perhaps the caller can be required to set an appropriate bit in the
> flags parameter.
>
> This approach is cleaner, no magic constants exported from the kernel
> and it should be more efficient in general. It probably will also spark
> reevaluating the choice of the interface for fdmap_newfd. I don't like
> overloading a parameter to be used as a flag and a value at the same time.
I can do a new syscall, no problem (I actually even slightly prefer). We
cannot break dup2() and F_DUPFD though, so we have to handle those too.
I was just trying to use Linus suggestion of using sus_dup2().
> The flags parameter will also allow to specify the additional
> functionality needed. For instance, by default descriptors allocated
> this way *should* appear in /proc/self/fd. You mentioned web servers
> which don't care about sequential allocation and are slowed down by the
> current strict allocation. Those should have the descriptor appear in
> /proc/self/fd. On the other hand, uses of the interfaces in, say,
> glibc or valgrind should create invisible descriptor. Well, descriptors
> visible in perhaps /proc/self/fd-private or so.
Ohh, my last work yesterday was changing procfs/base.c to have them show
up in there. We have quite a few flags available (31 or 63) to be
assicated with each fd, so I guess we could use one for that.
> BTW: those whose perfect knowledge of the English language, I guess
> non-sequential is better than unsequential, right? This would require
> renaming.
Yeah :D
> > repeat:
> > + if (files->fd_count >= current->signal->rlim[RLIMIT_NOFILE].rlim_cur)
> > + return -EMFILE;
>
> I haven't studied the entire patch completely. So, help me understand
> this. ->fd_count is the exact count for the number of descriptors which
> is open. This would make sense.
Yes, that's the count of open fds.
> But I hope everybody realizes that this is a change in the ABI. So far
> RLIMIT_NOFILE specified the highest file descriptor number which could
> be returned by an open call etc.
>
> There is a fine difference. Even if yo have just a couple of
> descriptors open, you could not, for instance, dup2 to a descriptor
> higher than RLIMIT_NOFILE. With this patch it seems possible, even for
> processes not using non-sequential descriptors. This means programs
> written on new kernels might not run on older kernels.
>
> I don't say that this is unacceptable. It is a change. If it can be
> minimized this would be better but the new RLIMIT_NOFILE semantics is
> certainly also correct according to POSIX.
If you look a few lines below, there's also (this is inside the lagacy
fd allocator BTW):
if (fd >= current->signal->rlim[RLIMIT_NOFILE].rlim_cur)
goto out;
So the POSIX behaviour does not change. You cannot dup2() to an fd higher
than RLIMIT_NOFILE (when using legacy fd allocation), *and* you cannot
open more than RLIMIT_NOFILE files.
This means that if you have RLIMIT_NOFILE==1000 and you have 999 files
open in the nonsequential area, you can only open one file in the legacy
area.
- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists