linux-kernel - Re: [PATCH 4/4] fdtable: Implement new pagesize-based fdtable allocation scheme.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <200610021042.51124.vlobanov@speakeasy.net>
Date:	Mon, 2 Oct 2006 10:42:51 -0700
From:	Vadim Lobanov <vlobanov@...akeasy.net>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	akpm@...l.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] fdtable: Implement new pagesize-based fdtable allocation scheme.

On Monday 02 October 2006 10:25, Eric Dumazet wrote:
> On Monday 02 October 2006 19:00, Vadim Lobanov wrote:
> > On Monday 02 October 2006 01:52, Eric Dumazet wrote:
> > > Current scheme is to allocate power of two sizes, and not 'the smallest
> > > that accommodates the requested fd count'. This is for a good reason,
> > > because we don't want to call vmalloc()/vfree() each time a process
> > > opens 512 or 1024 more files (x86_64 or ia32)
> >
> > Yep, that is most definitely a consideration. I was balancing it against
> > the fact that, when the table becomes big, growing it by a power of two
> > regardless of the size results in massive memory usage deltas. The worry
> > here is that an application may likely cause the table to grow by a huge
> > amount, due to the power-of-two increase, and then actually use only a
> > modest number of further fds, wasting the rest of the allocated table
> > memory.
> >
> > Which applications open so many file handles so quickly? Do they actually
> > need the amortized power-of-two table area increase? In those cases,
> > would the actual process of opening these files take more time than
> > growing the table in fixed-size steps? Or at least outweigh it enough
> > that it would be more preferable to try to reduce memory waste instead of
> > improve file open time?
>
> I think that for such applications, the 'waste' of ram for fd table is
> nothing compared to the ram cost of opened files/sockets/dentries/inodes.
>
> > Is it really true that it will create less fragmentation? It seems to me
> > that this will only be the true if most of the other heavy users of
> > vmalloc also tried to use power-of-two allocation sizes.
>
> I am quite sure that on my machines, big vmalloc users are fdtable most of
> the time.

Ok, I think I'm convinced. :) Do you want to dig up some actual data regarding 
vmalloc usage on your boxes, at least so it can be saved on the mailing list 
archives for future reference?

I'll reroll the last patch in the series soon, after letting it rest for a bit 
and seeing if anyone else has any more input on the proposed changes. I'll 
tweak it so that it always does power-of-two increases in table size, rather 
than switching to linear deltas after a certain point. Sounds good to you?

> > What do you think of Andi Kleen's follow-up suggestion about eliminating
> > vmalloc use altogether?
>
> It would be interesting, but would need one indirection level, that could
> kill performance of said huge applications... For such applications, the
> cost of expanding fdtable is nothing compared to the cost of
> open()/close()/poll()/read()/write() calls. This is because fdtable never
> shrinks. So adding one indirection (if you use a table of pointers to PAGES
> containing 1024 or 512 (struct file *)).
>
> At least, expanding fdtable by 1024 slots would need to reallocate the
> first table (adding one void *), and allocating one PAGE only. No need to
> copy previous pages, this is a win.
>
> Of course, big fdset still need vmalloc(), or else we cannot use
> find_next_zero_bit() anymore... And for such applications, the time and
> memory scanned to find a zero bit at open() time is probably the killer
> (touching a large part of cpu caches). One million bits is 128 KB...

Yep. (We could play really nifty games with the fdtable if we didn't have to 
return the lowest available fd for the new file, but alas...)

> Eric

-- Vadim Lobanov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/