lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6d27845d-a8f3-607b-1b6b-8464de65162c@huawei.com>
Date:   Tue, 6 Mar 2018 18:05:20 +0200
From:   Igor Stoppa <igor.stoppa@...wei.com>
To:     Matthew Wilcox <willy@...radead.org>
CC:     <david@...morbit.com>, <keescook@...omium.org>,
        <mhocko@...nel.org>, <labbott@...hat.com>,
        <linux-security-module@...r.kernel.org>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>,
        <kernel-hardening@...ts.openwall.com>
Subject: Re: [PATCH 1/7] genalloc: track beginning of allocations

On 06/03/2018 16:10, Matthew Wilcox wrote:
> On Wed, Feb 28, 2018 at 10:06:14PM +0200, Igor Stoppa wrote:
>> + * Encoding of the bitmap tracking the allocations
>> + * -----------------------------------------------
>> + *
>> + * The bitmap is composed of units of allocations.
>> + *
>> + * Each unit of allocation is represented using 2 consecutive bits.
>> + *
>> + * This makes it possible to encode, for each unit of allocation,
>> + * information about:
>> + *  - allocation status (busy/free)
>> + *  - beginning of a sequennce of allocation units (first / successive)
>> + *
>> + *
>> + * Dictionary of allocation units (msb to the left, lsb to the right):
>> + *
>> + * 11: first allocation unit in the allocation
>> + * 10: any subsequent allocation unit (if any) in the allocation
>> + * 00: available allocation unit
>> + * 01: invalid
>> + *
>> + * Example, using the same notation as above - MSb.......LSb:
>> + *
>> + *  ...000010111100000010101011   <-- Read in this direction.
>> + *     \__|\__|\|\____|\______|
>> + *        |   | |     |       \___ 4 used allocation units
>> + *        |   | |     \___________ 3 empty allocation units
>> + *        |   | \_________________ 1 used allocation unit
>> + *        |   \___________________ 2 used allocation units
>> + *        \_______________________ 2 empty allocation units
>> + *
>> + * The encoding allows for lockless operations, such as:
>> + * - search for a sufficiently large range of allocation units
>> + * - reservation of a selected range of allocation units
>> + * - release of a specific allocation
>> + *
>> + * The alignment at which to perform the research for sequence of empty
>> + * allocation units (marked as zeros in the bitmap) is 2^1.
>> + *
>> + * This means that an allocation can start only at even places
>> + * (bit 0, bit 2, etc.) in the bitmap.
>> + *
>> + * Therefore, the number of zeroes to look for must be twice the number
>> + * of desired allocation units.
>> + *
>> + * When it's time to free the memory associated to an allocation request,
>> + * it's a matter of checking if the corresponding allocation unit is
>> + * really the beginning of an allocation (both bits are set to 1).
>> + *
>> + * Looking for the ending can also be performed locklessly.
>> + * It's sufficient to identify the first mapped allocation unit
>> + * that is represented either as free (00) or busy (11).
>> + * Even if the allocation status should change in the meanwhile, it
>> + * doesn't matter, since it can only transition between free (00) and
>> + * first-allocated (11).
> 
> This seems unnecessarily complicated.

TBH it seemed to me a natural extension of the existing encoding :-)

>  Why not handle it like this:
> 
>  - Double the bitmap in size (as you have done) but
>  - The first half of the bits are unchanged from the existing implementation
>  - The second half of the bits are used for determining the length

Wouldn't that mean a less tight loop and less localized data?
The implementation from this patch does not have to jump elsewhere, when
(un)marking the allocation units and the start.

> On allocation, you look for a sufficiently-large string of 0 bits in
> the first-half.  When you find it, you set all of them to 1, and set one
> bit in the second-half to indicate where the tail of the allocation is
> (you might actually want to use an rbtree or something to handle this ...
> using all these bits seems pretty inefficient).

1 bit maps to 1 unit of allocation, which is very seldom 1 byte.
For pmalloc use, I expect that the average allocation is likely to be
2-4 units, where 1 unit equals either a 32 or 64 bits word.
So it's probably likely that for every couple of allocation units, one
is marked as start-of-allocation.

In other cases where genalloc is used, like the tracking of uncached
pages, 1 unit of allocation equals to 1 page.

I would expect the rbtree to end up generating a far larger footprint.

For the same reasons, since the bitmap is implemented using unsigned
longs, chances are high that one allocation will fit in one bitmap
"word", which means that if the "beginning" bit and the "occupied" bit
are adjacent, one write is sufficient.

In the case you describe, it would be almost always at least 2.

I do not have factual evidence to back my reasoning, but it seems more
likely to be the case, from my analysis of data types that could belong
to pools (both existing users of genalloc and my experiments with
SELinux data structures and pmalloc).

Even in the XFS case, if I understood correctly, it was about protecting
1 or 2 pages at a time, which seems to fit what I have empirically observed.

What makes you think otherwise?

--
igor

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ