Message-ID: <CALCETrVx0JFchtJrrKVqEYvTwWvC+DwSLxzhD_A7EdNu2PiG7w@mail.gmail.com>
Date:	Fri, 13 Nov 2015 11:46:07 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Daniel Micay <danielmicay@...il.com>
Cc:	Minchan Kim <minchan@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Linux API <linux-api@...r.kernel.org>,
	Hugh Dickins <hughd@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Jason Evans <je@...com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Shaohua Li <shli@...nel.org>, Michal Hocko <mhocko@...e.cz>,
	yalin wang <yalin.wang2010@...il.com>
Subject: Re: [PATCH v3 01/17] mm: support madvise(MADV_FREE)

On Fri, Nov 13, 2015 at 12:13 AM, Daniel Micay <danielmicay@...il.com> wrote:
> On 13/11/15 02:03 AM, Minchan Kim wrote:
>> On Fri, Nov 13, 2015 at 01:45:52AM -0500, Daniel Micay wrote:
>>>> And now I am thinking that if we use the access bit, we could implement
>>>> MADV_FREE_UNDO easily when we need it. Maybe that's what you want, right?
>>>
>>> Yes, but why the access bit instead of the dirty bit for that? It could
>>> always be made more strict (i.e. access bit) in the future, while going
>>> the other way won't be possible. So I think the dirty bit is really the
>>> more conservative choice since if it turns out to be a mistake it can be
>>> fixed without a backwards incompatible change.
>>
>> Absolutely true. That's why I have insisted on the dirty bit until now,
>> although I didn't give the reason. But I thought you wanted to switch to
>> the access bit for the future, too. It seems MADV_FREE is starting to
>> bloat over and over again before we know the real problems and use cases.
>> It's almost the same situation as with volatile ranges, so I really want
>> to stop at a proper point, which I hope the maintainer will decide.
>> Without that, we will make the feature a lot heavier just by
>> brainstorming, and that will cause lots of churn in the MM code without
>> real benefit. It would be very painful for us.
>
> Well, I don't think you need more than a good API and an implementation
> with no known bugs, kernel security concerns or backwards compatibility
> issues. Configuration and API extensions are something for later (i.e.
> land a baseline, then submit stuff like sysctl tunables). Just my take
> on it though...
>

As long as it's anonymous MAP_PRIVATE only, then the security aspects
should be okay.  MADV_DONTNEED seems to work on pretty much any VMA,
and there's been a long history of interesting bugs there.

As for dirty vs accessed, an argument in favor of going straight to
accessed is that it means that users can write code like this without
worrying about whether they have a kernel that uses the dirty bit:

x = mmap(...);
*x = 1;  /* mark it present */

/* i'm done with it */
*x = 1;
madvise(x, ..., MADV_FREE);

wait a while;

/* is it still there? */
if (*x == 1) {
  /* use whatever was cached there */
} else {
  /* reinitialize it */
  *x = 1;
}

With the dirty bit, this will look like it works, but on occasion
users will lose the race where they probe *x to see if the data was
lost and then the data gets lost before the next write comes in.
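
For reference, here is a compilable sketch of the pattern above.  The
helper names (cache_map, cache_release, cache_probe_read) are made up for
illustration and are not part of the patch; the fallback value for
MADV_FREE is an assumption for older headers.  The probe is a plain load,
so it is only race-free if the kernel keys off the accessed bit:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS and madvise() */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8              /* assumed Linux value if headers lack it */
#endif

/* Hypothetical helper: map one anonymous MAP_PRIVATE page, mark it present. */
int *cache_map(size_t *len)
{
    *len = (size_t)sysconf(_SC_PAGESIZE);
    int *x = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (x == MAP_FAILED)
        return NULL;
    *x = 1;                      /* mark it present */
    return x;
}

/* "I'm done with it": returns whatever madvise() returned. */
int cache_release(int *x, size_t len)
{
    *x = 1;
    return madvise(x, len, MADV_FREE);
}

/*
 * Read-only probe.  Returns 1 if the cached contents appear to be there,
 * 0 if the page came back zeroed and was reinitialized.
 */
int cache_probe_read(volatile int *x)
{
    if (*x == 1)
        return 1;                /* use whatever was cached there */
    *x = 1;                      /* reinitialize it */
    return 0;
}
```

On a kernel without MADV_FREE the madvise() call just fails and the page
stays put, so the probe reports the data as still there.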

Sure, that load from *x could be changed to RMW or users could do a
dummy write (e.g. x[1] = 1; if (*x == 1) ...), but people might forget
to do that, and the caching implications are a little bit worse.
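
The dummy-write variant might look roughly like this.  It is only a
sketch: the function name is made up, and it assumes x[1] is a scratch
slot that never holds cached data.

```c
/*
 * Probe with a dummy store first.  The store either dirties the old page
 * or faults in a fresh zeroed one, so the load that follows sees a page
 * that is already dirty.  Returns 1 if the cached contents appear to be
 * there, 0 if the page was reinitialized.
 */
int cache_probe_dummy_write(volatile int *x)
{
    x[1] = 1;                    /* dummy write: dirty the page first */
    if (x[0] == 1)
        return 1;                /* use whatever was cached there */
    x[0] = 1;                    /* reinitialize it */
    return 0;
}
```

The volatile qualifier is there so the compiler keeps the store to x[1]
ahead of the load from x[0] instead of reordering or dropping it.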

Note that switching to RMW is really really dangerous.  Doing:

*x &= 1;
if (*x == 1) ...;

is safe on x86 if the compiler generates:

andl $1, (%[x]);
cmpl $1, (%[x]);

but is unsafe if the compiler generates:

movl (%[x]), %eax;
andl $1, %eax;
movl %eax, (%[x]);
cmpl $1, %eax;

and even worse if the write is omitted when "provably" unnecessary.
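
One possible way to avoid depending on what the compiler emits is a C11
atomic RMW.  This is my own sketch, not something proposed in the patch
series, and the function name is made up.  atomic_fetch_and() is a single
indivisible update as far as the language is concerned, and it returns
the value the word held immediately before the update, so the check and
the write cannot be separated by a page being freed in between:

```c
#include <stdatomic.h>

/*
 * Probe with an atomic read-modify-write.  Returns 1 if the cached
 * contents appear to be there, 0 if the word was reinitialized.
 */
int cache_probe_atomic(_Atomic int *x)
{
    /* old is the value seen by the same operation that did the write */
    int old = atomic_fetch_and(x, 1);

    if ((old & 1) == 1)
        return 1;                /* use whatever was cached there */
    atomic_store(x, 1);          /* reinitialize it */
    return 0;
}
```

I have not checked what any particular compiler generates for this, so
treat it as an illustration of the idea rather than a verified fix.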

OTOH, if switching to the accessed bit is too much of a mess, then
using the dirty bit at first isn't so bad.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC