[<prev] [next>] [day] [month] [year] [list]
Message-ID: <a9af1185-7ad4-c3a1-b61c-063ddc013c85@yandex-team.ru>
Date: Tue, 7 Jan 2020 12:35:32 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: "lixinhai.lxh@...il.com" <lixinhai.lxh@...il.com>,
Konstantin Khlebnikov <koct9i@...il.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
akpm <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"richardw.yang" <richardw.yang@...ux.intel.com>,
"kirill.shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when
fork
On 07/01/2020 12.05, lixinhai.lxh@...il.com wrote:
> On 2020-01-07 at 16:24 Konstantin Khlebnikov wrote:
>> On 07/01/2020 05.44, lixinhai.lxh@...il.com wrote:
>>> On 2020-01-07 at 04:35 Konstantin Khlebnikov wrote:
>>>> On Mon, Jan 6, 2020 at 1:42 PM Konstantin Khlebnikov
>>>> <khlebnikov@...dex-team.ru> wrote:
>>>>>
>>>>> This fixes couple misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>>>> mergeable anon_vma as parent when fork").
>>>>>
>>>>> First problem caused by initialization order in dup_mmap(): vma->vm_prev
>>>>> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>>>> previous VMA in parent mm. This is fixed by rearrangement in dup_mmap().
>>>>>
>>>>> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>>>> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>>>> use different anon-vmas: ANON1 ANON2 .. ANONn. Before this patch only DST1
>>>>> will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0.
>>>>> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>>>>
>>>>> Also this patch moves sharing logic out of anon_vma_clone() into more
>>>>> specific anon_vma_fork() because this supposed to work only at fork().
>>>>> Function anon_vma_clone() is more generic is also used at splitting VMAs.
>>>>>
>>>>> Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>>>> which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>>>> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>>>> because in current code 'prev' actually is same as 'pprev'. To avoid that
>>>>> this patch just checks pointer and compares vm_start to verify relation
>>>>> between previous VMAs in parent and child.
>>>>>
>>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
>>>>> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>>>
>>>> Oops, I've forgot to mention that Li Xinhai <lixinhai.lxh@...il.com>
>>>> found and reported this suspicious code. Sorry.
>>>>
>>>> Reported-by: Li Xinhai <lixinhai.lxh@...il.com>
>>>> Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>>>
>>>
>>> Can we change the interface
>>> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma),
>>> to
>>> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma, struct vm_area_struct *pcvma),
>>> and 'pcvma' means previous child vma.
>>> so highlight the use of that vma, and the current code sequence for linking 'tmp' vma
>>> in dup_mmap() is not changed(in case some code would have dependency on that
>>> linking sequence)
>>
>> There should be no dependency on linking sequence.
>> But we could generalize sharing: cloned vma could share prev anon-vma
>> (or any other actually) if anon_vma->parent == src->anon_vma.
>> This is more clear than sharing only between related vmas.
>>
> I mean in
> int dup_mmap(...)
> {
> tmp = vm_area_dup(mpnt);
>
> //.... some code in this part would have dependency on 'tmp->vm_prev, tmp->vm_next'
> // ... I didn't go through all this part, but need take care about it.
> // so, better to keep the current semantic and pass in *pprev for anon vma call
Yeah, there is weird and unintentional reference in case VM_WIPEONFORK:
anon_vma_prepare -> find_mergeable_anon_vma
This is just dangerous.
>
> /*
> * Link in the new vma and copy the page table entries.
> */
> *pprev = tmp;
> pprev = &tmp->vm_next;
> tmp->vm_prev = prev;
> prev = tmp;
>
> }
>
>>>
>>> Another issue is for linking the avc for the reused anon_vma. anon_vma_clone()
>>> use the iteration
>>> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma),
>>> to link avc for child vma, and it is unable to reach the resued anon_vma because
>>> that is from the previous vma not from parent vma. So, in anon_vma_fork(),
>>> we need to setup the avc link for vma->anon.
>>
>> Oh, yes. That's another example where current code miraculously stays correct.
>>
>>>
>>>>> ---
>>>>> kernel/fork.c | 4 ++--
>>>>> mm/rmap.c | 25 ++++++++++++-------------
>>>>> 2 files changed, 14 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>> index 2508a4f238a3..04ee5e243f65 100644
>>>>> --- a/kernel/fork.c
>>>>> +++ b/kernel/fork.c
>>>>> @@ -548,6 +548,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>> if (retval)
>>>>> goto fail_nomem_policy;
>>>>> tmp->vm_mm = mm;
>>>>> + tmp->vm_prev = prev; /* anon_vma_fork use this */
>>>>> + tmp->vm_next = NULL;
>>>>> retval = dup_userfaultfd(tmp, &uf);
>>>>> if (retval)
>>>>> goto fail_nomem_anon_vma_fork;
>>>>> @@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>> } else if (anon_vma_fork(tmp, mpnt))
>>>>> goto fail_nomem_anon_vma_fork;
>>>>> tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>>>> - tmp->vm_next = tmp->vm_prev = NULL;
>>>>> file = tmp->vm_file;
>>>>> if (file) {
>>>>> struct inode *inode = file_inode(file);
>>>>> @@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>> */
>>>>> *pprev = tmp;
>>>>> pprev = &tmp->vm_next;
>>>>> - tmp->vm_prev = prev;
>>>>> prev = tmp;
>>>>>
>>>>> __vma_link_rb(mm, tmp, rb_link, rb_parent);
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index b3e381919835..77b3aa38d5c2 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>> {
>>>>> struct anon_vma_chain *avc, *pavc;
>>>>> struct anon_vma *root = NULL;
>>>>> - struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>>>> -
>>>>> - /*
>>>>> - * If parent share anon_vma with its vm_prev, keep this sharing in in
>>>>> - * child.
>>>>> - *
>>>>> - * 1. Parent has vm_prev, which implies we have vm_prev.
>>>>> - * 2. Parent and its vm_prev have the same anon_vma.
>>>>> - */
>>>>> - if (!dst->anon_vma && src->anon_vma &&
>>>>> - pprev && pprev->anon_vma == src->anon_vma)
>>>>> - dst->anon_vma = prev->anon_vma;
>>>>> -
>>>>>
>>>>> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>>>> struct anon_vma *anon_vma;
>>>>> @@ -334,6 +321,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>> */
>>>>> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>>> {
>>>>> + struct vm_area_struct *prev = vma->vm_prev, *pprev = pvma->vm_prev;
>>>>> struct anon_vma_chain *avc;
>>>>> struct anon_vma *anon_vma;
>>>>> int error;
>>>>> @@ -345,6 +333,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>>> /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>>>>> vma->anon_vma = NULL;
>>>>>
>>>>> + /*
>>>>> + * If parent shares anon_vma with its vm_prev, keep this sharing.
>>>>> + *
>>>>> + * Previous VMA could be missing or not match previuos in parent
>>>>> + * if VM_DONTCOPY is set: compare vm_start to avoid this case.
>>>>> + */
>>>>> + if (pvma->anon_vma && pprev && prev &&
>>>>> + pprev->anon_vma == pvma->anon_vma &&
>>>>> + pprev->vm_start == prev->vm_start)
>>>>> + vma->anon_vma = prev->anon_vma;
>>>>> +
>>>>> /*
>>>>> * First, attach the new VMA to the parent VMA's anon_vmas,
>>>>> * so rmap can find non-COWed pages in child processes.
>>>>>
>>>> >
Powered by blists - more mailing lists