Message-ID: <28c262360903051423g1fbf5067i9835099d4bf324ae@mail.gmail.com>
Date: Fri, 6 Mar 2009 07:23:44 +0900
From: Minchan Kim <minchan.kim@...il.com>
To: Nicolas Pitre <nico@....org>
Cc: lkml <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
Russell King - ARM Linux <linux@....linux.org.uk>
Subject: Re: [RFC] atomic highmem kmap page pinning
On Thu, Mar 5, 2009 at 1:57 PM, Nicolas Pitre <nico@....org> wrote:
> On Thu, 5 Mar 2009, Minchan Kim wrote:
>
>> On Wed, 04 Mar 2009 21:37:43 -0500 (EST)
>> Nicolas Pitre <nico@....org> wrote:
>>
>> > My assertion is that the cost is negligible. This is why I'm asking you
>> > why you think this is a big cost.
>>
>> Of course, I am not sure whether the cost is big or not.
>> But these functions are already used by many filesystems and drivers,
>> so whether the cost matters depends on the type of workload.
>>
>> However, this patch is only needed for VIVT, non-coherent caches.
>> Is that right?
>>
>> If so, it adds unnecessary overhead on other architectures
>> which don't have this problem.
>>
>> I think that is undesirable even if the cost is small.
>> It would be better if we had another method that avoids the unnecessary
>> overhead. Unfortunately, I don't have a way to solve this right now.
>
> OK. What about this patch then:
It looks good to me except for the one thing noted below.
Reviewed-by: MinChan Kim <minchan.kim@...il.com>
> From c4db60c3a2395476331b62e08cf1f64fc9af8d54 Mon Sep 17 00:00:00 2001
> From: Nicolas Pitre <nico@....org>
> Date: Wed, 4 Mar 2009 22:49:41 -0500
> Subject: [PATCH] atomic highmem kmap page pinning
>
> Most ARM machines have a non IO coherent cache, meaning that the
> dma_map_*() set of functions must clean and/or invalidate the affected
> memory manually before DMA occurs. And because the majority of those
> machines have a VIVT cache, the cache maintenance operations must be
> performed using virtual addresses.
>
> When a highmem page is kunmap'd, its mapping (and cache) remains in place
> in case it is kmap'd again. However if dma_map_page() is then called with
> such a page, some cache maintenance on the remaining mapping must be
> performed. In that case, page_address(page) is non null and we can use
> that to synchronize the cache.
>
> It is unlikely but still possible for kmap() to race and recycle the
> virtual address obtained above, and use it for another page before some
> on-going cache invalidation loop in dma_map_page() is done. In that case,
> the new mapping could end up with dirty cache lines for another page,
> and the unsuspecting cache invalidation loop in dma_map_page() might
> simply discard those dirty cache lines resulting in data loss.
>
> For example, let's consider this sequence of events:
>
> - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
>
> --> - vaddr = page_address(page) is non null. In this case
> it is likely that the page has valid cache lines
> associated with vaddr. Remember that the cache is VIVT.
>
> --> for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
> invalidate_cache_line(i);
>
> *** preemption occurs in the middle of the loop above ***
>
> - kmap_high() is called for a different page.
>
> --> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
> is called. The pkmap_count value for the page passed
> to dma_map_page() above happens to be 1, so the page
> is unmapped. But prior to that, flush_cache_kmaps()
> cleared the cache for it. So far so good.
>
> - A fresh pkmap entry is assigned for this kmap request.
> Murphy's law says this pkmap entry will eventually
> happen to use the same vaddr as the one which used to
> belong to the other page being processed by
> dma_map_page() in the preempted thread above.
>
> - The kmap_high() caller starts dirtying the cache using the
> just assigned virtual mapping for its page.
>
> *** the first thread is rescheduled ***
>
> - The for(...) loop is resumed, but now cached
> data belonging to a different physical page is
> being discarded!
>
> And this is not only a preemption issue, as ARM can be SMP as well,
> making the above scenario just as likely. Hence the need for some kind
> of pkmap page pinning which can be used in any context, primarily for
> the benefit of dma_map_page() on ARM.
>
> This provides the necessary interface to cope with the above issue if
> ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
> unchanged.
>
> Signed-off-by: Nicolas Pitre <nico@...vell.com>
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b36b83b..cc61399 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -67,6 +67,25 @@ pte_t * pkmap_page_table;
>
> static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);
>
> +/*
> + * Most architectures have no use for kmap_high_get(), so let's abstract
> + * the disabling of IRQ out of the locking in that case to save on a
> + * potential useless overhead.
> + */
> +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> +#define spin_lock_kmap() spin_lock_irq(&kmap_lock)
> +#define spin_unlock_kmap() spin_unlock_irq(&kmap_lock)
> +#define spin_lock_kmap_any(flags) spin_lock_irqsave(&kmap_lock, flags)
> +#define spin_unlock_kmap_any(flags) spin_unlock_irqrestore(&kmap_lock, flags)
> +#else
> +#define spin_lock_kmap() spin_lock(&kmap_lock)
> +#define spin_unlock_kmap() spin_unlock(&kmap_lock)
> +#define spin_lock_kmap_any(flags) \
> + do { spin_lock(&kmap_lock); (void)(flags); } while (0)
> +#define spin_unlock_kmap_any(flags) \
> + do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
> +#endif
> +
> static void flush_all_zero_pkmaps(void)
> {
> int i;
> @@ -113,9 +132,9 @@ static void flush_all_zero_pkmaps(void)
> */
> void kmap_flush_unused(void)
> {
> - spin_lock(&kmap_lock);
> + spin_lock_kmap();
> flush_all_zero_pkmaps();
> - spin_unlock(&kmap_lock);
> + spin_unlock_kmap();
> }
>
> static inline unsigned long map_new_virtual(struct page *page)
> @@ -145,10 +164,10 @@ start:
>
> __set_current_state(TASK_UNINTERRUPTIBLE);
> add_wait_queue(&pkmap_map_wait, &wait);
> - spin_unlock(&kmap_lock);
> + spin_unlock_kmap();
> schedule();
> remove_wait_queue(&pkmap_map_wait, &wait);
> - spin_lock(&kmap_lock);
> + spin_lock_kmap();
>
> /* Somebody else might have mapped it while we slept */
> if (page_address(page))
> @@ -184,29 +203,59 @@ void *kmap_high(struct page *page)
> * For highmem pages, we can't trust "virtual" until
> * after we have the lock.
> */
> - spin_lock(&kmap_lock);
> + spin_lock_kmap();
> vaddr = (unsigned long)page_address(page);
> if (!vaddr)
> vaddr = map_new_virtual(page);
> pkmap_count[PKMAP_NR(vaddr)]++;
> BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
> - spin_unlock(&kmap_lock);
> + spin_unlock_kmap();
> return (void*) vaddr;
> }
>
> EXPORT_SYMBOL(kmap_high);
>
> +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> +/**
> + * kmap_high_get - pin a highmem page into memory
> + * @page: &struct page to pin
> + *
> + * Returns the page's current virtual memory address, or NULL if no mapping
> + * exists. When and only when a non null address is returned then a
> + * matching call to kunmap_high() is necessary.
> + *
> + * This can be called from any context.
> + */
> +void *kmap_high_get(struct page *page)
> +{
> + unsigned long vaddr, flags;
> +
> + spin_lock_kmap_any(flags);
> + vaddr = (unsigned long)page_address(page);
> + if (vaddr) {
> + BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> + pkmap_count[PKMAP_NR(vaddr)]++;
> + }
> + spin_unlock_kmap_any(flags);
> + return (void*) vaddr;
> +}
> +#endif
Let's also add an empty function for architectures that don't define
ARCH_NEEDS_KMAP_HIGH_GET, something like the sketch below.
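(Just a sketch of what I mean; where exactly it should live, e.g. in
include/linux/highmem.h, is open:

	#ifndef ARCH_NEEDS_KMAP_HIGH_GET
	static inline void *kmap_high_get(struct page *page)
	{
		return NULL;
	}
	#endif

Then callers could use kmap_high_get() unconditionally without needing
their own #ifdefs.)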
> +
> /**
> * kunmap_high - map a highmem page into memory
> * @page: &struct page to unmap
> + *
> + * If ARCH_NEEDS_KMAP_HIGH_GET is not defined then this may be called
> + * only from user context.
> */
> void kunmap_high(struct page *page)
> {
> unsigned long vaddr;
> unsigned long nr;
> + unsigned long flags;
> int need_wakeup;
>
> - spin_lock(&kmap_lock);
> + spin_lock_kmap_any(flags);
> vaddr = (unsigned long)page_address(page);
> BUG_ON(!vaddr);
> nr = PKMAP_NR(vaddr);
> @@ -232,7 +281,7 @@ void kunmap_high(struct page *page)
> */
> need_wakeup = waitqueue_active(&pkmap_map_wait);
> }
> - spin_unlock(&kmap_lock);
> + spin_unlock_kmap_any(flags);
>
> /* do wake-up, if needed, race-free outside of the spin lock */
> if (need_wakeup)
>
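For completeness, my understanding of how this gets used on the ARM
dma_map_page() side is roughly the following (only a sketch of the
pin/unpin pairing; dma_cache_maint() is a placeholder name here and not
part of this patch):

	void *vaddr;

	/* pin any existing kmap so its pkmap entry cannot be recycled */
	vaddr = kmap_high_get(page);
	if (vaddr) {
		/* cache maintenance is now safe against kmap reuse */
		dma_cache_maint(vaddr, PAGE_SIZE, dir);
		/* drop the pin taken by kmap_high_get() */
		kunmap_high(page);
	}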
--
Kind regards,
Minchan Kim