Message-ID: <CAKgT0UfTgYhED1f6vdsoT72A3=D2Grh4U-A6pp43FLZoCs30Gw@mail.gmail.com>
Date:   Mon, 21 Dec 2020 15:21:57 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Antoine Tenart <atenart@...nel.org>
Cc:     David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net v2 1/3] net: fix race conditions in xps by locking the
 maps and dev->tc_num

On Mon, Dec 21, 2020 at 11:36 AM Antoine Tenart <atenart@...nel.org> wrote:
>
> Two race conditions can be triggered in xps, resulting in various oops
> and invalid memory accesses:
>
> 1. Calling netdev_set_num_tc while netif_set_xps_queue is running:
>
>    - netdev_set_num_tc sets dev->tc_num.
>
>    - netif_set_xps_queue uses dev->tc_num as one of the parameters to
>      compute the size of new_dev_maps when allocating it. dev->tc_num is
>      also used to access the map, and the compiler may generate code to
>      retrieve this field multiple times in the function.
>
>    If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
>    is set to a higher value through netdev_set_num_tc, later accesses to
>    new_dev_maps in netif_set_xps_queue could lead to accessing memory
>    outside of new_dev_maps, triggering an oops.
>
>    One way of triggering this is to bring an interface up (for which
>    the driver uses netdev_set_num_tc in the open path, such as bnx2x)
>    while writing to xps_cpus or xps_rxqs in a concurrent thread. With
>    the right timing an oops is triggered.
>
> 2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
>
>    2.1. netdev_set_num_tc starts by resetting the xps queues,
>         dev->tc_num isn't updated yet.
>
>    2.2. netif_set_xps_queue is called, setting up the maps with the
>         *old* dev->num_tc.
>
>    2.3. dev->tc_num is updated.
>
>    2.4. Later accesses to the map lead to out-of-bounds accesses and
>         an oops.
>
>    A similar issue can be found with netdev_reset_tc.
>
>    Storing the size of the maps alongside them is not a sufficient fix
>    on its own, as an invalid configuration could still occur. The
>    reset-then-set logic in both netdev_set_num_tc and netdev_reset_tc
>    must be protected by a lock.
>
> Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
> and netdev_reset_tc should be mutually exclusive.
>
> This patch fixes those races by:
>
> - Reworking netif_set_xps_queue by moving the xps_map_mutex up so the
>   access of dev->num_tc is done under the lock.
>
> - Using xps_map_mutex in both netdev_set_num_tc and netdev_reset_tc for
>   the reset and set logic:
>
>   + As xps_map_mutex was taken in the reset path, netif_reset_xps_queues
>     had to be reworked to offer an unlocked version (as well as
>     netdev_unbind_all_sb_channels which calls it).
>
>   + cpus_read_lock was taken in the reset path as well, and is always
>     taken before xps_map_mutex. It had to be moved out of the unlocked
>     version as well.
>
>   This is why the patch is a little bit longer, and moves
>   netdev_unbind_sb_channel up in the file.
>
> Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
> Signed-off-by: Antoine Tenart <atenart@...nel.org>

Looking over this patch, it seems fairly clear that extending the
xps_map_mutex is making things far more complex than they need to be.
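
To make the complexity concrete, here is a minimal userspace sketch of the
pattern the patch ends up needing: one mutex, a reset helper that exists in
both a locked and an unlocked form, and a setter that reads the class count
once under the lock. It uses pthreads, and every name in it (fake_dev,
map_mutex, reset_map_locked and so on) is a hypothetical stand-in rather than
kernel code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for xps_map_mutex. */
static pthread_mutex_t map_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the relevant parts of struct net_device. */
struct fake_dev {
	int num_tc;   /* number of traffic classes */
	int *map;     /* one slot per traffic class */
	int map_len;  /* size the map was allocated with */
};

/* Unlocked variant: the caller must already hold map_mutex. */
static void reset_map_locked(struct fake_dev *dev)
{
	free(dev->map);
	dev->map = NULL;
	dev->map_len = 0;
}

/* Locked wrapper for callers that do not hold map_mutex. */
static void reset_map(struct fake_dev *dev)
{
	pthread_mutex_lock(&map_mutex);
	reset_map_locked(dev);
	pthread_mutex_unlock(&map_mutex);
}

/* Reset and set form a single critical section, so no map can be
 * built with the old class count in between the two steps. */
static void set_num_tc(struct fake_dev *dev, int num_tc)
{
	pthread_mutex_lock(&map_mutex);
	reset_map_locked(dev);
	dev->num_tc = num_tc;
	pthread_mutex_unlock(&map_mutex);
}

/* The class count is read once, under the lock, and that single value
 * is used for both the allocation size and the recorded length. */
static int set_queue_map(struct fake_dev *dev)
{
	int num_tc, ret = 0;
	int *new_map;

	pthread_mutex_lock(&map_mutex);
	/* Assumption of this sketch: "no classes" is treated as one class. */
	num_tc = dev->num_tc > 0 ? dev->num_tc : 1;
	new_map = calloc(num_tc, sizeof(*new_map));
	if (!new_map) {
		ret = -1;
	} else {
		free(dev->map);
		dev->map = new_map;
		dev->map_len = num_tc;
	}
	pthread_mutex_unlock(&map_mutex);
	return ret;
}
```

Every helper that touches the map needs both forms once the lock moves up
into the callers, which is where the extra churn comes from.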

Applying the rtnl_mutex would probably be much simpler. Although, as I
think you have already discovered, we would need to apply it to both the
store and show handlers for this interface. In addition, we probably need
to perform similar locking around traffic_class_show in order to prevent
it from generating a similar error.
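
As a rough illustration of what I mean, here is a userspace sketch in which a
single outer lock covers the configuration path as well as the store and show
handlers. The pthread mutex stands in for rtnl_mutex, and toy_dev,
toy_set_num_tc, toy_store and toy_show are hypothetical names, not the actual
net-sysfs functions.

```c
#include <pthread.h>
#include <stdio.h>

#define TOY_MAX_TC 16

/* Hypothetical stand-in for rtnl_mutex: one lock for all config paths. */
static pthread_mutex_t toy_cfg_lock = PTHREAD_MUTEX_INITIALIZER;

struct toy_dev {
	int num_tc;                /* current number of traffic classes */
	int queue_tc[TOY_MAX_TC];  /* per-class data, valid for [0, num_tc) */
};

/* Configuration path: reset the per-class data, then set the count,
 * all under the outer lock. */
static int toy_set_num_tc(struct toy_dev *dev, int num_tc)
{
	int i;

	if (num_tc < 0 || num_tc > TOY_MAX_TC)
		return -1;

	pthread_mutex_lock(&toy_cfg_lock);
	for (i = 0; i < TOY_MAX_TC; i++)
		dev->queue_tc[i] = 0;
	dev->num_tc = num_tc;
	pthread_mutex_unlock(&toy_cfg_lock);
	return 0;
}

/* Store handler: the bounds check and the write see the same num_tc
 * because both happen while the lock is held. */
static int toy_store(struct toy_dev *dev, int tc, int value)
{
	int ret = -1;

	pthread_mutex_lock(&toy_cfg_lock);
	if (tc >= 0 && tc < dev->num_tc) {
		dev->queue_tc[tc] = value;
		ret = 0;
	}
	pthread_mutex_unlock(&toy_cfg_lock);
	return ret;
}

/* Show handler: takes the same lock, so it cannot read a slot that a
 * concurrent toy_set_num_tc has just made invalid. */
static int toy_show(struct toy_dev *dev, int tc, char *buf, size_t len)
{
	int ret = -1;

	pthread_mutex_lock(&toy_cfg_lock);
	if (tc >= 0 && tc < dev->num_tc)
		ret = snprintf(buf, len, "%d\n", dev->queue_tc[tc]);
	pthread_mutex_unlock(&toy_cfg_lock);
	return ret;
}
```

With one outer lock there is no need for separate locked and unlocked
variants of the reset helpers, since the readers and writers all serialize in
the same place.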
