lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87fwa6pxol.fsf@xmission.com>
Date:	Thu, 07 Jun 2012 15:24:58 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Ani Sinha <ani@...stanetworks.com>
Cc:	netdev@...r.kernel.org,
	Francesco Ruggeri <fruggeri@...stanetworks.com>
Subject: Re: [PATCH] fix kernel crash in the macvlan driver

Ani Sinha <ani@...stanetworks.com> writes:

> Hi Eric :
>
> On Thu, 7 Jun 2012, Eric W. Biederman wrote:
>
>> I don't completely follow the logic of your change.  Crashing in
>> macvlan_addr_busy does seem to indicate you are using a corrupted data
>> structure.
>
> The logic of my change is as follows :
>
> As far as I can see, macvlan_newlink() pairs with macvlan_dellink(). If
> you are incrementing the reference count in newlink(), the corresponding
> decrement should be, in my opinion in dellink(). If you are derementing
> the count in uninit(), you are asuming that for every dellink() call,
> there is a corresponding uninit() call. I am not sure if this assumption
> is correct. Perhaps you can shed some more lights on this.

Yes.  Look at net/core/dev.c
dellink calls unregister_netdevice_queue.
The active part of unregister_netdevice_queue rollback_registered_many
calls dev->ndo_stop() and then ndo_uninit.

We might still be using rcu hash lookups until ndo_close is called and
so we really don't want to move the decrement before then.

>> My compiled version of macvlan_addr_busy is much smaller than yours so I
>> can't guess based on your disassembly what is wrong.  But by reading the
>> code it must either be port->dev->dev_addr or the rcu
>> macvlan_hash_lookup.
>
> Yes, the corruption is in port->dev->dev_addr. The dev_addr seems to get a
> bogus address value.

Interesting if it is port->dev->dev_addr than count is really out of the
picture.

My blind guess would be that port->dev is getting freed and recycled
before dev_addr gets accessed.  But macvlan_device_event seems to
prevent that.

>> I might just be dense today but I can't possibly see how moving that
>> decrement would solve the crash you have reported below.
>
> In my tests, I have confirmed that with my change, the crash I reported is
> no longer reproducable with our scripts. I have also verified that when I
> pull out your d5cd92448fded change, I can also no longer reproduce the
> issue. So I believe that the crash is related to the above change.
> However, I am not very familier with the code in the macvlan
> driver, so I can not say for sure that the fix I made genuinely solves the
> problem.

It sounds to me like you have a memory stomp and that recompiling the
code winds up changing what gets stomped.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ