[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87fwa6pxol.fsf@xmission.com>
Date: Thu, 07 Jun 2012 15:24:58 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Ani Sinha <ani@...stanetworks.com>
Cc: netdev@...r.kernel.org,
Francesco Ruggeri <fruggeri@...stanetworks.com>
Subject: Re: [PATCH] fix kernel crash in the macvlan driver
Ani Sinha <ani@...stanetworks.com> writes:
> Hi Eric :
>
> On Thu, 7 Jun 2012, Eric W. Biederman wrote:
>
>> I don't completely follow the logic of your change. Crashing in
>> macvlan_addr_busy does seem to indicate you are using a corrupted data
>> structure.
>
> The logic of my change is as follows :
>
> As far as I can see, macvlan_newlink() pairs with macvlan_dellink(). If
> you are incrementing the reference count in newlink(), the corresponding
> decrement should be, in my opinion in dellink(). If you are derementing
> the count in uninit(), you are asuming that for every dellink() call,
> there is a corresponding uninit() call. I am not sure if this assumption
> is correct. Perhaps you can shed some more lights on this.
Yes. Look at net/core/dev.c
dellink calls unregister_netdevice_queue.
The active part of unregister_netdevice_queue rollback_registered_many
calls dev->ndo_stop() and then ndo_uninit.
We might still be using rcu hash lookups until ndo_close is called and
so we really don't want to move the decrement before then.
>> My compiled version of macvlan_addr_busy is much smaller than yours so I
>> can't guess based on your disassembly what is wrong. But by reading the
>> code it must either be port->dev->dev_addr or the rcu
>> macvlan_hash_lookup.
>
> Yes, the corruption is in port->dev->dev_addr. The dev_addr seems to get a
> bogus address value.
Interesting if it is port->dev->dev_addr than count is really out of the
picture.
My blind guess would be that port->dev is getting freed and recycled
before dev_addr gets accessed. But macvlan_device_event seems to
prevent that.
>> I might just be dense today but I can't possibly see how moving that
>> decrement would solve the crash you have reported below.
>
> In my tests, I have confirmed that with my change, the crash I reported is
> no longer reproducable with our scripts. I have also verified that when I
> pull out your d5cd92448fded change, I can also no longer reproduce the
> issue. So I believe that the crash is related to the above change.
> However, I am not very familier with the code in the macvlan
> driver, so I can not say for sure that the fix I made genuinely solves the
> problem.
It sounds to me like you have a memory stomp and that recompiling the
code winds up changing what gets stomped.
Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists