The Disappearing Act
A VM dropped off the network, and the NIC had actually disappeared from the VM’s virtual hardware.
Poring over the logs (some thanks to LogInsight, more on that later), I found this in vmware-xx.log:
2013-11-19T07:33:01.246Z| vcpu-0| Powering off Ethernet0
2013-11-19T07:33:01.246Z| vcpu-0| Hot removal done.
Aha! Ethernet0 was hot-removed, which is what happens when someone ejects the NIC via the “Safely Remove Hardware” icon in the Windows system tray.
The solution is to add a new NIC of the same type.
This is something that happens quite a bit as you can read here: http://blogs.vmware.com/kb/2010/06/nic-is-missing-in-my-virtual-machine.html
A good way to prevent this is to disable HotPlug, as described in this KB article: http://kb.vmware.com/kb/1012225
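As I understand the KB, the change comes down to one advanced setting in the VM's configuration, devices.hotplug = "false", applied while the VM is powered off (check the KB for your version before relying on that). If you want to audit which VMs already have it, here is a minimal Python sketch that checks the text of a .vmx file. The function name and the sample .vmx content are mine, purely for illustration.

```python
def hotplug_disabled(vmx_text):
    """Return True if the .vmx text sets devices.hotplug to "false"."""
    for raw in vmx_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower() == "devices.hotplug":
            return value.strip().strip('"').lower() == "false"
    # Setting absent: hot plug is left at its default (enabled)
    return False


# Hypothetical .vmx snippets for illustration only
protected = 'ethernet0.present = "TRUE"\ndevices.hotplug = "false"\n'
unprotected = 'ethernet0.present = "TRUE"\n'

print(hotplug_disabled(protected))
print(hotplug_disabled(unprotected))
```

Feed it the contents of each VM's .vmx and you get a quick list of the machines where a user can still eject the NIC.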
LogInsight was able to show some hints, but didn’t quite nail it. LogInsight showed:
2013-11-19T07:33:01.841Z esxhost.local Vpxa: [FFE6FB90 verbose ‘Default’ opID=WFU-e9559ea6] [VpxaHalVmHostagent] 2746: Config changed ‘config.extraConfig[“ethernet0.pciSlotNumber”].value’
Unfortunately, LogInsight doesn’t index the virtual machine logs (vmware.log). If it did, it would have saved me a lot of time.
William Lam has written a great post on how to get your vmware.log into the ESXi syslog: http://www.virtuallyghetto.com/2013/07/a-hidden-vsphere-51-gem-forwarding.html
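Until the VM logs are being forwarded, a small script can do the digging. This is only a rough sketch in Python, and it assumes the vmware.log line layout seen in the two hot-removal lines quoted earlier (timestamp, thread, message, separated by "|"). The function name and the third sample line are made up for illustration.

```python
import re

# Assumed layout: <timestamp>| <thread>| <message>
LINE_RE = re.compile(r"^(?P<ts>[^|\s]+)\|\s*(?P<thread>[^|]+)\|\s*(?P<msg>.*)$")
# Messages that indicate a NIC was hot-removed
HOT_REMOVE_RE = re.compile(r"Powering off Ethernet\d+|Hot removal done")


def find_hot_removals(lines):
    """Return (timestamp, message) pairs for hot-removal log lines."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and HOT_REMOVE_RE.search(m.group("msg")):
            hits.append((m.group("ts"), m.group("msg").strip()))
    return hits


sample = [
    "2013-11-19T07:33:01.246Z| vcpu-0| Powering off Ethernet0",
    "2013-11-19T07:33:01.246Z| vcpu-0| Hot removal done.",
    "2013-11-19T07:33:02.000Z| vcpu-0| Unrelated line made up for illustration",
]

for ts, msg in find_hot_removals(sample):
    print(ts, msg)
```

To run it against a real log, pass an open file instead of the sample list, e.g. find_hot_removals(open("vmware.log")).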
Search and Destroy
From here I needed to find out who the culprit was. Unfortunately there’s no central syslog for our Windows servers, so I had to dig through the event logs. The Security logs are pretty noisy, but I was able to see that some users were logged in around the time of the incident. No one admitted to accidentally ejecting the hardware, and it wouldn’t have mattered if we had identified the individual. Everyone was now aware of it, and knew to be more careful.