Friday, 29 November 2013

Missing VM NIC

The Disappearing Act

A VM dropped off the network, and the NIC had actually disappeared from the VM's virtual hardware.

Poring through logs (some thanks to LogInsight, more on that later), I discovered the following in vmware-xx.log:
2013-11-19T07:33:01.246Z| vcpu-0| Powering off Ethernet0
2013-11-19T07:33:01.246Z| vcpu-0| Hot removal done.
Aha! This shows Ethernet0 was hot-removed, which is what happens when someone ejects the NIC via the "Safely Remove Hardware" icon in the Windows system tray.
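If you have several vmware.log files to check, the same search can be scripted. The following is a minimal sketch, not a definitive parser: it assumes every line follows the `timestamp| thread| message` shape of the two lines quoted above, and the function name `find_hot_removals` is made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines shaped like the vmware.log excerpts above:
#   <timestamp>| <thread>| <message>
LINE_RE = re.compile(r"^(?P<ts>\S+)\|\s*(?P<thread>[^|]+)\|\s*(?P<msg>.*)$")
POWER_OFF_RE = re.compile(r"^Powering off (?P<device>\S+)$")


def find_hot_removals(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, device) for each 'Powering off <device>' line
    that is later followed by a 'Hot removal done.' line."""
    removals = []
    pending = None  # most recent device seen powering off
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed layout
        msg = m.group("msg").strip()
        off = POWER_OFF_RE.match(msg)
        if off:
            pending = off.group("device")
        elif msg == "Hot removal done." and pending is not None:
            removals.append((m.group("ts"), pending))
            pending = None
    return removals
```

To use it, pass in the lines of a log file, for example `find_hot_removals(open("vmware.log"))`, where the path is whatever copy of the VM's log you have pulled from the datastore.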

The solution is to add a new NIC of the same type.

This is something that happens quite a bit as you can read here:

A good way to prevent this is to disable HotPlug, as mentioned in this KB article:
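For reference, and assuming the KB in question is VMware's article on disabling HotAdd/HotPlug, the change is a single advanced configuration parameter added to the VM (in the .vmx file or through the VM's advanced configuration parameters) while it is powered off:

```
devices.hotplug = "false"
```

With this set, the NIC and other hot-pluggable devices should no longer be offered under "Safely Remove Hardware" inside the guest. Check the KB for the exact steps for your ESXi version.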

LogInsight was able to show some hints, but didn't quite nail it. LogInsight showed:
2013-11-19T07:33:01.841Z esxhost.local Vpxa: [FFE6FB90 verbose 'Default' opID=WFU-e9559ea6] [VpxaHalVmHostagent] 2746: Config changed 'config.extraConfig["ethernet0.pciSlotNumber"].value'

Unfortunately, LogInsight doesn't index the virtual machine logs (vmware.log). If it did, it would have saved me a lot of time.

William Lam has written a great post on how to get your vmware.log into the ESXi syslog.

Search and Destroy

From here I needed to find out who the culprit was. Unfortunately there's no central syslog for our Windows servers, so I had to dig through the event logs. The Security logs are pretty noisy, but I was able to see that some users were logged on around the time of the incident. No one admitted to accidentally ejecting the hardware, and it wouldn't have mattered if we had identified the individual. Everyone was now aware of it, and knew to be more careful.

Friday, 15 November 2013

Updating PowerShell help

A PowerShell v3 install doesn't include any local help files for Get-Help. If you are upgrading from v2 to v3, the help isn't updated either.

If you are behind a proxy that requires authentication, a simple Update-Help may not work for you.

Do the following:

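$webclient = New-Object System.Net.WebClient
$creds = Get-Credential
# Set the credentials on the default proxy used by this session
$webclient.Proxy.Credentials = $creds
# Download and install the latest help files
Update-Help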

This will connect to the internet and download the updated help files.

Tuesday, 12 November 2013

Constant Alarm 'Network Uplink Redundancy Lost'

It's amazing how much is going on when you dig through logs. On this occasion I was looking at the "Tasks & Events" of a host and noticed a lot of network errors.

Alarm 'Network uplink redundancy lost' on <servername> triggered an action

The error was occurring every 5 minutes, which became obvious once it was visualised in Log Insight, my new favourite tool.

Alarm 'Network uplink redundancy lost': an SNMP trap for entity <servername> was sent
Alarm 'Network uplink redundancy lost' on <servername> triggered an action

I couldn't find anything wrong with this particular ESXi host, vSwitch or uplink. It had the same configuration as all the other hosts in the cluster.

The fix was to go to the top level where the alarm is defined, open Edit Settings, disable the alarm, then go back and re-enable it.

After that, the errors stopped appearing.

Reconfigured alarm 'Network uplink redundancy lost' on Datacenters