Skip to main content

VMware nightmares

After a great couple of days away I was called urgently to work - one of my client's networks was down and my colleague was stuck. The VMsphere client couldn't see the hosts, which couldn't see their datastores and the network had zero stability. Yay. Just wanted I wanted to come home to.

The servers are running ESXi 4.1 that needs a few updates but the networks stability has always been a real issue for us. We took it over from some other chaps, solving a long series of issues with the servers and particularly the IOmega NAS. Throw in a database issue on one of the guest servers that kept the disk IO at absolute peak all the time and things were pretty tricky. All those things have been resolved, more or less, and the network had been fully functional for some months. So why did it change? Several reasons and I hope you take away from this some ideas for yourself.

The NAS appeared to have dropped a disk and while it has a RAID10 set up, it paralysed the system (not at all like the FreeNAS I've talked about before). The whole thing fell over and we shut it down before reseating the disk and powering back up. The NAS detected the disk and began a rebuild. The VMware hosts couldn't see the datastore though and the server managing the virtual environment periodically disconnected from everything. Initially we thought it was a network issue and restarted everything. The hosts were slow to come up, with errors in the logs indicating an inability to see the iSCSI disks. We could connect to the NAS via it's web interface, ping it and it looked quite happy on the network so we couldn't understand what was happening. An added complication was we have a new HP NAS on the network and while we were able to migrate several of the hosts to it, we've had problems getting then started. Don't know why, but the host's cpu goes through the roof everytime we try to start them. I thought we might have an all paths down bug and most of the documentation suggests a call to VMware and let them sort it out. At 8pm at night this isn't so great a plan, and with the client losing production time and money we had to solve it.

So with all these errors and problems left and right I was at a loss. Eventually we took inspiration from The IT Crowd and turned it all off, counted to 10 and started it back up. Would you believe that it took 3 reboots of the management server before the VMsphere client would connect to anything - especially considering it was running locally! The iSCSI shares from the NAS became available finally - there was a service issue on the iOmega NAS that was failing silently. It was alleging that all was good and it really wasn't. Now those shares were available we were able to reconnect to the datastores and boot the guest machines. The management server was still disconnecting from the hosts constantly but we were able to at least get things going. There is still quite a bit to do there but the servers are finally running.

Going forward I think we'll use XenServer and NFS shares. Simpler, fully functional and quick. Easy to backup and expand disks. Adios VMware I reckon.

The final thought is "Have you turned it off and back on again?" Something in that for all of us :-)

Comments

Popular posts from this blog

Windows 10 Enterprise Eval - gotchas

After an annoying turn of events where my Windows 10 Enterprise USB drive failed, attempts to install Win10 onto a computer failed miserably. I turned to the net and managed to get my hands on Microsoft's Windows 10 Enterprise Evaluation. I have an enterprise key so I thought - cool! Here's the opportunity to get it going and to then upgrade the license later. Full install, patched etc and all is swell. Except when I try to upgrade. I straight up tried changing the licence key only to get a variety of errors, most of which are pertaining to the activation system being unavailable. The I try this: https://winaero.com/blog/upgrade-windows-10-evaluation-to-full-version-easily/ but it doesn't work either. Next I'll try this: h ttp://www.edugeek.net/forums/windows-10/174594-upgrading-windows-10-enterprise-90-evaluation-full.html And if all else fails, in goes the bootable USB I've now created. If only I'd had this in the first instance I would not be writing t

Plone - the open source Content Management System - a review

One of my clients, a non-profit, has a lot of files on it's clients. They need a way to digitally store these files, securely and with availability for certain people. They also need these files to expire and be deleted after a given length of time - usually about 7 years. These were the parameters I was given to search for a Document Management System (DMS) or more commonly a Content Management System (CMS). There are quite a lot of them, but most are designed for front facing information delivery - that is, to write something, put it up for review, have it reviewed and then published. We do not want this data published ever - and some CMS's make that a bit tricky to manage. So at the end of the day, I looked into several CMS systems that looked like they could be useful. The first one to be reviewed was OpenKM ( www.openkm.com ). It looked OK, was open source which is preferable and seemed to have solid security and publishing options. Backing up the database and upgradin

Fixing a black screen after doing a Kali Linux update

Kali Linux is a rolling Linux distribution designed for security and penetration work. You can find details on it here: www.kali.org . We run this excellent product for a range of different security work and it's been great. I built the image in VMplayer, then shared it to the team and we've all been at it since. A recent update broke it though - black screen, no network and completely unresponsive. There are lots of posts about similar things - mostly to do with graphics adaptors, however, we found that executing the following at a root prompt fixed it. But how to get to the root prompt from a blank screen? Linux has a number of terminals available to the user - most of us use the graphical one to do our day to day, but you can access a command line prompt without much trouble. Simply hold CTRL-ALT and then F2 or F3 down at the same time and it drops you to a command line login. BOOM. Time to fix it up. For me, and for the other fellas in the team, all it too was to