ESXi 5 Unresponsive VM – How to Power Off

Post source — vladan

Sometimes a VM gets stuck and you cannot perform a graceful shutdown or power-off through the vSphere Client. In other words, the VM is frozen, and it seems the only way out is to reboot the whole ESXi host, which you certainly don’t want to do.

Fortunately, an unresponsive VM can be “killed” through the CLI. This can be done in several ways, but I’ll show you how to do it through a PuTTY SSH session.

First, enable SSH on your ESXi box (select the host > Configuration > Security Profile > Properties > SSH).

We will be using the esxtop command.

There are different ways to stop (“kill”) a VM: via the vCLI, PowerCLI, or a console session. In ESXi 5 it is possible to kill a running VM, that is, the process of the VM concerned, by using the esxtop command.


Step 1 – Connect via SSH, using PuTTY for example, and run esxtop.

Enter “esxtop”, then press “c” for the CPU resource screen and Shift+V to display VMs only.


Step 2 – Change the display and locate the LWID number.

Press “f” to change the displayed fields, press “c” to show the LWID (Leader World ID) column, and press ENTER.


Step 3 – Invoke the kill (“k”) command with the LWID number.

With the LWID column displayed, you can identify the VM you are interested in by its LWID number.

Press “k” and enter the LWID number of the VM you want to stop. Note that this is a hard stop, so the next time the VM boots you’ll probably see a “not shut down properly” screen (depending on your guest OS, of course).
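If you prefer not to use the interactive esxtop screens, the same hard stop can be done with esxcli over the same SSH session. A minimal sketch (the World ID 1234567 below is a placeholder; take the real one from the list output):

```shell
# List running VM processes and note the World ID of the stuck VM
esxcli vm process list

# Try a soft (graceful) kill first
esxcli vm process kill --type=soft --world-id=1234567

# Escalate to a hard kill if the VM is still running
esxcli vm process kill --type=hard --world-id=1234567

# Last resort: force-kill the VM process at the VMkernel level
esxcli vm process kill --type=force --world-id=1234567
```

Start with the soft kill and only escalate if the process survives; a force kill should be the very last step before considering a host reboot.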


If this method doesn’t work, you can’t vMotion the VM elsewhere, and no other option works either, there might be a hardware problem with the host, which can lead to a PSOD (purple screen of death).

There is also a VMware KB article that discusses other methods, including the one described here. You might also want to check “Using hardware NMI facilities to troubleshoot unresponsive hosts” (KB 1014767).


Does disabling Inter-VM TPS affect your environment?



Before starting with this post, I would like to recall what TPS is and what it does in a virtualized environment.

Transparent page sharing is a method by which redundant copies of pages are eliminated. This helps to free memory that a virtual machine would otherwise be using. Because of the way TPS works with hardware-assisted memory virtualization systems like Intel EPT Hardware Assist and AMD RVI Hardware Assist, esxtop may show zero or few shared pages in these systems. Page sharing will show up in esxtop only when host memory is overcommitted.

The diagram below shows an overview of how TPS works.


Earlier, TPS was enabled by default to save memory by sharing identical memory pages between VMs running similar types of applications. In VDI environments, where most of the VMs run the same OS and contain similar applications, TPS saved a great deal of memory and made memory management more efficient.

But some time back, research revealed the darker side of the TPS technique, demonstrating that it can pose a serious security threat. The research indicated that TPS can be abused to gain unauthorized access to data under certain highly controlled conditions.

Even though VMware believes information disclosure under real-world conditions is unrealistic, out of an abundance of caution, upcoming ESXi update releases will no longer enable TPS between virtual machines by default.

Starting with the update releases in December 2014, the default setting for inter-VM TPS will be Disabled, and it will continue to be disabled in all future versions of vSphere.

VMware has mentioned and addressed this issue in KB article 2080735; I would recommend having a look at it. Also, KB article 2091682, “Additional Transparent Page Sharing management capabilities in ESXi 5.5, 5.1, and 5.0 patches in Q4, 2014”, explains how to handle this in existing versions of vSphere.

Now the question is: “How does disabling Inter-VM TPS impact your environment?”

If your environment is not using “large pages” and uses the default 4 KB page size, then TPS will come in handy to save memory, especially when the environment consists of VMs with the same type of OS running similar applications, e.g. VDI.

But if your environment consists of modern processors with hardware MMU support (Intel EPT/AMD RVI), then your ESXi host leverages large pages (2 MB), and by default there won’t be a huge impact, because TPS is not effective there (the chance of two 2 MB pages being exactly identical is very small). Also, TPS does not kick in until the host comes under memory pressure and large pages are broken down into 4 KB pages.

This is explained very well in VMware KB 1021095.

Additionally, modern operating systems like Windows 2008/2012 or recent Linux distributions use a security feature called Address Space Layout Randomization (ASLR), which prevents TPS from being effective, especially the sharing of zeroed pages when large pages are used.

TPS will still be very advantageous in the following scenarios:

  • If you have VDI deployments, especially with floating (not dedicated) desktops, and your ESXi hosts come under memory pressure.
  • If you run older operating systems (prior to Windows Server 2008), which use small 4 KB pages by default, they will benefit from TPS a lot.
  • If you have disabled the use of large pages by setting the advanced value Mem.AllocGuestLargePage to 0, you should benefit from TPS even with “modern” operating systems.
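The large-page setting mentioned in the last bullet can be checked and changed from the ESXi shell as well. A sketch, assuming SSH access to the host:

```shell
# Show the current setting (1 = guest large pages allowed, the default)
esxcli system settings advanced list -o /Mem/AllocGuestLargePage

# Disable guest large pages so VM memory is backed by 4 KB pages,
# which makes TPS effective even without memory pressure
esxcli system settings advanced set -o /Mem/AllocGuestLargePage -i 0
```

Keep in mind this trades the TLB efficiency of 2 MB pages for better sharing, so test the performance impact before applying it broadly.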

If you want to check how much you are utilizing TPS, check the Performance tab in your vSphere Client (at cluster or host level, but it can be done at VM level too). In the memory view, look for the SHARED and ZERO metrics.

SHARED is all memory saved by TPS, and ZERO is memory consisting of zeroed blocks collapsed into a single page. If you subtract ZERO from SHARED, you get an estimate of your actual savings from deduplication in general.

Note: Unfortunately, there is no way to check how much of those savings comes from Inter-VM TPS specifically.

What if you don’t want to wait for the patches to disable TPS?

Although VMware believes that the reported possible information disclosure in TPS can only be abused in very limited configuration scenarios, VMware advises customers who are sufficiently concerned about this possibility to proactively disable TPS on their ESXi hosts. Customers do not have to wait for either the Patch or the Update releases to do this.

For environments using ESXi 5.x, perform the following steps:

  1. Log in to ESXi or vCenter Server using the vSphere Client.
  2. Select the relevant ESXi host.
  3. In the Configuration tab, click Advanced Settings under the Software section.
  4. In the Advanced Settings window, click Mem.
  5. Look for Mem.ShareScanGHz and set the value to 0.
  6. Click OK.

Perform one of the following to make the TPS change take effect immediately:

  • Migrate all the virtual machines to another host in the cluster and back to the original host.
  • Shut down and power on the virtual machines.
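The same host-level change can be made from an SSH session instead of the vSphere Client. A minimal sketch for ESXi 5.x:

```shell
# Show the current page-sharing scan rate (a non-zero default means TPS is active)
esxcli system settings advanced list -o /Mem/ShareScanGHz

# Set the scan rate to 0 to stop TPS page scanning on this host
esxcli system settings advanced set -o /Mem/ShareScanGHz -i 0
```

As with the GUI method, already-shared pages stay shared until the VMs are migrated away and back or power-cycled.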

What if you want to continue using TPS after the Patch/Update?

A couple of new advanced configuration options are introduced by the patch, as explained in KB 2091682:

  • Mem.ShareForceSalting: This is a host-level configuration option that enables or disables salting, and thus inter-VM TPS, on an ESXi host. If it is set to “0”, TPS is still enabled on the host. If it is set to “1”, inter-VM TPS is disabled on the host, and salting is required for TPS to work between any VMs located on that host.
  • sched.mem.pshare.salt: This per-VM advanced configuration option enables customers to selectively enable page sharing between specific VMs. When ShareForceSalting is set to “1” on an ESXi host, the only way for two or more VMs to share a page is for both their salt values and the content of the page to be the same. The salt value must be identical on all the VMs between which you intend to enable page sharing.
To enable salting and set a salt value per VM:

  1. Log in to ESXi or vCenter with the vSphere Client.
  2. Select the relevant ESXi host.
  3. In the Configuration tab, click Advanced Settings under the Software section.
  4. In the Advanced Settings window, click Mem.
  5. Look for Mem.ShareForceSalting and set the value to 1.
  6. Click OK.
  7. Power off the VM for which you want to set the salt value.
  8. Right-click the VM and click Edit Settings.
  9. Select the Options tab and click General under the Advanced section.
  10. Click Configuration Parameters.
  11. Click Add Row; a new row will be added.
  12. On the left-hand side, add the text sched.mem.pshare.salt, and on the right-hand side, specify a unique string.
  13. Power on the VM so that salting can take effect.
  14. Repeat steps 7 to 13 to set the salt value for other individual VMs.
  15. The same salt value can be specified on multiple VMs to enable page sharing across those VMs.
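The steps above can also be sketched from an SSH session, setting the host option with esxcli and appending the salt to the VM’s .vmx file while the VM is powered off (the datastore path, VM name, and salt string below are examples):

```shell
# Enable salting at the host level; inter-VM TPS now requires matching salts
esxcli system settings advanced set -o /Mem/ShareForceSalting -i 1

# With the VM powered off, append the per-VM salt to its .vmx file
echo 'sched.mem.pshare.salt = "web-farm-group1"' >> /vmfs/volumes/datastore1/MyVM/MyVM.vmx
```

Give every VM in a sharing group the same salt string, then power the VMs back on for the setting to take effect.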

If ShareForceSalting is set to “1” and sched.mem.pshare.salt is not set on a VM, the VM’s vc.uuid is used as the salt value instead. Because the vc.uuid is unique to each VM, that VM will only be able to share pages with itself; effectively, no inter-VM sharing for that VM.

For those who want to dive deeper into this, I would recommend reading this whitepaper: https://eprint.iacr.org/2014/435.pdf