How to Fix Common VM Issues at CloudPap


At CloudPap, we understand that operating virtual machines (VMs) can come with its own set of challenges. While we strive to provide a seamless and robust VM hosting experience, it’s not uncommon to encounter issues that can disrupt your services. In this article, we’ll address some of the common VM issues our clients face and provide detailed solutions to get you back on track.

1. Network Issues

Cause: Network issues make the VM inaccessible. They can occur for various reasons, including misconfigurations, connectivity problems, or network congestion.

Possible Error Message: You may encounter messages like “No network connection,” “Network unreachable,” or “Hmm. We’re having trouble finding that site.”

Steps to Solution:

Network issues can have many different causes. Check out this guide for a step-by-step process for resolving network-related issues.

2. Disk Issues

Disk Full

Cause: A disk full issue occurs when the storage on your VM reaches its capacity limit. The limit can be either the disk's space or its number of files (inodes). A full disk is especially dangerous for servers running databases such as MySQL, as it can crash the server.

Possible Error Message: You might receive warnings like “No space left on device.”

Steps to Solution:

  1. Check disk usage: Use the df -h command to check the disk space usage on your VM and df -i to check inode usage. Then identify which directories are consuming the most space, for example with du.
  2. Remove unnecessary files: Delete or archive files that are no longer needed to free up disk space.
  3. Resize the disk: If your VM’s disk is provisioned on a cloud platform, you can usually resize the disk to accommodate more data.
  4. Monitor disk usage: Implement disk space monitoring to receive alerts before the disk becomes full in the future.
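
Steps 1 and 4 can be sketched as two small shell helpers. This is a minimal sketch, not a CloudPap tool: the function names over_threshold and largest_dirs, the 80 percent default, and the /var default directory are all assumptions made for illustration, and largest_dirs relies on GNU du and sort.

```shell
#!/bin/sh
# Sketch only: function names and default values are assumptions.

# Read `df -P` output on stdin and print the mount point and usage of
# every filesystem at or above the given percentage (default 80).
over_threshold() {
    # In df -P output, field 5 is the usage percentage and field 6 is
    # the mount point. NR > 1 skips the header line.
    awk -v t="${1:-80}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t) print $6, use "%"
    }'
}

# Print a directory and its immediate subdirectories, largest first,
# limited to ten lines and to a single filesystem (-x).
largest_dirs() {
    du -xh --max-depth=1 "${1:-/var}" 2>/dev/null | sort -rh | head -n 10
}
```

For example, df -P | over_threshold 90 lists filesystems that are at least 90 percent full, and largest_dirs /var/log shows where the space under /var/log is going. Running the first command from a scheduled job is one simple way to get an early warning.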

Filesystem Errors

Cause: Filesystem errors can result from data corruption, filesystem inconsistencies, or unexpected interruptions during disk operations.

Possible Error Message: You might encounter errors like “Filesystem is not clean” or “Inode table corrupted.” Also, when a server with severe filesystem errors is rebooted, it may fail to come up until the disk is repaired.

Steps to Solution:

  1. Check filesystem integrity: Use filesystem-checking tools like fsck to scan and repair filesystem errors.
  2. Backup critical data: Before running any repair commands, make sure to create a backup of critical data to prevent data loss.
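  3. Run fsck: Execute fsck with the appropriate flags to repair the filesystem, and run it only while the filesystem is unmounted (for example, from rescue mode). A sample command is fsck -fy /dev/vda1, where /dev/vda1 is the partition to repair.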
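
The steps above can be sketched as a small wrapper that refuses to touch a missing or mounted device and starts with a read-only check. The function name safe_fsck is hypothetical, /dev/vda1 is only the sample device from this article, and the mount test assumes the device appears under the same name in /proc/mounts, which is not always the case. The -n option is passed through to the filesystem-specific checker; on ext filesystems it reports problems without repairing them.

```shell
#!/bin/sh
# Sketch only: safe_fsck is a hypothetical helper, not a standard command.
safe_fsck() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # /proc/mounts lists each mounted device in its first column.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' /proc/mounts; then
        echo "$dev is mounted; unmount it or boot into rescue mode first" >&2
        return 2
    fi
    # Read-only pass: report problems without changing anything.
    fsck -n "$dev"
    # After reviewing the report and taking a backup, repair with:
    #   fsck -fy "$dev"
}
```

A typical call from rescue mode would be safe_fsck /dev/vda1, run as root.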

3. Kernel Panic

Cause: Kernel panic is a critical error in the VM’s operating system kernel. It can occur due to various reasons, including faulty hardware, incompatible drivers, or misconfigurations.

Possible Error Message: When a kernel panic occurs, you’ll see a screen with diagnostic information and error messages. These messages can vary widely. Sometimes, after a kernel upgrade, the new kernel fails to boot when the server is rebooted.

Steps to Solution:

  1. Identify the cause: Analyze the error messages to determine the root cause. It might be related to hardware issues, driver problems, or misconfigurations.
  2. Update drivers: Ensure that your VM’s drivers and kernel are up to date.
  3. Reconfigure the kernel: If misconfigurations are identified, make the necessary changes in the kernel configuration. This may be needed if you changed the kernel settings from their defaults.
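
When working through step 1, it helps to know which kernel is running and which kernel images are installed. The sketch below assumes that images are stored as /boot/vmlinuz-*, which is common but varies by distribution. The function name kernel_overview is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the running kernel and the images found under /boot.
kernel_overview() {
    echo "running: $(uname -r)"
    # The vmlinuz-* naming is an assumption; some distributions differ.
    ls -1 /boot/vmlinuz-* 2>/dev/null || echo "no vmlinuz-* images found in /boot"
}

kernel_overview

# Kernel messages from the previous boot, errors and worse only.
# This needs a persistent systemd journal, so it is left as a comment:
#   journalctl -k -b -1 -p err
```

If the newest image listed is not the one reported as running, the server was most likely booted from an older kernel.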

4. MySQL Database Down

Cause: MySQL database downtime can result from various factors, including server resource limitations, misconfigurations, poorly optimized queries, or even a full disk.

Possible Error Message: MySQL errors typically display messages like “Can’t connect to MySQL server” or “Table is marked as crashed.” There are a vast number of possible errors; fortunately, most are very descriptive of the issue.

Steps to Solution:

  1. Check MySQL status: Use the systemctl command to check the MySQL service status. Restart the service if necessary.
  2. Review configuration: Inspect the MySQL configuration files to ensure they are correctly set up.
  3. Optimize queries: Poorly optimized queries can put excessive strain on the database. Identify and optimize slow queries.
  4. Monitor resource usage: Check the server’s resource usage and consider upgrading if your database requires more resources.
  5. Backup and restore: In some cases, you may need to restore from a backup if data corruption has occurred.
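
Step 1 can be sketched as follows. The service name differs between distributions (mysql, mysqld, and mariadb are common), so the sketch first looks up which unit exists. The function names pick_db_unit and db_status are hypothetical, and the sketch assumes a systemd-based VM.

```shell
#!/bin/sh
# Sketch only: helper names are assumptions, not part of MySQL or systemd.

# Read unit file names on stdin and print the first MySQL-compatible one.
pick_db_unit() {
    grep -E -o -m 1 '^(mysql|mysqld|mariadb)\.service'
}

# Show the status and the last 50 log lines of the database service.
db_status() {
    unit="$(systemctl list-unit-files --type=service --no-legend 2>/dev/null | pick_db_unit || true)"
    if [ -z "$unit" ]; then
        echo "no MySQL or MariaDB unit found" >&2
        return 1
    fi
    # systemctl status exits non-zero for a stopped service, so ignore it.
    systemctl status "$unit" --no-pager || true
    # Recent log lines often name the cause, such as a full disk.
    journalctl -u "$unit" -n 50 --no-pager || true
    # Restart only after reading the log:
    #   sudo systemctl restart "$unit"
}
```

Call db_status first and read the log output before restarting, since a restart alone does not fix causes such as a full disk.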

5. Security Issues

Server Hacked

Cause: Security breaches can happen when vulnerabilities are exploited, passwords are compromised, or malicious code is injected.

Possible Error Message: There may not be a specific error message, but you may observe unusual behavior, unauthorized access, or defacement of your website or server.

Steps to Solution:

  1. Isolate the server: Immediately disconnect the compromised server from the network to prevent further damage.
  2. Change passwords: Reset all passwords, including the server’s root password, database passwords, and user passwords.
  3. Remove malicious code: Scan the server for any malicious code or backdoors and remove them.
  4. Patch vulnerabilities: Update all software and applications to their latest versions and apply security patches.
  5. Improve security measures: Strengthen your server’s security by implementing firewalls, intrusion detection systems, and access controls.
  6. Reinstall if root is compromised: If root or sudo access to the server is compromised, the recommended solution is to back up your data (e.g., databases), reinstall the server, and then upload files from safe backups. Fixing a hacked server after root access is lost is a tall order, and you have no way to ensure that backdoors were not installed on the server.
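
As a first pass for step 3, recently changed files are worth reviewing. The sketch below lists files modified in the last few days. The function name recent_files, the 3-day default, and the /var/www path in the comments are assumptions. It is not a malware scanner, and an attacker can forge modification times, so treat the output as a hint rather than proof.

```shell
#!/bin/sh
# Sketch only: recent_files is a hypothetical helper.

# List regular files under a directory modified within the last N days.
recent_files() {
    dir="$1"
    days="${2:-3}"
    find "$dir" -type f -mtime "-$days" 2>/dev/null
}

# Other read-only checks (output varies per system):
#   last -n 20                 # most recent logins
#   ss -tulpn                  # listening sockets and owning processes
#   recent_files /var/www 3    # files changed in the web root
```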

6. Server High Load

Cause: High server load can result from resource-hungry processes, insufficient resources, or misconfigurations. Sometimes it simply indicates that your website traffic is growing, which is a good sign.

Possible Error Message: You may not receive a specific error message, but you’ll notice that the server or its websites are slow or unresponsive.

Steps to Solution:

  1. Identify resource hogs: Use tools like top or htop to identify processes consuming excessive resources.
  2. Optimize applications: Optimize your applications and services to reduce resource usage.
  3. Scale resources: Consider upgrading your VM with more CPU, RAM, or disk space to handle increased load.
  4. Filter bot traffic: Use Cloudflare for DNS to block out some bad bots.
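
Step 1 can also be done non-interactively. The sketch below compares the load average with the number of CPU cores and lists the heaviest processes. The function names are hypothetical, the code assumes a Linux VM with /proc and the procps version of ps, and the idea that a sustained value above roughly 1.0 per core means saturation is a rule of thumb, not a CloudPap limit.

```shell
#!/bin/sh
# Sketch only: helper names are assumptions.

# Print the 1-minute load average divided by the number of CPU cores.
load_per_core() {
    cores="$(nproc)"
    # The first field of /proc/loadavg is the 1-minute load average.
    awk -v c="$cores" '{ printf "%.2f\n", $1 / c }' /proc/loadavg
}

# Print the header plus the top five processes sorted by a ps column
# (default %cpu; pass %mem to sort by memory instead).
top_procs() {
    ps -eo pid,comm,%cpu,%mem --sort="-${1:-%cpu}" | head -n 6
}
```

Run load_per_core to see whether the VM is short on CPU, then top_procs or top_procs %mem to find the processes responsible.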

Updated on October 24, 2023