Are you facing a PXE boot (or DHCP) failure on a guest?
This guide will help you resolve it.
Sometimes a guest virtual machine starts successfully but is then unable to acquire an IP address from DHCP, unable to boot using the PXE protocol, or both.
In this guide, we look at the common causes of these failures along with the steps to fix them.
What triggers PXE Boot (or DHCP) Failure on Guest?
Below are the most common causes of this issue:
1. A long forward delay time is set for the bridge.
2. The iptables package and kernel do not support checksum mangling rules.
1. Long forward delay time on the bridge
It is the most common cause of PXE Boot (or DHCP) Failure on Guest. If the guest network interface is connected to a bridge device that has Spanning Tree Protocol (STP) enabled and a long forward delay set, the bridge will not forward network packets from the guest virtual machine right away.
Packets are not forwarded onto the bridge until at least the configured forward delay, in seconds, has elapsed since the guest connected to the bridge.
This delay gives the bridge time to watch traffic from the interface, determine the MAC addresses behind it, and prevent forwarding loops in the network topology.
If the forward delay is longer than the timeout of the guest’s PXE or DHCP client, the client's operation will fail. As a result, the guest will either fail to boot (in the case of PXE) or fail to acquire an IP address (in the case of DHCP).
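To see whether this cause applies, the bridge's current STP state and forward delay can be read on the host. The following is a minimal sketch, assuming a Linux host that exposes bridge settings under /sys/class/net and reports forward_delay in hundredths of a second. The function name bridge_forward_delay and the SYSFS_NET variable are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: report STP state and forward delay for a Linux bridge.
# Assumption: sysfs reports forward_delay in hundredths of a second.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"   # overridable, e.g. for testing

bridge_forward_delay() {
    dir="$SYSFS_NET/$1/bridge"
    if [ ! -d "$dir" ]; then
        echo "$1 is not a bridge device" >&2
        return 1
    fi
    stp=$(cat "$dir/stp_state")            # 0 = STP disabled
    delay_cs=$(cat "$dir/forward_delay")
    echo "stp_state=$stp forward_delay=$((delay_cs / 100))s"
}

# Example (on the host): bridge_forward_delay virbr0
```

If the reported delay is longer than the timeout of the guest's PXE or DHCP client and STP is enabled, this cause is a likely candidate.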
2. Iptables package and kernel do not support checksum mangling rules
When this is the cause, libvirtd logs a warning message (shown under condition iv below). That message is only a problem if all four of the following conditions are true:
i. The guest is using virtio network devices.
If so, the guest configuration file will contain <model type='virtio'/>
ii. The host has the vhost-net module loaded.
This is true if ls /dev/vhost-net lists the device instead of reporting that it does not exist.
iii. The guest is attempting to get an IP address from a DHCP server that is running directly on the host.
iv. The iptables version on the host is older than 1.4.10.
Iptables 1.4.10 was the first version to add the libxt_CHECKSUM extension.
This is the case if the following message appears in the libvirtd logs:
warning: Could not add rule to fixup DHCP response checksums on network default
warning: May need to update iptables package and kernel to support CHECKSUM rule.
Unless all of the other three conditions in this list are also true, we can ignore the above warning message, as it is not an indicator of any other problem.
If all four conditions are true, UDP packets sent from the host to the guest will have uncomputed checksums.
This makes the host's UDP packets seem invalid to the guest's network stack, so the guest discards the DHCP responses.
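The host-side conditions above can be checked from a shell. The following is a rough sketch, not a definitive diagnostic: the helper names parse_iptables_version, version_lt and vhost_net_loaded are hypothetical, and the version comparison assumes GNU sort with the -V option is available.

```shell
#!/bin/sh
# Sketch: helper functions for checking the host-side conditions listed above.

# Extract "1.4.7" from a string such as "iptables v1.4.7".
parse_iptables_version() {
    printf '%s\n' "$1" | sed -n 's/^iptables v\([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is strictly older than version $2 (needs GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if the vhost-net device node exists on the host.
vhost_net_loaded() {
    [ -e "${1:-/dev/vhost-net}" ]
}

# Example (on the host):
#   v=$(parse_iptables_version "$(iptables --version)")
#   version_lt "$v" 1.4.10 && vhost_net_loaded && echo "CHECKSUM issue is possible"
```

The remaining two conditions (a virtio network device and a DHCP server running on the host) still need to be confirmed from the guest configuration and the network setup.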
How to resolve PXE Boot (or DHCP) Failure on Guest?
Now we will see how our Support Experts fix these issues for our customers.
1. Fix for Long forward delay time on the bridge
To fix this, we can change the forward delay on the bridge to 0, disable STP on the bridge, or both.
We can only proceed with this step if the bridge only connects multiple endpoints to a single network and is not used to connect multiple networks.
i. If the guest has interfaces connecting to a libvirt-managed virtual network, we can edit the definition for the network, and restart it.
We can do this with the following command:
# virsh net-edit default
ii. After that, we can add or edit the following attributes on the <bridge> element:
<bridge name='virbr0' delay='0' stp='on'/>
iii. If the guest interface is connected to a host bridge that was configured outside of libvirt, we need to change the delay setting on that bridge.
We can do this by adding or editing the following lines in the /etc/sysconfig/network-scripts/ifcfg-name_of_bridge file to turn STP on with a 0 second delay:
STP=on
DELAY=0
iv. After changing the configuration file, we need to restart the bridge device. We can use the following commands:
# ifdown name_of_bridge
# ifup name_of_bridge
If name_of_bridge is not the root bridge in the network, that bridge’s delay will be eventually reset to the delay time configured for the root bridge.
To prevent this from occurring, disable STP on name_of_bridge.
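The host-bridge steps above can be scripted. The following is a minimal sketch, assuming an initscripts-style ifcfg file and GNU sed. The function name set_ifcfg_key is hypothetical, and name_of_bridge remains a placeholder for the real bridge name.

```shell
#!/bin/sh
# Sketch: set KEY=VALUE in an initscripts-style ifcfg file,
# replacing an existing line or appending a new one.
set_ifcfg_key() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"   # GNU sed in-place edit
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root, with name_of_bridge replaced by the real bridge name):
#   f=/etc/sysconfig/network-scripts/ifcfg-name_of_bridge
#   set_ifcfg_key "$f" STP on      # or "off" if the delay keeps being reset
#   set_ifcfg_key "$f" DELAY 0
#   ifdown name_of_bridge && ifup name_of_bridge
#
# For a libvirt-managed network, the restart after virsh net-edit would be:
#   virsh net-destroy default && virsh net-start default
```

It is worth backing up the ifcfg file before editing it, since a broken bridge configuration can cut off network access to the host.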
2. Fix for the iptables package and kernel not supporting checksum mangling rules
To solve this problem, we have to make any one of the four conditions listed under the causes false.
The best solution is to update the host's iptables package to version 1.4.10 or newer, along with the kernel, where possible.
Otherwise, the most specific fix is to disable the vhost-net driver for this particular guest.
i. To do this, we can edit the guest configuration with the following command:
# virsh edit name_of_guest
ii. Then change or add a <driver> line in the <interface> section so that the guest uses the qemu backend instead of vhost-net:
<driver name='qemu'/>
iii. Save the changes.
iv. After that, we need to shut down the guest and then restart it.
v. If the issue persists even after doing the above, there may be a conflict between firewalld and the default libvirt network.
To fix this, we need to stop firewalld with the following command:
# service firewalld stop
vi. And then restart libvirt with the following command:
# service libvirtd restart
In addition to this, we can ensure that the guest acquires an IP address by using the dhclient command as root on the guest.
Along with this we can also check whether the /etc/sysconfig/network-scripts/ifcfg-network_name file is configured correctly.
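After restarting the guest, we can confirm that the interface no longer uses vhost-net by inspecting the guest XML. The following is a small sketch, assuming the XML is formatted the way virsh dumpxml prints it (single-quoted attributes, one element per line). The function name uses_qemu_backend is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the domain XML on stdin has an <interface> section
# that selects the qemu (userspace) backend instead of vhost-net.
# Only <interface> sections are searched, because <disk> elements can
# also carry a <driver name='qemu' .../> line.
uses_qemu_backend() {
    sed -n '/<interface /,/<\/interface>/p' | grep -q "<driver name='qemu'"
}

# Example (on the host):
#   virsh dumpxml name_of_guest | uses_qemu_backend && echo "vhost-net disabled"
```

If the check fails, the <driver> line was probably not saved, or the guest was not fully shut down and started again after the edit.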