

PXE Boot or DHCP Failure on Guest - Fix it now

Are you facing PXE Boot (or DHCP) Failure on Guest?

This guide will help you resolve it.


Sometimes a guest virtual machine starts successfully but is then unable to acquire an IP address from DHCP, to boot using the PXE protocol, or both.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our Customers using guest virtual machines to fix similar failure issues.

In this context, we shall look into the common causes of these failures along with the steps to fix them.


What triggers PXE Boot (or DHCP) Failure on Guest ?

Below are the commonly found causes for this issue:

1. A long forward delay time is set for the bridge.

2. The iptables package and kernel do not support checksum mangling rules.


1. Long forward delay time on the bridge

This is the most common cause of PXE Boot (or DHCP) Failure on Guest. If the guest network interface connects to a bridge device that has Spanning Tree Protocol (STP) enabled and a long forward delay set, the bridge will not forward network packets from the guest virtual machine right away.

Packets are not forwarded onto the bridge until at least that number of forward delay seconds have elapsed since the guest connected to the bridge.

This delay gives the bridge time to watch traffic from the interface, determine the MAC addresses behind it, and prevent forwarding loops in the network topology.

If the forward delay is longer than the timeout of the guest’s PXE or DHCP client, the client's operation will fail. As a result, the guest will either fail to boot (in the case of PXE) or fail to acquire an IP address (in the case of DHCP).
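To see whether a given bridge is affected, we can read its current STP state and forward delay from sysfs on the host. The following is a minimal sketch and not part of the original procedure. The function name bridge_stp_report and its optional second argument (an alternate sysfs directory, handy for testing) are hypothetical, and the sketch assumes the kernel reports forward_delay in hundredths of a second.

```shell
#!/bin/sh
# Print the STP state and forward delay (in seconds) of a Linux bridge.
# Usage: bridge_stp_report BRIDGE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net; the override is a hypothetical
# convenience so the function can be tried against a fake directory tree.
bridge_stp_report() {
    bridge=$1
    dir=${2:-/sys/class/net}/$bridge/bridge
    if [ ! -r "$dir/forward_delay" ] || [ ! -r "$dir/stp_state" ]; then
        echo "error: $bridge is not a readable bridge device" >&2
        return 1
    fi
    stp=$(cat "$dir/stp_state")           # 0 = STP off, non-zero = STP on
    delay_cs=$(cat "$dir/forward_delay")  # assumed unit: 1/100 of a second
    echo "stp=$stp delay=$((delay_cs / 100))s"
}
```

For example, running bridge_stp_report virbr0 on the host prints one line with both values. If STP is on and the delay is longer than the guest's PXE or DHCP client timeout, the bridge is a likely cause.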


2. Iptables package and kernel do not support checksum mangling rules

In this case, libvirtd logs a warning (shown under condition iv below). The warning indicates a real problem only if all four of the following conditions are true:

i. The guest is using virtio network devices.

If so, the configuration file will contain model type='virtio'

ii. The host has the vhost-net module loaded.

This is true if ls /dev/vhost-net does not return an empty result.

iii. The guest is attempting to get an IP address from a DHCP server that is running directly on the host.

iv. The iptables version on the host is older than 1.4.10.

Iptables 1.4.10 was the first version to add the libxt_CHECKSUM extension.


This is the case if the following message appears in the libvirtd logs:

warning: Could not add rule to fixup DHCP response checksums on network default
warning: May need to update iptables package and kernel to support CHECKSUM rule.

Unless the other three conditions in this list are also true, we can ignore the above warning message, as it is not an indicator of any other problem.

When all four conditions are true, UDP packets sent from the host to the guest have uncomputed checksums.

This makes the host's UDP packets seem invalid to the guest's network stack, which causes the failure.
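The four conditions above can be checked from a shell on the host. The following is a rough sketch under stated assumptions: the helper names version_lt and uses_virtio are hypothetical, sort -V (GNU coreutils) is assumed to be available, and the iptables version string is assumed to look like "iptables v1.4.7".

```shell
#!/bin/sh
# Return success (0) if version $1 is strictly older than version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Return success if the domain XML on stdin declares a virtio NIC model.
uses_virtio() {
    grep -q "<model type=['\"]virtio['\"]"
}

# Example use on a host (name_of_guest is a placeholder):
#   virsh dumpxml name_of_guest | uses_virtio && echo "guest uses virtio"
#   [ -e /dev/vhost-net ] && echo "vhost-net device is present"
#   ver=$(iptables --version | sed 's/^iptables v\([0-9.]*\).*/\1/')
#   version_lt "$ver" 1.4.10 && echo "iptables is older than 1.4.10"
```

Condition iii (the DHCP server runs directly on the host) depends on the network setup, so it still has to be confirmed by hand.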


How to resolve PXE Boot (or DHCP) Failure on Guest ?

Now we will see how our Support Experts fix these issues for our customers.


1. Fix for Long forward delay time on the bridge

To fix this, we can set the forward delay on the bridge to 0, disable STP on the bridge, or both.

We can only proceed with this step if the bridge is used just to connect multiple endpoints to a single network, and not to connect multiple networks.

i. If the guest has interfaces connecting to a libvirt-managed virtual network, we can edit the definition for the network, and restart it.

We can do this with the following command:

# virsh net-edit default

ii. After that, we can add the following attributes to the <bridge> element:

<bridge name='virbr0' delay='0' stp='on'/>

iii. If the guest interface is connected to a host bridge that was configured outside of libvirt, we need to change the delay setting.

We can do this by adding or editing the following lines in the /etc/sysconfig/network-scripts/ifcfg-name_of_bridge file to turn STP on with a 0 second delay:

STP=on
DELAY=0

iv. After changing the configuration file, we need to restart the bridge device. We can use the following commands:

/usr/sbin/ifdown name_of_bridge
/usr/sbin/ifup name_of_bridge

If name_of_bridge is not the root bridge in the network, that bridge’s delay will be eventually reset to the delay time configured for the root bridge. 

To prevent this from occurring, disable STP on name_of_bridge.
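For a libvirt-managed network, the edited definition takes effect only after the network is restarted, and it is worth confirming the values afterwards. The following is a minimal sketch, assuming the network is named default and that virsh prints the <bridge> element on a single line. The helper names xml_attr and bridge_attrs are hypothetical.

```shell
#!/bin/sh
# Print the value of attribute $1 from an XML element line read on stdin.
xml_attr() {
    sed -n "s/.*[[:space:]]$1=['\"]\([^'\"]*\)['\"].*/\1/p"
}

# Print the stp and delay attributes of the first <bridge> element found in
# the libvirt network XML read on stdin.
bridge_attrs() {
    line=$(grep '<bridge ' | head -n 1)
    echo "stp=$(printf '%s\n' "$line" | xml_attr stp) delay=$(printf '%s\n' "$line" | xml_attr delay)"
}

# Example use on a host, as root:
#   virsh net-destroy default    # stops the network, keeps its definition
#   virsh net-start default      # starts it again with the edited settings
#   virsh net-dumpxml default | bridge_attrs
```

If the last command does not report delay=0, the edit was probably not saved or the network was not restarted.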


2. Fix for the iptables package and kernel not supporting checksum mangling rules

To solve this problem, we have to make any one of the four conditions listed under the causes no longer true.

The best solution is to update the host's iptables package to version 1.4.10 or newer, together with the kernel, where possible.

Otherwise, the most specific fix is to disable the vhost-net driver for this particular guest.

i. To do this, we can edit the guest configuration with the following command:

# virsh edit name_of_guest

ii. Then change or add a <driver> line to the <interface> section:

<interface type='network'>
<model type='virtio'/>
<driver name='qemu'/>
...
</interface>

iii. Save the changes.

iv. After that, we need to shut down the guest and then restart it.

v. If the issue persists even after doing the above, there may be a conflict between firewalld and the default libvirt network.

To fix this, we need to stop firewalld with the following command, run as root:

# service firewalld stop

vi. And then restart libvirt with the following command:

# service libvirtd restart

In addition to this, we can ensure that the guest acquires an IP address by using the dhclient command as root on the guest. 

Along with this we can also check whether the /etc/sysconfig/network-scripts/ifcfg-network_name file is configured correctly.
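Before restarting the guest, we can confirm that the <driver> change from step ii was saved. The following is a rough sketch, assuming virsh prints each element on its own line. The function name count_vhost_candidates is hypothetical.

```shell
#!/bin/sh
# Count <interface> elements in the domain XML on stdin that use the virtio
# model but have no <driver name='qemu'/> line.
# In the awk patterns, "." stands in for either kind of quote character.
count_vhost_candidates() {
    awk '
        /<interface[ >]/                     { inside = 1; virtio = 0; qemu = 0 }
        inside && /<model type=.virtio./     { virtio = 1 }
        inside && /<driver [^>]*name=.qemu./ { qemu = 1 }
        /<\/interface>/ { if (inside && virtio && !qemu) n++; inside = 0 }
        END             { print n + 0 }
    '
}

# Example use on a host (name_of_guest is a placeholder):
#   virsh dumpxml name_of_guest | count_vhost_candidates
```

A result of 0 means every virtio interface of the guest is set to use the qemu driver.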




Conclusion

This article covers how to fix PXE Boot (or DHCP) Failure on Guest.

Nature of this error:

A guest virtual machine starts successfully, but is then either unable to acquire an IP address from DHCP or boot using the PXE protocol, or both. There are two common causes of this error: having a long forward delay time set for the bridge, and when the iptables package and kernel do not support checksum mangling rules.


Cause of PXE Boot (or DHCP) Failure on Guest:

Long forward delay time on bridge.

This is the most common cause of this error. If the guest network interface is connecting to a bridge device that has STP (Spanning Tree Protocol) enabled, as well as a long forward delay set, the bridge will not forward network packets from the guest virtual machine onto the bridge until at least that number of forward delay seconds have elapsed since the guest connected to the bridge. This delay allows the bridge time to watch traffic from the interface and determine the MAC addresses behind it, and prevent forwarding loops in the network topology. If the forward delay is longer than the timeout of the guest's PXE or DHCP client, then the client's operation will fail, and the guest will either fail to boot (in the case of PXE) or fail to acquire an IP address (in the case of DHCP).


Fix for PXE Boot (or DHCP) Failure on Guest:

If this is the case, change the forward delay on the bridge to 0, or disable STP on the bridge.

This solution applies only if the bridge is not used to connect multiple networks, but just to connect multiple endpoints to a single network (the most common use case for bridges used by libvirt).


If the guest has interfaces connecting to a libvirt-managed virtual network, edit the definition for the network, and restart it. 

For example, edit the default network with the following command:

# virsh net-edit default

Add the following attributes to the <bridge> element:

<bridge name='virbr0' delay='0' stp='on'/>


If this problem is still not resolved, the issue may be due to a conflict between firewalld and the default libvirt network.

To fix this, stop firewalld with the service firewalld stop command, then restart libvirt with the service libvirtd restart command.