IPv4 Fragmentation and Reassembly

IPv4 fragmentation and reassembly is an important feature of today’s networks. To understand how it works, consider the following back-to-back network, in which R1 and R2 are connected on an Ethernet segment.

R1 and R2

The configuration of the interfaces is as follows:

R1#show run int gigabitEthernet 0/0
!
interface GigabitEthernet0/0
ip address 1.2.1.1 255.255.255.248
end

R2#show run int gigabitEthernet 0/0
!
interface GigabitEthernet0/0
ip address 1.2.1.2 255.255.255.248
end

If we look at the MTU (Maximum Transmission Unit) set on the interfaces, we see the following:

R1#show interfaces gigabitEthernet 0/0 | i MTU
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,

R2#show interfaces gigabitEthernet 0/0 | i MTU
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,

This shows the maximum possible IP MTU the interface will allow. Note that this is not the Ethernet frame size. We can see the current value of the IP MTU using the show ip interface command:

R1#show ip interface gigabitEthernet 0/0 | i MTU
MTU is 1500 bytes

R2#show ip interface gigabitEthernet 0/0 | i MTU
MTU is 1500 bytes

By default the IP MTU is set to the maximum possible value, but can be lowered if required:

R1#conf t
R1(config)#int gigabitEthernet 0/0
R1(config-if)#ip mtu ?
<68-1500> MTU (bytes)

If the IP MTU is 1500 bytes, what is the total size of the Ethernet frame? To understand this we need to look at the Ethernet frame structure.

small ethernet frame

mtu max

We can see that the Ethernet frame comprises a MAC destination address and a MAC source address (6 bytes each). Then follows an optional 4 bytes reserved for VLAN trunking. Next is the Ethertype/Length field (2 bytes), followed by the payload (up to 1500 bytes). Finally, there is the Frame Check Sequence/Cyclic Redundancy Check (4 bytes). In total, this comes to a maximum frame size of 1522 bytes when the VLAN tag is present, or 1518 bytes without it.
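
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the sum: the constant names and the frame_size helper are made up for illustration, and the preamble is deliberately left out, as it is in the description above.

```python
# Field sizes in bytes, taken from the frame description above.
DST_MAC = 6
SRC_MAC = 6
VLAN_TAG = 4   # optional VLAN (802.1Q) tag
ETHERTYPE = 2
FCS = 4        # Frame Check Sequence / CRC


def frame_size(payload: int, vlan_tagged: bool = False, include_fcs: bool = True) -> int:
    """Return the Ethernet frame size in bytes for a given payload (preamble excluded)."""
    size = DST_MAC + SRC_MAC + ETHERTYPE + payload
    if vlan_tagged:
        size += VLAN_TAG
    if include_fcs:
        size += FCS
    return size


print(frame_size(1500, vlan_tagged=True))   # 1522: tagged, with FCS
print(frame_size(1500))                     # 1518: untagged, with FCS
print(frame_size(1500, include_fcs=False))  # 1514: untagged, FCS not counted
```

The last figure, 1514, is the size a capture tool reports when it does not include the FCS.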

Going back to our original two routers, we can see that as long as our IP packets do not exceed 1500 bytes, there should be no fragmentation. To illustrate this we can ping between the routers with a packet size of 1500.

R1#ping 1.2.1.2 size 1500 repeat 1
Sending 1, 1500-byte ICMP Echos to 1.2.1.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 12/12/12 ms

In Wireshark, we can see that the length shown for the ICMP packet is 1514 bytes. But why is this the case?

icmp normal

Looking at the breakdown of the frame we can see the following:

ICMP Frame breakdown.PNG

In this case, there is no VLAN trunking configured, and Wireshark does not show the FCS/CRC (4 bytes), so this is why we see the Length as 1514.

Looking at the first packet in more detail, below, we see the total IP packet length is 1500. This comprises an IP header of 20 bytes, an ICMP header of 8 bytes (not shown) and an ICMP data payload of 1472 bytes (1500 – 20 – 8).

icmp normal detail
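
As a quick check of these numbers, here is a minimal Python sketch. It assumes a 20-byte IP header with no options and the untagged 14-byte Ethernet header described earlier. The function names are hypothetical.

```python
ETH_HEADER = 14   # destination MAC (6) + source MAC (6) + Ethertype (2)
IP_HEADER = 20    # assumes no IP options
ICMP_HEADER = 8


def captured_frame_length(ip_total_length: int) -> int:
    """Frame length as shown in a capture that omits the FCS (untagged frame)."""
    return ETH_HEADER + ip_total_length


def icmp_data_length(ip_total_length: int) -> int:
    """ICMP data bytes carried by an unfragmented echo of the given IP length."""
    return ip_total_length - IP_HEADER - ICMP_HEADER


print(captured_frame_length(1500))  # 1514
print(icmp_data_length(1500))       # 1472
```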

So far so good. Now, what happens when we exceed the IP MTU by pinging with a size of 1600 bytes?

R1#ping 1.2.1.2 size 1600 repeat 1
Type escape sequence to abort.
Sending 1, 1600-byte ICMP Echos to 1.2.1.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 56/56/56 ms

Looking at Wireshark we can see the packet has been fragmented. Now we see an IPv4 packet with a length of 1514 and an ICMP packet with a length of 134. Note that these lengths are again the entire frame length. Accounting for the frame overhead, this equates to IP packet lengths of 1500 and 120 bytes respectively.

Huh? How? 1514 is the entire frame length (not including the 4 bytes for the FCS/CRC). We therefore remove the 6 bytes for the MAC destination address, the 6 bytes for the MAC source address, and finally the 2 bytes for the Ethertype field. There is no VLAN tagging, so the optional 4 bytes for the VLAN tag were not added. We therefore have 1514 – 6 – 6 – 2 = 1500.

Similarly, for the ICMP packet we have 134 – 6 – 6 – 2 = 120.

icmp packet 1600 bytes - fragmented

Let’s examine this in more detail:

Frame number 2 – The frame has a length of 1514 bytes, but the embedded IPv4 packet has a length of 1500 bytes. Note that the Flags are set to 0x01, indicating there are more fragments to come. The Fragment offset is set to 0, as this is the first fragment in the sequence. The 8-byte ICMP header travels in this first fragment, but Wireshark does not decode it until the packet is reassembled, so the fragment is shown as 1480 bytes of payload (1500 – the IP header of 20).

icmp packet 1600 bytes - fragmented- frame 1

Frame number 3 – The frame has a length of 134 bytes, but the embedded IPv4 packet has a length of 120 bytes. Note that the Flags are set to 0x00, indicating there are no more fragments. The fragment offset is set to 1480, indicating that this fragment's data starts immediately after the 1480 bytes carried by the first fragment. As this is the last fragment, the ICMP packet is reassembled and a total data length of 1572 is displayed, i.e. 1600 – 20 bytes for the IP header – 8 bytes for the ICMP header.

icmp packet 1600 bytes - fragmented- frame 2
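
The way the 1600-byte packet is split can be sketched in Python. This is a simplified model, not how any particular router implements it. It assumes a 20-byte IP header with no options, and the Fragment type and fragment function are hypothetical names.

```python
from typing import List, NamedTuple

IP_HEADER = 20  # bytes; assumes no IP options


class Fragment(NamedTuple):
    offset_bytes: int     # where this fragment's data starts in the original payload
    offset_field: int     # value carried in the IP header (units of 8 bytes)
    total_length: int     # IP header plus the data carried by this fragment
    more_fragments: bool  # state of the More Fragments flag


def fragment(total_length: int, mtu: int = 1500) -> List[Fragment]:
    """Split an IP packet of total_length bytes into fragments that fit the IP MTU."""
    payload = total_length - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        more = offset + data < payload
        fragments.append(Fragment(offset, offset // 8, IP_HEADER + data, more))
        offset += data
    return fragments


for frag in fragment(1600):
    print(frag)
```

For the 1600-byte ping this yields two fragments: a 1500-byte packet at offset 0 with the More Fragments flag set, and a 120-byte packet at offset 1480 (header value 185) with the flag clear. A 1500-byte packet comes back as a single, unfragmented entry.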

Let’s try this again, but with a much larger payload:

R1#ping 1.2.1.2 size 9000 repeat 1
Sending 1, 9000-byte ICMP Echos to 1.2.1.2, timeout is 2 seconds:
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 20/20/20 ms

Now, let’s look at this in Wireshark:

icmp longer example

Here you can see the packet has been fragmented multiple times. Notice that each full-size fragment has a frame length of 1514 and that the fragment offset increases with every fragment. Note that within the Flags field the offset of the second fragment is represented by the number 185. This is simply the offset (1480) divided by eight, because the IPv4 header stores the fragment offset in units of 8 bytes.

For example, if we look at the Flags field in frame 371, we will see it displayed this way.

icmp flags field
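
To make the encoding concrete, the following Python sketch splits the 16-bit flags and fragment offset field of the IPv4 header into its parts. The function name is hypothetical, and the value 0x20B9 is an illustrative one worked out from the numbers above (More Fragments set, offset 185), not copied from the capture.

```python
def decode_flags_and_offset(field: int) -> dict:
    """Split the 16-bit IPv4 flags/fragment-offset field into its parts."""
    return {
        "dont_fragment": bool(field & 0x4000),   # DF bit
        "more_fragments": bool(field & 0x2000),  # MF bit
        "offset_bytes": (field & 0x1FFF) * 8,    # 13-bit offset, in units of 8 bytes
    }


print(decode_flags_and_offset(0x20B9))
# {'dont_fragment': False, 'more_fragments': True, 'offset_bytes': 1480}
```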

Also, notice the “Don’t fragment” flag is not set. If it were set, the packet would not be fragmented and would instead be dropped.
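
The decision described here can be summarised in a small Python sketch. It is a simplification with a hypothetical function name, and it only models the forward, fragment or drop choice.

```python
def handle_packet(total_length: int, mtu: int = 1500, dont_fragment: bool = False) -> str:
    """Return what happens to an IP packet that must leave via a link with the given IP MTU."""
    if total_length <= mtu:
        return "forward"    # fits as-is, no fragmentation needed
    if dont_fragment:
        # Too big and DF is set: the packet is dropped. A router would typically
        # also send an ICMP Destination Unreachable (type 3, code 4) to the source.
        return "drop"
    return "fragment"       # too big, DF clear: split into fragments


print(handle_packet(1500))                      # forward
print(handle_packet(1600))                      # fragment
print(handle_packet(1600, dont_fragment=True))  # drop
```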

So why not set the interface MTU above 1500 to avoid fragmentation? Well, there are a few reasons. The main one is that not all hardware, especially legacy hardware, supports an IP MTU higher than 1500 bytes. Also, it is important that both sides of an Ethernet segment use the same MTU. For this reason, ISPs frequently standardise on the default of 1500 bytes for Internet connections. However, point-to-point WAN connections controlled by carriers will normally have their MTU set higher. This enables customers to run MPLS and SD-WAN deployments without issue.