Using Ethernet FMC without a processor
Why use the Ethernet FMC without a processor?
In some applications, the Ethernet FMC is used without a processor in the FPGA design. By “processor”, we mean both hard integrated processors such as the ARM Cortex in the Zynq, and soft processors such as the MicroBlaze. Why would you want to do that? Some low-latency applications only need to operate at a low level, processing Ethernet packets in the FPGA fabric, and they don’t need to dedicate extra resources to a processor. But doing away with the processor comes at the cost of flexibility: configuring your hardware devices (e.g. PHY, MAC) becomes much more complicated, and in most cases you are forced to accept the hardware defaults. This technical note discusses the issues that can arise when using the hardware defaults of the Ethernet FMC and the relevant Xilinx IP cores.
It’s not impossible to configure the PHYs and the MACs without a processor, but it’s definitely more difficult, as you would have to design complex state machines to communicate the required configurations over the relevant buses (MDIO, AXI and others).
For most applications, the Marvell 88E1510 PHYs used by the Ethernet FMC have a default configuration that is quite suitable. For example, they are configured by default to use autonegotiation, to advertise 10Mbps, 100Mbps and 1Gbps capability, and to enable MDI automatic crossover. However, there is one particular setting that requires special attention in your FPGA design. The “RGMII TX clock internal delay” setting is enabled by default and must be accounted for in your Vivado design to allow the RGMII transmit interface to function correctly.
A problem can occur if you are using the default PHY configuration and your Vivado project is designed to output a delayed RGMII TX clock. In this case, you are delaying the clock twice (once in the FPGA, and again in the PHY), which makes it poorly aligned for sampling the data. Note that this problem will only be noticeable when operating at 1Gbps: the PHY’s internal delay is 1.9ns regardless of link speed, which is a large fraction of the 8ns clock period at 1Gbps but not enough of a misalignment to cause poor sampling at 10Mbps or 100Mbps.
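The effect of the double delay can be checked with some simple arithmetic. The sketch below is illustrative only: it assumes the FPGA-side delay is 2ns (a 90 degree shift of a 125MHz clock), and it treats half a clock period as the data window, which is the DDR case at 1Gbps and a conservative assumption at the lower speeds. The function and constant names are made up for this example.

```python
# RGMII clock frequency in MHz for each link speed in Mbps.
RGMII_CLOCK_MHZ = {10: 2.5, 100: 25.0, 1000: 125.0}

# Internal TX clock delay of the 88E1510, as stated in this note.
PHY_DELAY_NS = 1.9


def sampling_margin_ns(link_mbps, fpga_delay_ns):
    """Distance in ns from the delayed clock edge to the nearest data transition.

    Assumption: the data window is half a clock period long and the
    undelayed clock edge coincides with a data transition.
    """
    half_period_ns = 1000.0 / RGMII_CLOCK_MHZ[link_mbps] / 2.0
    total_delay_ns = fpga_delay_ns + PHY_DELAY_NS
    offset_ns = total_delay_ns % half_period_ns
    return min(offset_ns, half_period_ns - offset_ns)


if __name__ == "__main__":
    for speed in (10, 100, 1000):
        double = sampling_margin_ns(speed, fpga_delay_ns=2.0)
        phy_only = sampling_margin_ns(speed, fpga_delay_ns=0.0)
        print(f"{speed:>4} Mbps: FPGA+PHY delay {double:.2f} ns, "
              f"PHY delay only {phy_only:.2f} ns")
```

Under these assumptions, the double delay at 1Gbps leaves the clock edge roughly 0.1ns away from a data transition, while the PHY delay alone places it about 1.9ns away, close to the centre of the 4ns window. At 10Mbps and 100Mbps the margin stays at several nanoseconds in both cases.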
The solution is to change your Vivado project so that the RGMII TX clock is output in phase with the RGMII TX data. How to do this depends on what IP is implementing the RGMII TX interface in your design.
The Marvell 88E1510 Ethernet PHYs were designed with a selectable internal delay which, when enabled, delays the incoming RGMII TX clock by 1.9ns. This feature simplifies the design of the RGMII interface in the FPGA by allowing the RGMII TX data and clock to be output with the same phase.
Solution for GMII-to-RGMII IP
The GMII-to-RGMII IP has an option to specify whether the RGMII TX clock delay will be performed in the FPGA or in the PHY. You must select “Skew added by PHY” if you are using the default PHY configuration.
Solution for AXI Ethernet Subsystem IP
The AXI Ethernet Subsystem IP does not have an option to specify where the RGMII TX clock delay will be performed. By default (i.e. when the block automation feature is used), the AXI Ethernet Subsystem is configured to delay the RGMII TX clock in the FPGA fabric. It does this by using an MMCM to produce a clock that is phase shifted by 90 degrees with respect to the “gtx_clk”. This clock is labeled “gtx_clk90” and it is used to clock the ODDR that “forwards” the RGMII TX clock (see Clock Forwarding in the SelectIO user guide for more information).
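To see why a 90 degree shift in the FPGA and the 1.9ns delay in the PHY do the same job, it helps to convert between phase and time. The following is a minimal sketch that assumes “gtx_clk” runs at 125MHz, as is usual for gigabit RGMII; the helper names are hypothetical and are not part of any Xilinx tool.

```python
def phase_to_delay_ns(phase_deg, clock_mhz):
    """Convert a phase shift in degrees to a time delay in ns."""
    period_ns = 1000.0 / clock_mhz
    return phase_deg / 360.0 * period_ns


def delay_to_phase_deg(delay_ns, clock_mhz):
    """Convert a time delay in ns to the equivalent phase shift in degrees."""
    period_ns = 1000.0 / clock_mhz
    return delay_ns / period_ns * 360.0


if __name__ == "__main__":
    # Assumed gtx_clk frequency for 1Gbps operation.
    GTX_CLK_MHZ = 125.0
    print(f"90 deg at 125 MHz = {phase_to_delay_ns(90, GTX_CLK_MHZ):.2f} ns")
    print(f"1.9 ns at 125 MHz = {delay_to_phase_deg(1.9, GTX_CLK_MHZ):.1f} deg")
```

With these numbers, the MMCM's 90 degree shift corresponds to 2ns, and the PHY's 1.9ns delay is equivalent to a shift of about 85.5 degrees, so either one on its own places the clock edge near the middle of the data window.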
The simple way to remove the delay is to change the configuration of the MMCM to produce a phase shift of zero (0) rather than 90. This can be easily done by adding the following command to your Vivado project’s XDC constraints file:
set_property CLKOUT1_PHASE 0 [get_cells design_1_i/axi_ethernet_0/U0/eth_mac/U0/tri_mode_ethernet_mac_support_clocking_i/mmcm_adv_inst]
This assumes that your block design is named “design_1” and contains an instance named “axi_ethernet_0” that includes the shared logic (MMCM). It also assumes that your Vivado project is configured for VHDL. If this is not the case for your design, you will have to search for the MMCM instance in your synthesised design and replace the path with the correct one.
If you are using multiple AXI Ethernet Subsystem IPs, usually only one of them will contain the shared logic (MMCM) and the phase shifted clock is input to the others. In this case you still have only one MMCM to reconfigure and so this solution is still valid. If you have the shared logic (MMCM) included in more than one AXI Ethernet Subsystem IP, then you must apply the above command for each of the MMCMs.
Solution for Custom IP
If you have designed your own Ethernet MAC or RGMII interface, you will have to make sure that it outputs an RGMII TX clock that is in phase with the RGMII TX data. As RGMII is a DDR interface, you will probably be using ODDRs; make sure that they are all driven by the same clock. If your custom IP is producing a clock that is in phase with the data on the RGMII TX interface, and your ports are still not working, your problem is not likely to be due to the PHY’s delay setting.
GMII-to-RGMII IP Defaults
The GMII-to-RGMII IP core is typically used on the Zynq SoC when you want to connect the PHYs to the MACs that are embedded in the PS (Processing System). If you are not using the GMII-to-RGMII IP core, then this section does not apply to you.
The GMII-to-RGMII IP has a register that must be configured with the correct link speed, once a link is established by the PHY. The register is at address 0x10 and it is configured for operation at 10Mbps by default. Setting the link speed is done by writing the link speed value to this register at PHY address 8 on the MDIO bus. For more detailed information on this, please refer to the GMII-to-RGMII Product guide.
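For a processor-less design that does implement its own MDIO master, the write described above comes down to shifting out one Clause 22 MDIO frame. The sketch below models that frame in software so that the bit pattern can be checked against a simulation. The speed encodings are an assumption: they use the same bit positions as the speed selection bits of the standard IEEE 802.3 control register (bit 13 and bit 6), and they leave every other field of the register at zero. Confirm the register layout in the product guide before relying on it. All names are hypothetical.

```python
GMII2RGMII_PHY_ADDR = 8       # MDIO address of the core, per this note
GMII2RGMII_SPEED_REG = 0x10   # register address, per this note

# ASSUMPTION: speed bits follow the IEEE 802.3 control register layout
# (bit 13 = speed LSB, bit 6 = speed MSB); other register fields are zero.
SPEED_BITS = {10: 0x0000, 100: 0x2000, 1000: 0x0040}


def mdio_write_frame(phy_addr, reg_addr, data):
    """Build the 32 bits that follow the preamble in a Clause 22 MDIO write."""
    if not (0 <= phy_addr < 32 and 0 <= reg_addr < 32 and 0 <= data <= 0xFFFF):
        raise ValueError("address or data out of range")
    return (
        (0b01 << 30)          # start of frame
        | (0b01 << 28)        # opcode: write
        | (phy_addr << 23)    # 5-bit PHY address
        | (reg_addr << 18)    # 5-bit register address
        | (0b10 << 16)        # turnaround
        | data                # 16-bit register value
    )


def mdio_bitstream(frame):
    """Yield the bits in the order a state machine would drive them on MDIO:
    32 preamble ones, then the frame, most significant bit first."""
    for _ in range(32):
        yield 1
    for bit in range(31, -1, -1):
        yield (frame >> bit) & 1


if __name__ == "__main__":
    frame = mdio_write_frame(GMII2RGMII_PHY_ADDR, GMII2RGMII_SPEED_REG,
                             SPEED_BITS[1000])
    print(hex(frame))  # prints 0x54420040
    print("".join(str(b) for b in mdio_bitstream(frame)))
```

A hardware state machine would drive one of these bits per MDC cycle. The difficulty remains knowing which speed to write, since that requires reading the link status back from the PHY first.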
Without a processor, you are likely unable to communicate with the PHY (over the MDIO bus) to determine the established link speed. In this case, your choice is either to operate only at 10Mbps, or to change your Vivado project so that the 2.5MHz clock is actually driven with 25MHz for operation at 100Mbps, or with 125MHz for operation at 1Gbps.