Compact FPGA Ring Oscillator Physical Unclonable Functions Circuits Based on Intertwined Programmable Delay Paths

Abstract: Physical Unclonable Functions (PUFs) provide a strong and secure root source for identification and authentication applications. They are especially valuable for FPGA based systems, as FPGA designs are vulnerable to IP theft and cloning. Ideally, the randomness of a PUF should come only from the random variation in the manufacturing process, and should be free of deterministic variation caused by systematic bias shared among all chips of the same model. Correspondingly, one of the major challenges for FPGA based PUFs is the difficulty of avoiding systematic bias between the nominally matched delays in the competing paths. In this paper, a deep investigation into the LUT structure of Xilinx FPGAs is conducted. Based on the investigation, compact PUF circuits based on programmable Look-Up-Table (LUT) paths are reported. The proposed intertwined structure and the 2-phase operation mitigate the systematic bias in Xilinx FPGA LUTs. The proposed PUF circuits rely on random variations between the same delay paths implemented in adjacent LUT cells, and thus show very strong uniformity and uniqueness.


I. INTRODUCTION
One of the key requirements for securing communications through public open networks is the ability to authenticate the recipients. In order to block malicious network elements, a network node must validate the identity of the recipient. One such authentication method is to use unique hardware-dependent keys provided by physical unclonable functions (PUFs) [1,2]. The basic idea is to exploit non-reproducible manufacturing variations to provide a device-specific query response. Such manufacturing variations are effectively impossible to predict or replicate. A PUF that can be implemented using general-purpose, re-configurable hardware is extremely attractive.
The most fundamental challenge for all PUFs is that a PUF must exhibit extreme sensitivity to manufacturing variations, yet it must be deterministic enough to provide a consistent query response. Therefore, the ideal PUF structure should be free of systematic bias so that it can maximize the entropy due to manufacturing variations among different chips. Meanwhile, the desired consistency in the outcomes under various operating conditions mandates that the ideal PUF design should have a mechanism to identify challenge-response pairs with sufficient margin to withstand changes due to long-term device aging and temporal random noise in the measurement system.
The authors are with the Electrical Engineering Department, University of Wisconsin Milwaukee, Milwaukee, WI 53211 USA (e-mail: hu3@uwm.edu).
G. E. Suh and S. Devadas proposed the RO PUF in [10]. In general, the RO PUF is easier to implement on FPGAs. However, FPGA RO PUF architectures often occupy large fabric resources. In order to improve the area efficiency of FPGA RO PUFs, a new RO PUF architecture based on programmable LUT delays was proposed [2]. The basic idea is to exploit the delay differences among variable delay paths inside LUT cells. As the delay paths inside a LUT cell can be programmed by the LUT inputs, the delay of a single LUT may have many variations depending on the LUT inputs. For a 4-input LUT cell, there are 16 programmable delay paths available. As more LUTs are cascaded, the number of unique programmable delay paths grows exponentially. For example, when two 4-input LUTs are cascaded, the total number of programmable delay paths grows to 256. The difference between two programmable delay paths can be used to form an RO PUF. This was referred to as a virtual RO PUF in [18].
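The exponential growth described above can be checked with a short calculation. This is only an illustrative sketch: the function name `programmable_paths` is hypothetical, and it assumes every cascaded LUT is configured independently with the same number of selectable paths.

```python
def programmable_paths(paths_per_lut: int, num_luts: int) -> int:
    """Number of distinct path selections across a chain of cascaded LUTs.

    Assumes each LUT is configured independently of the others.
    """
    return paths_per_lut ** num_luts


if __name__ == "__main__":
    # 16 selectable paths per LUT, as stated in the text for a 4-input LUT.
    for num_luts in (1, 2, 4, 8):
        print(num_luts, programmable_paths(16, num_luts))
```

With 16 paths per LUT, one LUT gives 16 selections and two cascaded LUTs give 256, matching the figures quoted in the text.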
Practically, there are two different ways to form FPGA RO PUFs based on programmable LUT delay paths. The first is to extract the random delay difference between two identical programmable delay paths from two different LUT cells [1][10]. This approach requires the interconnects between adjacent LUTs to be better matched than the programmable delay paths inside the LUTs. This requirement limits how many LUTs can be cascaded. So far, no published work has cascaded more than 4 LUTs. Therefore, the potential benefit of the programmable delay path based FPGA RO PUF architecture has not been fully realized. To overcome this limit, a 2-pass programmable delay path based FPGA RO PUF architecture was proposed [18]. The idea is to run the FPGA RO PUF twice, with different programmable delay paths inside the LUTs but the same interconnects between LUTs. In this way, the impact of the interconnects between LUTs cancels out. Unfortunately, there is systematic bias among the programmable LUT delay paths. Such systematic bias was experimentally demonstrated in Intel FPGAs [4], and it leads to vulnerability due to information leaks [18].
In this paper, systematic bias among programmable delay paths in Xilinx LUTs is first investigated and demonstrated. The rest of this paper is organized as follows. Section II gives background information on the programmable LUT delay RO PUF architecture as well as the proposed RO PUF architecture. Section III focuses on the experimental investigation and demonstration of systematic bias among programmable LUT delay paths in the LUT6 of Xilinx 7 series FPGAs. Section IV presents the implementation of the proposed structure on a Xilinx 7 series FPGA. Section V shows the experimental results along with a comprehensive analysis of the measurement data. Finally, Section VI concludes the paper.

II. PROPOSED ENTROPY SOURCE AND HARVEST MECHANISM
A. Programmable LUT delay-based FPGA RO PUF
Traditional RO PUFs are based on the "symmetrical" paths formed by identically designed inverters and interconnect wires. In FPGA implementations, such "symmetrical" paths are instantiated by design software, which is often opaque to circuit designers. In addition, FPGA chip designers and manufacturers usually do not focus on the matching among the symmetrical interconnect wires. Therefore, those "symmetrical" paths often carry systematic bias, i.e., certain paths are faster than others. Such systematic bias is due to (1) predictable wire delay differences; and (2) systematic differences among the layouts of the driving transistors. The systematic bias reduces the randomness or entropy of traditional RO based PUF designs.
An FPGA LUT cell can be programmed into many inverters and associated delay paths, which are selected by the input bits. As shown in Fig. 1, the 4-input LUT is programmed into 8 inverters/delay paths configurable by input pins B, C and D, while pin A is used as the signal input pin. Such abundant circuit resources inside LUT cells have been exploited to design PUFs with significantly improved area efficiency [2][15][18].

B. 2-pass FPGA RO PUF to eliminate impacts from the interconnect between two adjacent LUTs
Traditionally, an RO based PUF compares results from two separate ROs running simultaneously. The delay difference between the two ROs is due to mismatch between the interconnects as well as between the delay paths inside the LUTs. The mismatch between interconnects is in general much greater than the mismatch between programmable delay paths inside LUTs. In order to eliminate the impact of interconnect mismatch between two ROs, a 2-pass RO PUF architecture was proposed but not implemented in [18]. Fig. 2 shows the concept of the 2-pass FPGA RO PUF architecture. The ring oscillator portion consists of an even number of inverter pairs and one NAND gate. Each inverter pair (red and blue) is programmed inside one LUT cell. It should be noted that only two inverters are shown inside each LUT in Fig. 2, whereas more inverters can be implemented in each LUT cell. For example, 8 inverters can be formed in a single 4-input LUT cell, as shown in Fig. 1. The NAND gate is used to control the start and stop of the oscillation. The output of the NAND gate is fed into a counter and data processing unit.
The operation of the proposed RO based PUF takes two passes to generate one bit of outcome. Two challenge bit strings are used for the two passes. The two challenge bit strings are substrings of the challenge that generates the one-bit outcome. In the 1st pass, the red inverters are selected by the 1st challenge bit string to form the ring oscillator. The control signal lets the RO run for a pre-determined amount of time or a certain number of clock cycles. The counter records how many oscillation cycles happen during that period. Then the PUF is switched to the 2nd pass. In the 2nd pass, the blue inverters are selected by the 2nd challenge bit string to form another ring oscillator. The ring oscillator runs for the same amount of time as in the 1st pass. The number of oscillation cycles recorded by the counter is then compared against its counterpart from the 1st pass. The one-bit PUF outcome depends on which pass has the higher number of oscillation cycles. This design requires fewer hardware resources than the traditional design. However, the potential system clock timing variation between the two passes becomes a new concern because the system clock is not perfectly steady. It may impact the reliability of the PUF when the two passes have different running times. Besides, voltage and temperature are the two main factors affecting the frequency of an RO. To mitigate the variation caused by these factors, a reference RO is introduced. To distinguish the reference RO from the PUF RO, the PUF RO is referred to as the target RO in the remainder of this paper. During operation, the reference RO runs simultaneously with the target RO, so that the timing, voltage, and temperature variations are common to the two ROs. The counter reading from the target RO is calibrated by using the counter reading from the reference RO. The impacts of system clock, voltage and temperature variations are thereby removed from the calibrated target RO oscillation counts.
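The two-pass comparison can be illustrated with a toy delay model. This is a sketch under stated assumptions rather than the measured behavior of any device: the functions `oscillation_count` and `two_pass_bit`, the nanosecond delay values, and the convention that a higher count in the 1st pass maps to bit 1 are all hypothetical. The point it shows is that the same interconnect delays enter both passes, so the outcome is decided by the inverter delays selected inside the LUTs.

```python
def oscillation_count(stage_delays_ns, interconnect_delays_ns, window_ns):
    """Cycles counted in a fixed window.

    One oscillation period is modeled as two trips around the loop.
    """
    loop_delay = sum(stage_delays_ns) + sum(interconnect_delays_ns)
    return int(window_ns // (2 * loop_delay))


def two_pass_bit(red_ns, blue_ns, interconnect_ns, window_ns=1_000_000):
    """Run the same ring twice with different inverter selections.

    Assumed convention: return 1 when the 1st pass (red inverters) counts
    more cycles than the 2nd pass (blue inverters), else 0.
    """
    n_first = oscillation_count(red_ns, interconnect_ns, window_ns)
    n_second = oscillation_count(blue_ns, interconnect_ns, window_ns)
    return 1 if n_first > n_second else 0


if __name__ == "__main__":
    red = [0.50, 0.52, 0.49, 0.51]    # hypothetical 1st-pass inverter delays
    blue = [0.51, 0.50, 0.50, 0.52]   # hypothetical 2nd-pass inverter delays
    # Two very different interconnect layouts lead to the same bit.
    print(two_pass_bit(red, blue, [0.30, 0.45, 0.28, 0.40]))
    print(two_pass_bit(red, blue, [0.90, 0.10, 0.70, 0.20]))
```

In this model the interconnect delays shift both counts together, so changing them does not flip the comparison as long as the counter resolution is fine enough to resolve the inverter delay difference.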

C. Systematic bias among programmable LUT delay paths
While the programmable LUT delay paths offer an opportunity to greatly enhance the area efficiency of FPGA RO PUFs, it has been demonstrated that there is systematic bias among programmable LUT delay paths in Intel FPGAs [18]. So far, there has been no published investigation of systematic bias among programmable delay paths inside Xilinx FPGAs. The LUT structures in Intel and Xilinx FPGAs are quite different [16][19]. In Section III, experimental investigation demonstrates that systematic bias similar to the one in Intel FPGAs exists in Xilinx FPGAs as well. Furthermore, a systematic bias pattern that has not been reported before is found in Xilinx FPGAs.
With such systematic bias confirmed in both Intel and Xilinx FPGAs, the PUF delays corresponding to certain challenges can be ranked and/or guessed. Consequently, only a subset of the challenges can be used to maintain secure PUF operation. This limitation prevented the first attempt at a 2-pass FPGA RO PUF architecture from being implemented [18]. To overcome this limitation, an improved FPGA RO PUF design is presented in Section II.D.

D. The proposed FPGA RO PUF architecture to eliminate the systematic biases among programmable LUT delay paths.
After the systematic bias due to imperfectly matched metal interconnects is eliminated, the systematic bias due to layout differences among inverters within each LUT emerges as the major source of systematic bias. In order to overcome this bias, the proposed design replaces the single inverter stage with a pair of intertwined inverters in two adjacent LUTs. Fig. 3 shows the change from the single LUT based stage (left) to the stage formed by two adjacent LUTs (right). In the single LUT stage design, the challenge bit strings are used to select which inverter is used in each of the two competing passes. For instance, the top inverter is selected in the 1st pass and the bottom one in the 2nd pass. If the top inverter is stronger than the bottom one due to the layout difference, the systematic bias will be the dominant factor determining the output. In the two-LUT stage design (right), the challenge bit string is also used to select which path is used in the RO. Since both paths include both a top and a bottom inverter, any layout-induced bias between the top and the bottom inverters is cancelled. Therefore, the systematic bias due to inverter layout differences within the LUT cells is cancelled in the proposed design. Section IV shows how intertwined LUTs eliminate systematic bias.
According to the Xilinx 7 series FPGA datasheet, each LUT6 consists of two physical LUT5s [16]. This LUT6 structure is shown in Fig. 4. Input I6 of the LUT6 controls which LUT5 drives the output of the LUT6. As the Xilinx datasheet does not provide any details on the internals of the LUT6, knowledge about the LUT6 layout is absent. Thus, the biases between the two physical LUT5s are investigated experimentally.
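The cancellation argument can be illustrated with a small Monte Carlo sketch. It is a toy model, not a description of the actual LUT layout: the 20 ps systematic gap, the 1 ps random spread, the 300 ps nominal delay, and the assignment of "top" and "bottom" inverters to the two path selections are all assumed values chosen for illustration.

```python
import random


def lut_delays(rng, systematic_gap_ps=20.0, sigma_ps=1.0, nominal_ps=300.0):
    """Return (top, bottom) inverter delays of one LUT.

    The bottom inverter carries a layout gap shared by every LUT, and both
    inverters carry an independent random manufacturing variation.
    """
    top = nominal_ps + rng.gauss(0.0, sigma_ps)
    bottom = nominal_ps + systematic_gap_ps + rng.gauss(0.0, sigma_ps)
    return top, bottom


def single_lut_difference(lut):
    """Competing paths are the top and bottom inverters of the same LUT."""
    top, bottom = lut
    return top - bottom


def intertwined_difference(lut_a, lut_b):
    """Competing paths each contain one top and one bottom inverter."""
    top_a, bottom_a = lut_a
    top_b, bottom_b = lut_b
    path_0 = top_a + bottom_b   # assumed selection for one path
    path_1 = bottom_a + top_b   # assumed selection for the other path
    return path_0 - path_1      # the shared gap appears in both and drops out


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 1000
    single = [single_lut_difference(lut_delays(rng)) for _ in range(trials)]
    paired = [intertwined_difference(lut_delays(rng), lut_delays(rng))
              for _ in range(trials)]
    print("mean single-LUT difference (ps):", sum(single) / trials)
    print("mean intertwined difference (ps):", sum(paired) / trials)
```

In this model the single-LUT difference is centered on the systematic gap, while the intertwined difference is centered on zero and reflects only the random variations.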

III. EXPERIMENTAL INVESTIGATION ON THE SYSTEMATIC BIAS ON PROGRAMMABLE LUT DELAYS IN XILINX FPGAS
First, the corresponding programmable delay paths inside the two LUT5s of a single LUT6 are compared. RO oscillation counts are used to measure the delay of the programmable LUT delay paths. As shown in Fig. 4, each RO consists of an AND gate and an inverter shown in Fig. 3(a). The inverter is implemented with a LUT delay path programmed by 5 challenge bits. Four of them choose one of the 16 possible delay paths within a LUT5, and the last challenge bit determines which LUT5 is used in the specific run. Fig. 5(a) shows how the 32 unique challenges map to the number of oscillation cycles of one RO. The blue clusters and the red clusters are the results from the two LUT5s, respectively. Fig. 5(a) clearly shows that corresponding programmable delay paths in the two LUT5s have systematic bias. This systematic bias is similar to the one in Cyclone IV, which has a large bias between the two LUT3s in a LUT4 [18][19]. While Fig. 5(a) shows the results from one RO, Fig. 5(b) shows the results from 32 ROs located in different parts of the FPGA chip. It clearly shows that systematic bias between corresponding programmable LUT delay paths located in the two LUT5s within the same LUT6 exists in Xilinx 7 series FPGAs. Using the intertwined inverter pair is expected to neutralize this systematic bias.
After verifying the existence of the bias between the two LUT5s within the same LUT6, the next question is whether there is systematic bias among the programmable delay paths within each LUT5. If not, the PUF can be designed based on two competing programmable paths within the LUT5 cell. Otherwise, a new design is needed to mitigate the systematic bias among all programmable delay paths within the LUT. If systematic bias within each LUT5 block does exist, the differences are expected to be small and hard to detect. The delays of the programmable delay paths are again measured by using ROs.
In order to enhance the signal to noise ratio in the measurements, each RO consists of an AND gate and 8 LUTs cascaded together. Each of the 8 LUTs is programmed by the same 5 challenge bits so that identical delay paths are chosen in every LUT. Fig. 6 shows the oscillation counts corresponding to different challenge bit strings. While the biases between the top and bottom LUT5s are still noticeable, a pattern within the group of blue clusters and within the group of red clusters is clearly visible. The pattern repeats for every 4 challenge bit string clusters, and it appears that the last 2 bits of the challenge determine the pattern within each cluster of 4. Although there is some variation in each cluster, the intrinsic systematic bias is not negligible. To mitigate the systematic bias in the LUT5, a 2-phase operation is proposed in Section IV.
Building an RO PUF without eliminating the bias would result in a loss of uniqueness. The systematic biases become the major contributor in determining the PUF output bit, and all RO PUFs in different parts of the FPGA chip tend to give the same PUF bit in response to the same challenge. To understand how that happens, an investigation was done on a Xilinx FPGA with ROs configured in the way described in [18]. 8 LUT stages plus an AND gate are used in each RO, and each LUT takes 5 challenge bits as configuration bits. Therefore, 40 challenge bits configure the RO. Two passes compete to produce a 1-bit outcome as the PUF bit. The competing pair is chosen randomly. In the experiments, 4000 PUF bits were collected from each of 32 ROs. [18] discussed how the competing pairs should be selected to achieve the best results; some portion of the challenges is discarded by that approach. In this investigation experiment, all challenges are kept for analysis. All ROs are fed with the same challenges. The PUF bit collected from RO i with challenge l is denoted as $R_{i,l}$. For challenge l, the mean of the PUF bits from all 32 ROs is

$$\bar{R}_l = \frac{1}{k}\sum_{i=1}^{k} R_{i,l} \qquad (1)$$

where k is the number of ROs (k = 32).
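The mean PUF bit over the k = 32 ROs for one challenge, and the distribution it would follow if the ROs were truly independent and unbiased, can be sketched as follows. The binomial model used here is an analytic stand-in chosen for illustration (the paper's own ideal curve is obtained from simulated random data), and the function names are hypothetical.

```python
from math import comb


def mean_response(bits):
    """Mean of the PUF bits returned by the k ROs for one challenge."""
    return sum(bits) / len(bits)


def ideal_probability(k, ones):
    """Probability that exactly `ones` of k independent, unbiased ROs output 1."""
    return comb(k, ones) / 2 ** k


if __name__ == "__main__":
    k = 32
    # Example: 20 of the 32 ROs answer 1 to a given challenge.
    print("mean response:", mean_response([1] * 20 + [0] * 12))
    # Reference values under the unbiased, independent model.
    print("P(even 16/16 split):", ideal_probability(k, 16))
    print("P(all 32 ROs output 0):", ideal_probability(k, 0))
    print("P(at least 28 ROs output 1):",
          sum(ideal_probability(k, j) for j in range(28, k + 1)))
```

Under this model an even split is by far the most likely single outcome, while agreement among all 32 ROs, or among at least 28 of them, is extremely unlikely.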
4000 samples of the mean PUF bit $\bar{R}_l$ are analyzed. As shown in Fig. 7, there is a very large probability that all 32 ROs give the same PUF bit, i.e., an output of either 1 or 0 from all 32 ROs located in different parts of the chip. Ideally, different ROs are expected to give different PUF bits to the same challenge. In that case, the peak of the distribution of $\bar{R}_l$ should lie around 0.5. An ideal curve of the distribution from 32 different ROs is plotted in Fig. 7. The ideal distribution is based on simulated data from a random number generator. Unlike the ideal distribution, the measured probability of an even split between 1 and 0 among the 32 ROs is around 2%. In comparison, 8.62% and 7.84% of the challenges yield identical PUF output bits in all 32 ROs. The probability that over 90% of the ROs (that is, at least 28 out of 32 ROs) give the same PUF bit to the same challenge is 20.67% and 19.88% for 0's and 1's, respectively. However, as shown by the ideal distribution, in truly random PUFs the probability that 90% of the ROs give the same PUF output bit should be very small. This result demonstrates that these ROs are not unique. Systematic biases are common to all LUTs and the amount of bias is quite significant. Such systematic bias overshadows the disparity due to manufacturing randomness. For comparison, the result from the proposed PUF is presented in Section V. Besides, the uniqueness of this investigation PUF is presented in Table I.

The proposed architecture was implemented on a Xilinx Artix-7 FPGA, as shown in Fig. 8. A Zynq core is used to communicate with C code on a PC. The PUF is implemented solely in the PL (programmable fabric). BRAM stores the challenge bit strings and sends them to the ROs. The operation starts with the Zynq sending the challenge bit strings to BRAM. BRAM then passes a challenge bit string to the target RO. The ROs are turned on for a predetermined period, which is measured by the number of system clock cycles.
The counters count the number of oscillation cycles of both the target RO and the reference RO. BRAMs capture the readings from the two counters, and the Zynq takes over all the readings and the data post-processing, which can also be done off the FPGA chip.
Fig. 9 shows the structure of the ROs and the RO stages formed by intertwined LUT cells. As shown in Fig. 9(a), two LUT6s operating as inverters are included in one stage. Input I1 is used as the oscillation signal input fed from the previous LUT. Inputs I2~I5 are used for the 4 challenge bits that choose one out of the 16 programmable delay paths inside the LUT. Input I6 of LUT6_a takes the switch-bit, while I6 of LUT6_b takes the inversion of the switch-bit. The switch-bit determines which of the two LUT5s is used in the programmable delay paths. The output of LUT6_b acts as the output signal of the intertwined LUT stage and is connected to the input of the next intertwined LUT stage. Fig. 9(b) presents the structure of the RO. In addition to the even number of inverter stages, a NAND gate implemented in a LUT5 is included. One input of the NAND gate is connected to a control signal from the Zynq to start and stop the oscillation in the RO. The other input of the NAND gate is connected to the output of the last inverter stage to complete the loop. The output of the last inverter stage is also connected to a buffer, whose output goes to the counter.

A. Structures and operation
As mentioned in Section II.B, the proposed RO PUF architecture uses two passes for each operation on any given challenge. The 2-phase operation mentioned in Section III is conducted within each pass. The dedicated switch-bit is set to 0 and 1 for the two phases, respectively, so that the patterns in the two LUT5s offset each other. Before each pass, the challenge bit string is sent from BRAM to the ROs. The control circuits set the switch-bit to 0 for the 1st phase. At this point, the RO paths inside the LUTs are determined. Then, the control signal on the NAND gate is set to 1 to start the oscillations. Once the clock counter reaches the pre-set target value, the counters stop counting RO oscillations and the control circuits then stop the oscillation. At this time, the 1st phase of the 1st pass is done. The numbers of oscillation cycles of the target and reference ROs are read from the counters. With the challenge bit string unchanged, the same operations repeat for the 2nd phase after the switch-bit is set to 1. After the two phases are done, the calibration for the 1st pass corresponding to the 1st challenge bit string chl1 is calculated as

$$\mathrm{calibrate}(chl_1) = \left[N_{target}(chl_1,0) - N_{ref}\right] - \left[N_{target}(chl_1,1) - N_{ref}'\right] \qquad (2)$$

In (2), Ntarget(chl1,0) is the number of oscillation cycles of the target RO configured with challenge bit string chl1 and switch-bit 0, and Ntarget(chl1,1) is its counterpart with switch-bit 1. Nref and Nref' are the numbers of oscillation cycles recorded from the reference RO running simultaneously with the target RO in the two phases. Nref and Nref' reflect the difference caused by system clock, voltage and temperature variation between the times when Ntarget(chl1,0) and Ntarget(chl1,1) are measured.
At this point, the calibrated number of oscillations corresponding to challenge bit string chl1 is recorded. The above process is repeated with another challenge bit string chl2 for the 2nd pass. The second calibrated number is acquired as calibrate(chl2).
Finally, calibrate(chl1) and calibrate(chl2) are compared to generate diff(chl), which is the final calibrated result for challenge chl and determines the 1-bit PUF response r. In the rest of the paper, diff(chl) denotes the calibrated result giving a 1-bit PUF response with challenge chl, where chl is (chl1, chl2). DIFF denotes the one giving multiple PUF response bits with multiple challenges. r denotes the 1-bit PUF response based on diff(chl), and R denotes the multiple PUF response bits based on DIFF.
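The processing chain from counter readings to a response bit can be sketched as below. This is a hedged illustration, not the authors' exact formulas: it assumes that each target count is calibrated by subtracting the reference count taken in the same phase, that diff(chl) is the difference of the two calibrated passes, and that a positive diff(chl) maps to r = 1. The sign convention and the example counter values in particular are assumptions.

```python
def calibrate(n_target_0, n_ref_0, n_target_1, n_ref_1):
    """Calibrated result of one pass from its two phases.

    Each target count is referenced to the reference RO count measured at
    the same time (assumed form), then the two phases are subtracted.
    """
    return (n_target_0 - n_ref_0) - (n_target_1 - n_ref_1)


def diff(calibrated_chl1, calibrated_chl2):
    """Compare the calibrated results of the two passes (assumed: subtraction)."""
    return calibrated_chl1 - calibrated_chl2


def response_bit(diff_value):
    """Assumed decision rule: positive diff gives 1, otherwise 0."""
    return 1 if diff_value > 0 else 0


if __name__ == "__main__":
    # Hypothetical counter readings for the two phases of each pass.
    cal_1 = calibrate(n_target_0=104_210, n_ref_0=100_050,
                      n_target_1=104_180, n_ref_1=100_040)
    cal_2 = calibrate(n_target_0=104_150, n_ref_0=100_020,
                      n_target_1=104_170, n_ref_1=100_030)
    d = diff(cal_1, cal_2)
    print(cal_1, cal_2, d, response_bit(d))
```

A value of diff(chl) close to zero is the unstable case: a small change in any of the four counts can flip the resulting bit.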

A. Systematic bias elimination
The large bias between the two LUT5s in one LUT6 is corrected by the intertwined structure. Experiments similar to the one in Fig. 6 are done with the intertwined structure. Fig. 10(a) shows the boxplots for the 32 paths configured by the 4-bit challenge bit string chl1 and the switch-bit. In Fig. 10(a), in contrast to Fig. 6, there is no large gap between Ntarget(chl1,0) and Ntarget(chl1,1). The implementation with a mixture of the top and bottom LUT5s neutralizes the biases between the two LUT5s. Furthermore, the patterns of Ntarget(chl1,0) and Ntarget(chl1,1) are about the same. When chl1 is kept the same, the quartiles of Ntarget(chl1,0) and Ntarget(chl1,1) are at a similar level. So, the bias within the LUT5 can be further eliminated by taking their difference, which is done by (2). In Fig. 10(b), it can be seen that the distribution of quartiles for Ntarget(chl1,0)-Ntarget(chl1,1) does not show any pattern. The biases in the internal structure of the LUT5 are cancelled, and the means of all the quartiles are around 0. In (2), Nref is involved in the calculation of the calibration of one pass, calibrate(chl1). Nref only helps to improve the stability against the timing difference and has a very minor effect on the distribution, so calibrate(chl1) has a distribution very similar to that of Ntarget(chl1,0)-Ntarget(chl1,1). Therefore, with the help of the 2-phase operation, the calibrations of the two competing passes are free of systematic bias.
Fig. 11 shows the distribution of the mean PUF bit across the ROs for the proposed PUF. The result is compared with the ideal distribution, as well as with the investigation experiment shown in Fig. 7. Here, the distribution of the experimental data is very close to the ideal one. It indicates good randomness, which provides strong resilience to attackers. The quality of the PUF bits is further supported by the strong uniqueness and uniformity in Table I.

B. Mitigation against operation condition variation and noise reduction
The purpose of using a reference RO is to mitigate the variation caused by the system clock and environmental variables, i.e., voltage, temperature, etc. Fig. 12 shows the effectiveness of the reference RO in reducing the impact of operating condition variations. Here, for simplicity, Ntarget(chl1,0) and Ntarget(chl1,1) are referred to as Ntarget(chl1,~). 20480 measurement samples of Ntarget(chl1,~) from a target RO and 20480 measurement samples of Nref from a reference RO are taken consecutively with a fixed challenge bit string given to the target RO. It is clear that Ntarget(chl1,~) fluctuates over time, which indicates that Ntarget(chl1,~) is sensitive to operating condition variations. Nref has similar fluctuation patterns. Therefore, Nref can be used to calibrate Ntarget(chl1,~) against operating condition variations. As shown in Fig. 12(b), compared to Ntarget(chl1,~), the trend of Ntarget(chl1,~)-Nref is flattened.
While Ntarget(chl1,0) and Ntarget(chl1,1) have similar trends, subtracting them does not necessarily cancel the variation within the trend, because Ntarget(chl1,0) and Ntarget(chl1,1) are taken at different times. If there is a large time interval between their measurements, the system clock and environmental variables may be quite different. Only the reference RO, which runs at the same time as the target RO, suffers the same interference. In Fig. 12, besides the changing trend, it is noticeable that both Ntarget(chl1,~) and Nref have some 'thickness'. This comes from the noise in the FPGA. Fig. 12(a) and (b) are drawn to the same scale. It can be observed that the noise in Ntarget(chl1,~)-Nref is smaller than the noise in Ntarget(chl1,~) and in Nref. This noise reduction can also be seen in the PUF output diff(chl). For comparison, data processing is also done for the scenario without the reference RO. In this scenario, as the reference RO is not involved in obtaining the PUF response bit, calibrate(chl1) is defined as Ntarget(chl1,0)-Ntarget(chl1,1). This scenario is compared with the one using the reference RO in Fig. 13. Here, σspecific is the deviation of diff(chl). If a diff(chl) is near the decision boundary, the smaller σspecific is, the less likely it is to yield the opposite 1-bit PUF response. It is found that the reference RO helps to push the deviation down for RO64 and RO128. However, when the width of the RO is small, 32-bit for instance, the reference RO has a negative impact on σspecific. In this case, the reference RO is less correlated with the target RO, and both suffer from large random variations. Rather than reducing variations, taking the difference between Ntarget and Nref adds these random variations up. Increasing the number of LUTs reduces the random variations, so the variations in the target RO and the reference RO have more in common and can be cancelled.
Fig. 13. σspecific for the scenarios with and without the reference RO. The smaller σspecific is, the more stable the response is.
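The trade-off described above, where the reference RO helps only when the shared fluctuation outweighs the independent noise of each RO, can be reproduced with a toy simulation. This is a sketch under assumed Gaussian noise, with hypothetical function and parameter names and arbitrary noise magnitudes; it does not model the measured data.

```python
import random
import statistics


def simulate(n_samples, common_sigma, private_sigma, seed=0):
    """Return (std of raw target counts, std of target minus reference).

    Each sample has a fluctuation shared by both ROs (clock, voltage,
    temperature) plus independent noise private to each RO.
    """
    rng = random.Random(seed)
    target, calibrated = [], []
    for _ in range(n_samples):
        common = rng.gauss(0.0, common_sigma)
        n_target = 100_000 + common + rng.gauss(0.0, private_sigma)
        n_ref = 100_000 + common + rng.gauss(0.0, private_sigma)
        target.append(n_target)
        calibrated.append(n_target - n_ref)
    return statistics.pstdev(target), statistics.pstdev(calibrated)


if __name__ == "__main__":
    # Shared fluctuation dominates: subtracting the reference reduces spread.
    print(simulate(5000, common_sigma=10.0, private_sigma=1.0))
    # Independent noise dominates: subtracting the reference adds spread.
    print(simulate(5000, common_sigma=0.5, private_sigma=3.0))
```

In this model the calibrated spread is about the private noise multiplied by the square root of two, so subtracting the reference is beneficial only when the common fluctuation is larger than the private noise, which is consistent with the behavior reported for narrow versus wide ROs.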

C. PUF Performance
To evaluate the performance of the proposed PUF, the metrics described in [7] are used. One physical RO in this work acts as a PUF. Experiments are carried out on two Xilinx Artix-7 FPGAs with 32 PUFs on each FPGA (m=64). Each PUF is fed with 2000 challenges to obtain 2000 PUF bits (n=2000). Each response has 20 samples (t=19) on each PUF.

Uniqueness
In the tests, uniqueness indicates how different one PUF is from another. Ri denotes the n PUF bits of PUF i. The Hamming distance (HD) evaluates the uniqueness between the PUF bits of PUF i and PUF j. HD is calculated for all possible PUF pairs selected from the k PUFs. Ideally, uniqueness would be 50%.

Uniformity
Any PUF should have the same probability of giving a PUF bit of 0 or 1. Otherwise, an attacker is more likely to succeed by predicting the response value with the larger probability. For PUF i, the percentage of 1's among the n PUF bits defines the uniformity, whose ideal value is 50%.

Reliability
Reliability is important because a PUF should produce consistent responses whenever the ROs are given the same challenge. It indicates how likely a PUF is to reproduce the same PUF bit for the same challenge. On PUF i, with the same challenges used to obtain Ri, repeated responses R'i,t are recorded. HD is calculated between Ri and each R'i,t. m is the number of samples of R'i,t. The ideal value for reliability is 100%.
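The three metrics can be computed as in the following sketch. It follows the commonly used Hamming-distance-based definitions of uniqueness, uniformity and reliability, which are assumed here to match those of [7]; the function names and the tiny 4-bit responses are illustrative only.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Average pairwise HD between PUFs, as a percentage of n bits (ideal 50)."""
    n = len(responses[0])
    pairs = list(combinations(responses, 2))
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n)


def uniformity(response):
    """Percentage of 1's in the n bits of one PUF (ideal 50)."""
    return 100.0 * sum(response) / len(response)


def reliability(reference, repeats):
    """100% minus the average HD between a reference response and its repeats."""
    n = len(reference)
    flips = sum(hamming_distance(reference, r) for r in repeats)
    return 100.0 * (1.0 - flips / (len(repeats) * n))


if __name__ == "__main__":
    pufs = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
    print("uniqueness:", uniqueness(pufs))
    print("uniformity of PUF 0:", uniformity(pufs[0]))
    print("reliability of PUF 0:",
          reliability(pufs[0], [[0, 1, 0, 1], [0, 1, 1, 1]]))
```

With real data, each response would hold the n = 2000 bits of one PUF and `repeats` would hold the repeated measurements of the same challenges.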
To cover a wide range of compact FPGA PUF designs, ROs are implemented with 32, 48, 64, and 128-bit challenges, and each width is tested with 32 RO samples (k=32) on each of the two boards. Each measurement is collected after the RO has run for 15.729 ms. Experiments are conducted at room temperature and nominal voltage.
For comparison purposes, the LUT programmable delay path based ROs are implemented both with and without intertwined stages. The design without intertwined stages is referred to as the investigation PUF. The results from those ROs have very poor uniqueness. As stated in previous sections, the systematic biases are the major contributor to this poor uniqueness.
When compared with other RO PUF designs, the proposed RO PUF outperforms previous works in terms of both uniformity and uniqueness. The careful removal of the systematic bias leads to a significant improvement in these two metrics. Uniformity and uniqueness in the proposed work are very close to their ideal values. With less deterministic bias, the variation in the PUFs is largely based on the random variation in the manufacturing process. Besides, the reliability of this work is very close to the highest level reported.

VI. CONCLUSION
In this paper, a configurable RO based PUF design is proposed. It eliminates the systematic bias arising from imperfectly matched metal interconnects and from the internal structure of the LUT cells. The systematic bias between the two LUT5s in one LUT6 in Xilinx FPGAs is overcome by pairing two LUTs as an intertwined LUT pair in each stage. The systematic bias within each LUT5 follows a pattern. The proposed 2-phase operation eliminates that pattern, making the PUF rely solely on the random variation in the manufacturing process. The timing variation between the two passes is mitigated by using a reference RO, which reflects the changes in system clock, voltage, etc. Performance is carefully evaluated, and the removal of systematic bias has a significant effect on uniformity and uniqueness, both of which are very close to their ideal values. These tests show that the proposed RO is a suitable basis for a strong RO based PUF design.