1. Introduction
The range of applications for RFID (Radio Frequency Identification) systems is vast, spanning areas such as logistics, healthcare, access control, ubiquitous computing and supply chain management, as well as applications in the context of IoT (Internet of Things) systems [1]. Among the different types of RFID tags, passive tags are the simplest, cheapest and most ubiquitous [2]. Passive RFID tags operate without an on-board power source, relying on the energy received from the RFID reader to operate. RFID systems are even being proposed for applications commonly covered by conventional battery-powered wireless sensor network (WSN) devices, through the emerging field of RFID sensors [3], which raise even more challenges for the severely energy-constrained passive RFID tags.
Driven by increasing demand, the use of ultra-low-power RFID tags in commercial products has brought risks related to information security, industrial espionage and individual privacy. Without cryptography, inventory information or personal identification data can easily be monitored without leaving any trace of who did it. Therefore, most digital ID and tracking applications must address security and privacy in their system architectures, just as credit card applications do.
To meet the growing demand for product tracking via RFID tags, there is a trend toward lowering the cost and power consumption of these devices. Consequently, their computational capabilities tend to be very low, which poses challenges in the implementation of encryption schemes for these devices.
The design of low-power devices should take into account three fundamental aspects: chip area, power consumption and latency (clock cycles). This work focuses primarily on power consumption, which may also contribute to improvements in other relevant aspects, such as chip area.
The power provided by the RFID reader over the air interface decreases with the square of the operating distance to UHF (Ultra High Frequency) tags. In order to allow cryptographic operations over the whole operating range of a tag, which, in the case of UHF tags, typically extends up to seven meters, a limit on the power budget of approximately
should not be exceeded [4].
In this paper, we present a comparative analysis of the power consumption efficiency of AES (Advanced Encryption Standard) and Salsa20 ASIC implementations (both designed by us) optimized for use in passive RFID tags, in order to determine which algorithm is more suitable for low-power devices. The main contributions of this work are the design, implementation and evaluation of these algorithms, with the goal of providing security to low-power devices for digital identification applications.
The remainder of this paper is organized as follows: Section 2 presents related work, Sections 3 and 4 present the algorithm descriptions and implementations, respectively, Section 5 presents the results and discussion, and finally Section 6 concludes the paper.
2. Background and Related Work
2.1. Security Level
A deep analysis of the security level of AES and Salsa20 ciphers is out of the scope of this paper, but these two ciphers appear to have similar security levels, according to related cryptanalytic studies presented below.
AES, also known as Rijndael, was announced as a standard by the U.S. National Institute of Standards and Technology (NIST) in 2001 [5]. Cryptanalytic papers in the following years culminated in attacks taking [6,7]:
Salsa20 was published in 2005 [8]. Refereed cryptanalytic papers by Fischer et al. [9] and Tsunoo et al. [10] have culminated in attacks taking:
These results indicate that AES and Salsa20 present similar security performance for the same number of rounds.
2.2. AES and Salsa20 Implementations
Over the years, RFID tags have been designed with the goal of reducing their power consumption in order to meet the demand for passive chip applications and lower cost.
Experimental results from L. Fu et al. [11] show that an RFID-dedicated AES module can achieve low-power operation, down to
@
and a latency of 204 cycles.
The low cost demanded for RFID tags forces them to be very resource-limited. Typically, they can only store a few hundred bits, have 5-10k logic gates and offer a maximum communication range of a few meters. Within this gate count limitation, only between 250 and 3000 gates can be devoted to security functions [12].
Several papers have presented low-power implementations of AES suitable for RFID tag applications in terms of power consumption and die size [4,13,14], where the best results are about
on
at 100
[15].
There are several implementations of Salsa20 [16] for FPGA (Field Programmable Gate Array) and ASIC (Application-Specific Integrated Circuit) simulation, all of them optimized for speed. However, these implementations do not address low-power constraints.
Some software-based papers combine both ciphers, using Salsa20 for encryption and AES for authentication [17], but hardware-based studies are still lacking, as noted in a recent review [18] that does not include an on-chip Salsa20 implementation for proper comparison.
3. Algorithm Descriptions
3.1. The AES Algorithm
An official description of AES is given in NIST FIPS (Federal Information Processing Standards) PUB 197 [5]. For the sake of clarity, a brief outline of the AES structure is given in this section. AES is a block cipher that was published in FIPS 197 in 2001. It became effective as a U.S. federal government standard in 2002, and in 2003 the National Security Agency (NSA) approved AES for protecting top-secret information.
The AES is capable of using cryptographic keys of 128, 192, and 256 bits to encrypt or decrypt data in blocks of 128 bits. The data to be processed is usually expressed as an array of bytes organized as a 4 by 4 matrix and called ’State’.
The design principle is based on a substitution-permutation network, and the cipher converts an input block into a final output block through a number of repeated transformation rounds (10, 12 or 14 rounds for 128-, 192- or 256-bit keys, respectively) [5]. Each round consists of up to four processing steps, which are performed at the byte or bit level of the State. The transformations that describe a round of AES and the respective processing steps are:
AddRoundKey transformation: the bitwise XOR of the State with the round key. This is the only step that depends on the cryptographic key.
SubByte transformation: a non-linear byte substitution. It has two steps: the first computes the multiplicative inverse of each byte in $GF(2^8)$, and the second applies an affine transformation.
ShiftRow transformation: a byte-wise operation. The first row of the State is not shifted, while the last three rows are cyclically shifted to the left by 1, 2 and 3 bytes, respectively. This operation adds linear diffusion.
MixColumn transformation: adds linear diffusion to the cipher. Each column of the State is combined using an invertible linear transformation: the column is treated as a polynomial over $GF(2^8)$ and multiplied modulo $x^4 + 1$ by a fixed polynomial $a(x)$, given by:
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$
During the InvMixColumn operation, each column is treated as a polynomial over $GF(2^8)$ and multiplied modulo $x^4 + 1$ by the fixed polynomial $a^{-1}(x)$, given by:
$$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$$
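As an illustration of the two polynomial definitions above, the following Python sketch models MixColumn and InvMixColumn on a single 4-byte column. It is a software reference, not the hardware data path; the names `xtime`, `gf_mul`, `mix_column` and `inv_mix_column` are illustrative, and the byte multiplication uses the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ from FIPS-197.

```python
# Software model of MixColumn / InvMixColumn on one column of the State.

def xtime(b: int) -> int:
    """Multiply a byte by x ({02}) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # reduce by the AES irreducible polynomial
    return b & 0xFF


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col, coeffs=(0x02, 0x03, 0x01, 0x01)):
    """Multiply a 4-byte column by the circulant matrix built from coeffs.

    The default coefficients encode a(x) = {03}x^3 + {01}x^2 + {01}x + {02}.
    """
    out = []
    for row in range(4):
        acc = 0
        for i in range(4):
            # Row r of the circulant matrix is coeffs rotated right by r.
            acc ^= gf_mul(coeffs[(i - row) % 4], col[i])
        out.append(acc)
    return out


def inv_mix_column(col):
    """Same structure with the coefficients of a^-1(x)."""
    return mix_column(col, coeffs=(0x0E, 0x0B, 0x0D, 0x09))
```

For example, `mix_column([0xDB, 0x13, 0x53, 0x45])` returns `[0x8E, 0x4D, 0xA1, 0xBC]`, and `inv_mix_column` maps that result back to the original column. Both directions share the same multiply-and-XOR structure and differ only in the constants.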
3.2. The Salsa20 Algorithm
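Salsa20 is a stream cipher that works in counter mode. It generates a sequence of keystream blocks $Z$, which are then XORed with the input message (plaintext) to produce the encrypted message (ciphertext). The internal keystream generation function of Salsa20 takes as input a 256-bit secret key $k = (k_0, \ldots, k_7)$ and a 64-bit nonce $v = (v_0, v_1)$, i.e., a unique message number, to produce a sequence of 512-bit keystream blocks. The inputs are configured as a 4 by 4 matrix of 32-bit words:
$$X = \begin{pmatrix} x_0 & x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 & x_7 \\ x_8 & x_9 & x_{10} & x_{11} \\ x_{12} & x_{13} & x_{14} & x_{15} \end{pmatrix} = \begin{pmatrix} \sigma_0 & k_0 & k_1 & k_2 \\ k_3 & \sigma_1 & v_0 & v_1 \\ t_0 & t_1 & \sigma_2 & k_4 \\ k_5 & k_6 & k_7 & \sigma_3 \end{pmatrix}$$
where the 64-bit counter $t = (t_0, t_1)$ corresponds to the message block index and the $\sigma_i$ are predefined constants. The keystream block $Z$ is then defined as:
$$Z = X + \mathrm{DOUBLEROUND}^{10}(X),$$
where $+$ denotes word-wise addition modulo $2^{32}$. The double-round function DOUBLEROUND consists of the double computation of four QUARTERROUND functions over the rotated columns and rows of $X$. DOUBLEROUND is divided into the column step, which applies four QUARTERROUND functions on the columns of $X$, and the row step, for the rows of $X$:
$$\begin{aligned} \text{column step: } & (x_0, x_4, x_8, x_{12}),\ (x_5, x_9, x_{13}, x_1),\ (x_{10}, x_{14}, x_2, x_6),\ (x_{15}, x_3, x_7, x_{11}) \\ \text{row step: } & (x_0, x_1, x_2, x_3),\ (x_5, x_6, x_7, x_4),\ (x_{10}, x_{11}, x_8, x_9),\ (x_{15}, x_{12}, x_{13}, x_{14}) \end{aligned}$$
The QUARTERROUND transformation updates four 32-bit words of the matrix $X$. It sequentially computes, one line at a time, over the tuple $(a, b, c, d)$:
$$\begin{aligned} b &\leftarrow b \oplus ((a + d) \lll 7) \\ c &\leftarrow c \oplus ((b + a) \lll 9) \\ d &\leftarrow d \oplus ((c + b) \lll 13) \\ a &\leftarrow a \oplus ((d + c) \lll 18) \end{aligned}$$
where $\lll$ denotes left rotation of a 32-bit word and each line uses the words updated by the previous lines. Considering these equations, 10 double-rounds are executed over the input matrix $X$. Finally, the updated matrix is added to the original input matrix. Salsa20 has been presented as a 20-round stream cipher [16].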
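To make the data flow of these equations concrete, here is a minimal Python model of the Salsa20 core. It assumes the 256-bit-key input layout shown above, with the $\sigma_i$ taken as the "expand 32-byte k" constants of Bernstein's specification; the helper names (`build_input`, `apply_step`, `doubleround`, `salsa20_core`) are hypothetical and this is a sketch, not the authors' design.

```python
# Software model of the Salsa20 core: QUARTERROUND, the column and row steps
# of DOUBLEROUND, and Z = X + DOUBLEROUND^10(X).

MASK = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"

# Word indices (a, b, c, d) handed to QUARTERROUND in each step.
COLUMN_STEP = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROW_STEP = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def rotl32(w: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((w << n) | (w >> (32 - n))) & MASK


def quarterround(a, b, c, d):
    """One QUARTERROUND; each line uses the words updated by the previous lines."""
    b ^= rotl32((a + d) & MASK, 7)
    c ^= rotl32((b + a) & MASK, 9)
    d ^= rotl32((c + b) & MASK, 13)
    a ^= rotl32((d + c) & MASK, 18)
    return a, b, c, d


def build_input(k, v, counter):
    """Lay out X from 8 key words k, 2 nonce words v and a 64-bit counter."""
    t0, t1 = counter & MASK, (counter >> 32) & MASK
    return [SIGMA[0], k[0], k[1], k[2],
            k[3], SIGMA[1], v[0], v[1],
            t0, t1, SIGMA[2], k[4],
            k[5], k[6], k[7], SIGMA[3]]


def apply_step(x, step):
    """Apply four QUARTERROUNDs on disjoint word groups."""
    x = list(x)
    for ia, ib, ic, i_d in step:
        x[ia], x[ib], x[ic], x[i_d] = quarterround(x[ia], x[ib], x[ic], x[i_d])
    return x


def doubleround(x):
    return apply_step(apply_step(x, COLUMN_STEP), ROW_STEP)


def salsa20_core(x, double_rounds=10):
    """Return Z = X + DOUBLEROUND^10(X), with word-wise addition mod 2^32."""
    y = list(x)
    for _ in range(double_rounds):
        y = doubleround(y)
    return [(xi + yi) & MASK for xi, yi in zip(x, y)]
```

As a hand-checkable example, `quarterround(1, 0, 0, 0)` returns `(0x08008145, 0x00000080, 0x00010200, 0x20500000)`. Because the four groups in each step are disjoint, the four QUARTERROUND calls of a step could equally run in parallel, which is what makes the two-module hardware arrangement possible.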
4. Algorithm Implementations
4.1. AES Implementation
Since the AES algorithm is iterative, a minimum set of processing blocks is used and a simple finite state machine controls the many rounds that repetitively reuse these processing blocks.
The current implementation has three main processing blocks: KeySchedule, MixColumn and SubByte, where the latter also performs the ShiftRow operation, so both transformations are executed by the same processing block. The encryption and decryption steps of the simple finite state machine are described in Figure 1.
In order to avoid redundant processing during key expansion for decryption, the ten round keys are saved in registers before any data processing. As shown in the implementation flowchart, the first step of a cryptographic operation is to derive the ten round keys and store each of them in a bank of registers. This approach provides a latency improvement of 135 cycles at the cost of nine additional 128-bit registers.
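The round-key bank can be sketched in software as a list filled before any data is processed. The Python model below is a minimal sketch of standard AES-128 key expansion; it returns eleven 128-bit keys (the cipher key plus the ten derived round keys) and does not model how they map onto the nine extra hardware registers. The S-box is generated from the $GF(2^8)$ inverse and affine transformation only to keep the block self-contained; `expand_key` and the other names are hypothetical.

```python
# AES-128 key expansion with all round keys computed up front; the returned
# list plays the role of the round-key register bank.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(x: int) -> int:
    """Affine transformation of the multiplicative inverse (0 maps to 0)."""
    inv = 1
    for _ in range(254):  # x^254 = x^-1 in GF(2^8)
        inv = gf_mul(inv, x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key(key: bytes) -> list:
    """Return the 11 AES-128 round keys (16 bytes each), cipher key first."""
    if len(key) != 16:
        raise ValueError("AES-128 expects a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord (shared S-box)
            temp[0] ^= RCON[i // 4 - 1]       # round constant
        words.append([w ^ t for w, t in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]
```

With the FIPS-197 example key `2b7e151628aed2a6abf7158809cf4f3c`, the first derived round key is `a0fafe1788542cb123a339392a6c7605` and the last is `d014f9a8c9ee2589e13f0cc8b6630ca6`. Since AES decryption applies the round keys in reverse order, a decryption pass can simply read the stored list backwards instead of re-running the expansion.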
Both the SubByte and KeySchedule transformations use an S-box. Since the control unit never requests SubByte and KeySchedule to operate at the same time, they can share the same S-box logic to minimize area. In this implementation, to speed up the S-box tasks, two identical instances operate in parallel, as shown in Figure 2.
The first step of the S-box computes the multiplicative inverse of a byte of the AES State, and the second step applies an affine transformation. The inversion is performed in by means of mathematical manipulation.
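The inverse-then-affine structure of the S-box can be checked against a small software reference. The sketch below computes the inverse by square-and-multiply exponentiation ($a^{254} = a^{-1}$ in $GF(2^8)$), which is plainly a different technique from whatever inversion circuit the hardware uses; it only illustrates the two steps. Function names are hypothetical.

```python
# Software reference for the two S-box steps: GF(2^8) inversion, then the
# FIPS-197 affine transformation with constant 0x63.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Step 1: multiplicative inverse as a^254; by convention 0 maps to 0."""
    if a == 0:
        return 0
    result, base, exponent = 1, a, 254
    while exponent:  # square-and-multiply
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(b: int) -> int:
    """Step 2: XOR of b with its left rotations by 1..4 bits, then XOR 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def sbox(x: int) -> int:
    return affine(gf_inverse(x))
```

For instance, `sbox(0x53)` returns `0xED`, the worked example in FIPS-197, and `sbox(0x00)` returns `0x63` because zero has no inverse and is mapped to itself before the affine step.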
The MixColumn controller sends a 32-bit input to a multiplier block, namely Word_MixColumn. Each input sent from the MixColumn controller is a column of the AES State. Thus, the MixColumn operation is performed in four cycles (Figure 3), since one column of the State is processed per cycle. The 32-bit column is processed by four multiplier blocks. We reused common constant multipliers in the data path between the MixColumn and InvMixColumn operations to reduce the hardware area.
4.2. Salsa20 Implementation
The Salsa20 implementation prioritizes low power over execution time. Each step of the QUARTERROUND function is executed in its own clock cycle for power-saving purposes, so the QUARTERROUND function takes four clock cycles. To reduce latency, the double-round control state machine uses two QUARTERROUND modules at the same time.
The basic operation of Salsa20 is the QUARTERROUND function. It is executed 80 times in the Salsa20 algorithm (eight times in each of the ten double-rounds), which makes it the most obvious target for power optimization.
Figure 4 shows the Salsa20 encryption hardware implementation. It includes a 64-bit counter that generates the data input to the Salsa20 expansion module, as described in the Salsa20 specification [8], and it computes the XOR that produces the encrypted output. The key and data inputs are 128 bits wide, to allow a fairer comparison of the Salsa20 implementation with the AES-128 implementation.
The ‘Salsa20 expansion’ module is a simple wire concatenation at the input of the Salsa20 core module, as shown in Figure 5. The T0, T1, T2, T3 constants are described in the Salsa20 specification [8].
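A byte-level software model of the architecture in Figures 4 and 5 (counter, expansion by concatenation, core, output XOR) is sketched below. The following are assumptions, not details confirmed by the text: the T0 to T3 constants are taken to be Salsa20's 16-byte-key constants ("expand 16-byte k"), the 128-bit key fills both key slots of the input matrix as in the 16-byte-key variant of the Salsa20 specification, and the nonce is 64 bits. The hardware's 128-bit data width is not modeled, and names such as `salsa20_block` and `salsa20_xor` are hypothetical.

```python
import struct

MASK = 0xFFFFFFFF
# Assumption: T0..T3 correspond to the 16-byte-key constants "expand 16-byte k".
TAU = (0x61707865, 0x3120646E, 0x79622D36, 0x6B206574)
# QUARTERROUND word indices: column step followed by row step.
DOUBLEROUND = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),
               (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def rotl32(w: int, n: int) -> int:
    return ((w << n) | (w >> (32 - n))) & MASK


def quarterround(x: list, a: int, b: int, c: int, d: int) -> None:
    """Update words x[a], x[b], x[c], x[d] in place."""
    x[b] ^= rotl32((x[a] + x[d]) & MASK, 7)
    x[c] ^= rotl32((x[b] + x[a]) & MASK, 9)
    x[d] ^= rotl32((x[c] + x[b]) & MASK, 13)
    x[a] ^= rotl32((x[d] + x[c]) & MASK, 18)


def salsa20_block(key16: bytes, nonce8: bytes, counter: int) -> bytes:
    """One 64-byte keystream block for a 128-bit key and a 64-bit nonce."""
    k = struct.unpack("<4I", key16)    # bytes -> little-endian words
    v = struct.unpack("<2I", nonce8)
    t = (counter & MASK, (counter >> 32) & MASK)
    # "Expansion": plain concatenation of constants, key, nonce and counter.
    x = [TAU[0], *k, TAU[1], *v, *t, TAU[2], *k, TAU[3]]
    y = list(x)
    for _ in range(10):                # ten double-rounds
        for idx in DOUBLEROUND:
            quarterround(y, *idx)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(x, y)))


def salsa20_xor(key16: bytes, nonce8: bytes, data: bytes, counter: int = 0) -> bytes:
    """Encrypt or decrypt: XOR data with consecutive keystream blocks."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        block = salsa20_block(key16, nonce8, counter + offset // 64)
        out.extend(p ^ z for p, z in zip(data[offset:offset + 64], block))
    return bytes(out)
```

Because encryption is a plain XOR with the keystream, calling `salsa20_xor` a second time with the same key, nonce and starting counter recovers the plaintext; the counter is the only input that changes from one 64-byte block to the next.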
The Salsa20 core module (Figure 6) is composed of the Salsa20 DOUBLEROUND10 module with LITTLE_ENDIAN functions at its input and output. The LITTLE_ENDIAN function changes the endianness, using a byte as the minimal unit.
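Assuming the LITTLE_ENDIAN function corresponds to the little-endian conversion between bytes and 32-bit words in the Salsa20 specification, its effect can be modeled in a few lines of Python. In hardware this amounts to rewiring bytes, which the hypothetical helper `swap_bytes32` illustrates for a single word; the other function names are also illustrative.

```python
import struct


def little_endian_words(block: bytes) -> list:
    """Read a 64-byte block as sixteen 32-bit words, least significant byte first."""
    return list(struct.unpack("<16I", block))


def words_to_bytes(words) -> bytes:
    """Write sixteen 32-bit words back out, least significant byte first."""
    return struct.pack("<16I", *words)


def swap_bytes32(word: int) -> int:
    """Wire-level view: reverse the order of the four bytes of one 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")
```

For example, the bytes `00 01 02 03` at the start of a block become the word `0x03020100`, and `swap_bytes32(0x01020304)` gives `0x04030201`.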
Figure 7 shows the Salsa20 DOUBLEROUND10 module implementation. It is composed of a control state machine and two QUARTERROUND modules. The double-round function is a column-round function followed by a row-round function.
The Salsa20 DOUBLEROUND10 control state machine (Figure 8) controls the data flow to and from the QUARTERROUND modules. This control state machine executes two QUARTERROUND functions at the same time for each half-round of the double-round (the first half of the column-round, the second half of the column-round, the first half of the row-round and the second half of the row-round).
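The scheduling described above can be checked with a little arithmetic. Assuming that each half of a column-round or row-round issues two disjoint QUARTERROUND calls (one per module) and that each call takes four cycles, the Python sketch below enumerates the slots; the names are hypothetical, and state-machine or I/O overhead is not modeled.

```python
COLUMN_STEP = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROW_STEP = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]
QR_CYCLES = 4        # one word per clock cycle inside QUARTERROUND
DOUBLE_ROUNDS = 10


def doubleround10_schedule():
    """List the (module 0, module 1) QUARTERROUND pairs issued in each slot."""
    slots = []
    for _ in range(DOUBLE_ROUNDS):
        for step in (COLUMN_STEP, ROW_STEP):
            slots.append((step[0], step[1]))  # first half of the step
            slots.append((step[2], step[3]))  # second half of the step
    return slots


def core_cycles(slots) -> int:
    """Cycles spent evaluating QUARTERROUNDs, ignoring control overhead."""
    return len(slots) * QR_CYCLES
```

Under these assumptions the schedule has 40 slots, 80 QUARTERROUND calls and 160 cycles of QUARTERROUND activity; the rest of the reported 202-cycle latency is not accounted for by this simple model.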
Figure 9 shows the Salsa20 QUARTERROUND, where four words (32 bits each) are evaluated one at a time. The QUARTERROUND is optimized for power: each word takes a clock cycle in the QUARTERROUND execution, so each QUARTERROUND execution takes 4 cycles to complete.
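The word-per-cycle behavior can be expressed as a small cycle-level model. This is a sketch under stated assumptions: one word is updated per tick, in the order of the QUARTERROUND equations, and clock gating is represented only by the fact that a single word changes per cycle. `QuarterRoundFSM`, `SCHEDULE` and `run_quarterround` are hypothetical names, not the authors' RTL.

```python
MASK = 0xFFFFFFFF


def rotl32(w: int, n: int) -> int:
    return ((w << n) | (w >> (32 - n))) & MASK


# (target word, first addend, second addend, rotation) for each clock cycle,
# with words indexed as (a, b, c, d) = (0, 1, 2, 3).
SCHEDULE = [(1, 0, 3, 7), (2, 1, 0, 9), (3, 2, 1, 13), (0, 3, 2, 18)]


class QuarterRoundFSM:
    """Cycle-level model: one word of (a, b, c, d) is updated per clock tick."""

    def __init__(self, a: int, b: int, c: int, d: int):
        self.words = [a, b, c, d]
        self.cycle = 0

    @property
    def done(self) -> bool:
        return self.cycle == len(SCHEDULE)

    def tick(self) -> None:
        if self.done:
            return
        target, p, q, rot = SCHEDULE[self.cycle]
        # Only the sub-block for `target` is active in this cycle.
        self.words[target] ^= rotl32((self.words[p] + self.words[q]) & MASK, rot)
        self.cycle += 1


def run_quarterround(a: int, b: int, c: int, d: int):
    """Clock the model until done; return the final words and the cycle count."""
    fsm = QuarterRoundFSM(a, b, c, d)
    while not fsm.done:
        fsm.tick()
    return tuple(fsm.words), fsm.cycle
```

`run_quarterround(1, 0, 0, 0)` returns `((0x08008145, 0x80, 0x10200, 0x20500000), 4)`, matching the combinational QUARTERROUND result after four cycles.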
The Salsa20 QUARTERROUND control state machine (Figure 10) controls the clock gating of the four word-evaluation sub-blocks.
5. Results and Discussion
5.1. AES Design
The toggle count of each processing block during the simulation of an AES decryption can be observed in Figure 11. Since the technology node is
, dynamic power is the dominant factor in our power analysis. Based on the toggle counts of the encryption and decryption simulations, one can conclude that peak power consumption occurs during the MixColumn transformation. Therefore, the decision to use two S-boxes does not affect the peak power of this AES implementation. Moreover, the S-box implementation uses very little area, and adding a second S-box does not represent a considerable cost to the overall system.
Table 1 shows a summary of the main simulation results generated from the toggle waveform, which represents the number of signal transitions in a circuit over a given period and is a good approximation for dynamic power. The AES design has an average power consumption of
with a clock of
. The encryption or decryption latency is 180 cycles and its critical path takes
(essentially the same characteristics as those obtained by L. Fu et al. [11]). The reduced and balanced latency of both decryption and encryption is achieved at the cost of the nine 128-bit registers used by the KeySchedule block. These extra registers avoid redundant processing but have an impact on the overall area. This AES design has 4,303 cells and a total area of
.
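The link between toggle counts and power can be illustrated with the textbook first-order switching model, in which each bit transition dissipates roughly $\frac{1}{2} C V_{DD}^2$. The Python sketch below is a simplification, not the power model used by the simulation tools: it assumes a single average node capacitance, and the function names are hypothetical.

```python
def hamming_distance(prev: int, curr: int) -> int:
    """Number of bits that flip between two consecutive samples of a signal."""
    return bin(prev ^ curr).count("1")


def toggle_count(waveform: list) -> int:
    """Total bit transitions in a signal sampled once per clock cycle."""
    return sum(hamming_distance(a, b) for a, b in zip(waveform, waveform[1:]))


def dynamic_power_estimate(toggles: int, cycles: int, c_node: float,
                           vdd: float, f_clk: float) -> float:
    """First-order switching power: P ~ 0.5 * (toggles per cycle) * C * Vdd^2 * f."""
    return 0.5 * (toggles / cycles) * c_node * vdd ** 2 * f_clk
```

For example, a 4-bit register sampled as `0b0000`, `0b1111`, `0b1110` produces 4 + 1 = 5 toggles. With capacitance, supply voltage and clock held equal, as in the comparison here, the estimate scales directly with toggles per cycle, which is why toggle counts can rank designs by dynamic power.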
5.2. Salsa20 Design
The toggle count of each processing block of the Salsa20 simulation can be observed in Figure 12. As expected, the toggle peaks are concentrated in the QUARTERROUND function. Two QUARTERROUND blocks were used instead of one to bring the latency close to that of the AES implementation.
Table 2 shows the summary report generated from the simulation-based toggle waveform. The Salsa20 design has an average power consumption of
with a clock of
and a
cell library. The encryption and decryption latency is 202 clock cycles and its critical path takes
.
5.3. Layout Comparison
The layout of both designs used the X-FAB and library. The AES and Salsa20 modules have the same utilization area of .
The AES layout, depicted in Figure 13, includes the AES module and a testing control logic. The layout of the AES module is colored in red and occupies
, which is very close to the estimate in Table 1.
The Salsa20 layout (Figure 14) includes the Salsa20 module and the same testing control logic. The layout of the Salsa20 module is colored in red and occupies
, which is also very close to the estimate in Table 2. The AES layout has two more filler pads than the Salsa20 layout because of its larger area.
6. Conclusions
In this paper, low-power implementations of AES and Salsa20 were proposed and their results were compared. In order to compare the cost and power consumption of these two cryptographic algorithms fairly, the same synthesis and simulation parameters, such as clock, test vectors and technology library, were used for both. In addition, both were designed to have similar latencies.
Our results show that the power consumption of Salsa20 is considerably lower than that of AES, suggesting that the former is a better choice for low-power devices. Moreover, the area of the Salsa20 implementation is also considerably smaller than that of the AES one, which also implies a lower fabrication cost. Therefore, Salsa20 is a very attractive cryptographic algorithm for secure RFID applications.
Author Contributions
Mario Gazziro and João Carmo worked on the digital implementation of Salsa20 and AES, respectively. Marco Cavallari and Oswaldo Hideo worked on digital verification. José Afonso wrote the article.
Funding
This research received no external funding.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
This work was supported by the Wernher von Braun Center for Advanced Research. The authors also want to thank Nelson Guimaraes, Paulo Matias, Henrique Okada, Alexander Sieh, Andre Costa, Dario Thober and Jecel Mattos.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Munoz-Ausecha, C., Ruiz-Rosero, J., & Ramirez-Gonzalez, G. (2021). RFID applications and security review. Computation, 9(6), 69. [CrossRef]
- Casella, G., Bigliardi, B., & Bottani, E. (2022). The evolution of RFID technology in the logistics field: a review. Procedia Computer Science, 200, 1582-1592. [CrossRef]
- Costa, F., Genovesi, S., Borgese, M., Michel, A., Dicandia, F. A., & Manara, G. (2021). A review of RFID sensors, the new frontier of internet of things. Sensors, 21(9), 3138. [CrossRef]
- Oren, Y., & Feldhofer, M. (2009). A low-resource public-key identification scheme for RFID tags and sensor nodes. In Proceedings of the second ACM conference on Wireless network security (WiSec ’09). Association for Computing Machinery, New York, NY, USA, 59–68.
- National Institute of Standards and Technology (NIST). (2001). FIPS-197: Advanced Encryption Standard. November 2001.
- Piret, G., & Quisquater, JJ. (2003). A Differential Fault Attack Technique against SPN Structures, with Application to the AES and KHAZAD. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds) Cryptographic Hardware and Embedded Systems - CHES 2003. CHES 2003. Lecture Notes in Computer Science, vol 2779. Springer, Berlin, Heidelberg.
- Mangard, S. (2003). A Simple Power-Analysis (SPA) Attack on Implementations of the AES Key Expansion. In: Lee, P.J., Lim, C.H. (eds) Information Security and Cryptology — ICISC 2002. ICISC 2002. Lecture Notes in Computer Science, vol 2587. Springer, Berlin, Heidelberg.
- Bernstein, D.J. (2005). The Salsa20 family of stream ciphers. eSTREAM, ECRYPT Stream Cipher Project, Report 2005/025. http://www.ecrypt.eu.org/stream
- Fischer, S., Meier, W., Berbain, C., Biasse, JF., & Robshaw, M.J.B. (2006). Non-randomness in eSTREAM Candidates Salsa20 and TSC-4. In: Barua, R., Lange, T. (eds) Progress in Cryptology - INDOCRYPT 2006. INDOCRYPT 2006. Lecture Notes in Computer Science, vol 4329. Springer, Berlin, Heidelberg.
- Tsunoo, Y., Saito, T., Kubo, H., Suzaki, T., & Nakashima, H. (2007). Differential cryptanalysis of Salsa20/8. In: SASC2007. [S.l.: s.n.], pp. 10-22.
- Fu, L., Shen, X., Zhu, L., & Wang, J. (2014). A low-cost UHF RFID tag chip with AES cryptography engine. Security Comm. Networks, 7: 365-375. [CrossRef]
- Peris-Lopez, P., Hernandez-Castro, J.C., Estevez-Tapiador, J.M., Ribagorda, A. (2006). M2AP: A Minimalist Mutual-Authentication Protocol for Low-Cost RFID Tags. In: Ma, J., Jin, H., Yang, L.T., Tsai, J.JP. (eds) Ubiquitous Intelligence and Computing. UIC 2006. Lecture Notes in Computer Science, vol 4159. Springer, Berlin, Heidelberg.
- Satoh, A., Morioka, S., Takano, K., & Munetoh, S. (2001). A Compact Rijndael Hardware Architecture with S-Box Optimization. In: Boyd, C. (eds) Advances in Cryptology — ASIACRYPT 2001. ASIACRYPT 2001. Lecture Notes in Computer Science, vol 2248. Springer, Berlin, Heidelberg.
- Mangard, S., Aigner, M., & Dominikus, S. (2003). A highly regular and scalable AES hardware architecture. In IEEE Transactions on Computers, vol. 52, no. 4, pp. 483-491. [CrossRef]
- Feldhofer, M., Wolkerstorfer, J., & Rijmen, V. (2005). AES implementation on a grain of sand. IEE Proceedings Information Security, Vol. 152, pp. 13-20. [CrossRef]
- Henzen, L., Carbognani, F., Felber, N., & Fichtner, W. (2008). VLSI hardware evaluation of the stream ciphers Salsa20 and ChaCha, and the compression function Rumba. In: 2nd International Conference on Signals, Circuits and Systems, Nabeul, Tunisia, pp. 1-5.
- Nikitha, G., Kathrine, G., Duthie, C., Ebenezer, V., & Silas, S. (2023). Hybrid Cryptographic Algorithm to Secure Internet of Things. In: 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, pp. 1556-1562.
- Bokhari, M., & Afzal, S. (2023). Performance of Software and Hardware Oriented Lightweight Stream Cipher in Constraint Environment: A Review. In: 10th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 2023, pp. 1667-1672.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).