Physically Unclonable Functions with Confidential Computing for Enhanced Encryption of EHRs

Continual exploitation of Electronic Health Records (EHRs) has led to increasing amounts of ransomware and identity theft in recent years. Existing cryptosystems protecting these EHRs are weak because their inherently transparent software allows adversaries to extract encryption keys with relative ease. I designed a novel cryptosystem that employs Physically Unclonable Functions (PUFs) to securely encrypt user EHRs in a protected SGX enclave. The CPU-attached PUF provides a secret, device-unique value, or ‘digital fingerprint’, which is used to derive a symmetric key for subsequent AES-NI hardware encryption. Since the cryptographic operations, from key derivation to encryption, take place in a confidential SGX enclave, the keys are always protected from OS-privileged attacks, a capability lacking in most existing systems. I used my system APIs to evaluate the performance of various hash and encryption schemes across multiple EHR block sizes. SHA-512 and AES-NI-256-GCM were selected for cryptosystem implementation because they demonstrated high performance without compromising on security.


Introduction
Accompanying the explosion in popularity of electronic biometric tracking devices, from FitBits to Apple Watches, consumer Electronic Health Records (EHRs) are being harvested at unprecedented rates [1]. Upon being collected, these EHRs leave the user's possession and often fall into the wrong hands or are used for unauthorized purposes. In fact, this private data has quickly become the very essence of black market exchanges, being traded and exploited all without the user's consent or even their knowledge [2]. As the prevalence of smart technologies increases, so does the severity and frequency of these fraudulent exchanges. In the first half of 2019 alone, 4.1 billion records were estimated to have been breached, indicating the presence of explicit gaps in existing cryptosystems tasked with protecting EHRs [3].
Many modern cryptosystems rely heavily on software for key derivation, encryption, and key protection, which is itself sub-optimal. Adversaries can extract information about cryptographic operations from this inherently transparent software with relative ease, for example through software side-channel attacks [4]. If the encryption key is ever exposed to an adversary, they gain full access to the EHRs, which makes it all the more essential that the key remain securely protected, preferably by robust hardware. The focus of my research is to develop and test a novel hardware cryptosystem that effectively protects the keys, giving users complete control over their medical data records.

Encryption
Encryption is the process of encoding information such that only authorized parties possessing the encryption key can decrypt and read that data. There are two primary types of encryption: symmetric and asymmetric. In symmetric encryption, a single secret key is used to both encrypt and decrypt data [5]. The most prevalent symmetric encryption algorithm is the Advanced Encryption Standard (AES), which has modes such as Cipher-Block Chaining (CBC) and Galois/Counter Mode (GCM) along with various key sizes. The other type, asymmetric encryption, involves a public and private key pair [6]. The public key can only be used to encrypt data and can therefore be released publicly. The private key stays only with authorized parties and is used to decrypt data that was encrypted with the public key. RSA is a widely used asymmetric encryption algorithm whose key pair is generated from two large prime numbers and whose security rests on the presumed intractability of factoring their product [7].
Regardless of which algorithm or type of encryption is being implemented, what remains essential is always the protection of the encryption key. This covers everything from its generation by a key derivation function (KDF) to its use in encryption to key disposal or storage. With small key sizes, adversaries can easily run exhaustive key searches: the brute-force process of testing candidate keys until the correct one is found.
Since the number of possible keys grows exponentially with key length, one of the best defenses against these attacks is simply to increase the key size [8]. For example, a 256-bit key has 2^256 possible values (roughly 10^77), a figure often compared to estimates of the total number of atoms in our universe [9]. Hence, an adversary would have to work through an astronomically large number of combinations, a process that would take many centuries, before finding the right key. Other factors that contribute to the key strength include the salt(s) that went into making the key and their availability to external parties.
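To make the exponential growth concrete, the following minimal sketch counts the keys for a few key lengths and estimates how long an exhaustive search would take. The class name `KeySpace`, the chosen key lengths, and the attacker rate of 10^12 guesses per second are illustrative assumptions, not figures from this study.

```java
import java.math.BigInteger;

final class KeySpace {
    /** Number of distinct keys for a key of the given bit length: 2^bits. */
    static BigInteger keyCount(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    /** Whole years needed to try every key at an assumed guesses-per-second rate. */
    static BigInteger yearsToExhaust(int bits, long guessesPerSecond) {
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        return keyCount(bits)
                .divide(BigInteger.valueOf(guessesPerSecond))
                .divide(secondsPerYear);
    }

    public static void main(String[] args) {
        long rate = 1_000_000_000_000L; // hypothetical attacker: 10^12 guesses per second
        for (int bits : new int[] {56, 128, 256}) {
            System.out.println(bits + "-bit key: " + keyCount(bits)
                    + " keys, about " + yearsToExhaust(bits, rate) + " years to exhaust");
        }
    }
}
```

Each additional key bit doubles the work, so the gap between a short key and a 256-bit key is far larger than the difference in their lengths suggests.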

Hash
Hash functions are one-way functions that take an input of arbitrary length and always output a seemingly random sequence of bits of predetermined length. For example, the digest of the SHA-512 algorithm always consists of 512 bits, irrespective of the input size. Unlike encryption, hashing is not meant to be reversed: the pre-image resistance property makes it infeasible to work backward from the output digest to the original input [10]. These properties are why many KDFs rely on internal hash-based mechanisms to generate their secure encryption keys [11].
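The fixed output length can be checked directly with the standard `java.security.MessageDigest` API. This is only a small sketch; the class name `DigestLength` and the sample inputs are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestLength {
    /** Returns the SHA-512 digest of the input. */
    static byte[] sha512(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-512").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        byte[] shortInput = "hr=72".getBytes(StandardCharsets.UTF_8); // hypothetical tiny record
        byte[] longInput = new byte[8192];                            // hypothetical large record
        // Both digests are 64 bytes (512 bits), regardless of input size.
        System.out.println(sha512(shortInput).length * 8 + " bits");
        System.out.println(sha512(longInput).length * 8 + " bits");
    }
}
```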
Every secure hash function must also be collision-resistant. As modeled above, a collision occurs when two distinct inputs produce the same hash digest. Since the message space is vastly larger than the digest space, collisions must exist by the pigeonhole principle [12]. A good hash function makes these collisions computationally infeasible to find, so that an adversary cannot produce or predict a collision at will.
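The pigeonhole argument can be demonstrated by deliberately shrinking the digest. The sketch below keeps only the first 16 bits of a SHA-512 digest, so that 65,537 distinct inputs must contain at least one collision. The truncation, the class name `TruncatedCollision`, and the `record-N` input format are assumptions for illustration only. With the full 512-bit digest the same search would be computationally infeasible.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class TruncatedCollision {
    /** First 16 bits of SHA-512(input), as an int in [0, 65535]. */
    static int tag16(String input) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-512")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 8) | (d[1] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JVM", e);
        }
    }

    /** Returns two distinct inputs that share the same 16-bit tag. */
    static String[] findCollision() {
        Map<Integer, String> seen = new HashMap<>();
        // 65,537 inputs but only 65,536 possible tags: a collision is guaranteed.
        for (int i = 0; i <= 65_536; i++) {
            String msg = "record-" + i;
            String previous = seen.put(tag16(msg), msg);
            if (previous != null) {
                return new String[] {previous, msg};
            }
        }
        throw new AssertionError("unreachable: the pigeonhole principle guarantees a collision");
    }

    public static void main(String[] args) {
        String[] pair = findCollision();
        System.out.println(pair[0] + " and " + pair[1] + " share tag " + tag16(pair[0]));
    }
}
```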

PUFs
Due to inherently unpredictable silicon fluctuations during the manufacturing process, each Physically Unclonable Function is a random, unique, and immutable 'digital fingerprint.' The PUF can employ its underlying physical characteristics to generate a secret 256-bit value [13].
A PUF could be created, for example, if someone sprinkled reflective flakes on a melting gold brick and let the gold solidify. Shining a flashlight on this resulting contraption would result in a light pattern that has the same properties as a PUF: it cannot be feasibly replicated as it would be nearly impossible to get the exact positioning and angle of each flake in the gold bar, it is random, and it is immutable. I propose that the PUF be embedded on the CPU, where its 256-bit secret contributes towards generating a secure, hardware-confined AES key for subsequent EHR encryption.

Software Guard Extensions (SGX) is Intel's instruction set for implementing confidential computing on Intel CPUs [14]. SGX ensures that upon calibration, the BIOS will set aside a portion of the device's memory for trusted operations that are only accessible to the CPU.
The CPU performs access control and encryption on the secure computing enclave to prevent higher privileged software like the OS and BIOS from accessing the contents of this memory. These security measures make SGX a prime location for key storage and secret provisioning that doesn't involve third parties (unlike key escrowing) [15]. However, it is important to note that while it keeps adversaries out in many ways, SGX itself does not protect against side-channel attacks if the code executed within the enclave is not software side-channel resistant. Hence, it is typically advised to only execute small portions of trusted code in each enclave lest everything become compromised from an intrinsic code vulnerability.

System Architecture
My hardware encryption system employs PUFs to securely encrypt user EHRs, giving consumers complete control over their data. Upon receiving the user's EHRs from medical tracking devices, the Health application needs to secure these records by encrypting them.
Rather than relying on traditional software encryption methods that are inadequate for providing high security, I propose that the EHRs be sent to the hardware using my set of APIs. Upon arriving, the records undergo secure PUF-based encryption, and the resulting ciphertext is sent back to the application for local storage. Alternatively, if the application wants to send the EHRs to the cloud, the PUF will be used to negotiate an RSA key with the cloud. After authentication (via CAs), the PUF can securely send the EHR ciphertext to the cloud in this privacy-preserving manner. Regardless of the type of encryption being used, the entire life cycle and use of the key occurs not only within the hardware but in a secure SGX enclave, ensuring the key is always protected.

Threat Model
The trusted computing base for EHRs consists of the Health app, OS, PUF, Software Guard Extensions (SGX), and CPU. For encryption keys, only the CPU, PUF, and SGX are trusted. This cryptosystem possesses valuable security properties lacking in many existing systems, including resilience against side-channels, local malware, and OS-privileged attacks.

Novel PUF Key Derivation
The first step in the key derivation process is to hash the PUF secret using the SHA-512 hash primitive. The resulting digest, along with the Health app's ID and salt, is passed to a KDF to generate a secure AES key. Intel AES-NI then uses the newly minted symmetric encryption key to securely encipher the consumer's EHRs. In addition to maintaining the confidentiality of the records, the cryptosystem can also perform integrity protection via digital key signing. Although the application's seed components are essential to ensuring the uniqueness of each key (using one key for all encryption tasks makes it easier to break the encryption), the key strength is largely provided by the PUF. This is because, as mentioned in my trusted computing base, the Health application cannot be trusted with the key, so we cannot rely on its inputs as a source of the key's security. Since KDFs require the same inputs to produce the same key (which may be needed for purposes like decryption), each PUF must maintain its physical properties, and hence its secret, throughout its lifetime [16].
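The derivation chain described above can be sketched in plain Java. The text does not name the KDF, so this sketch assumes an HKDF-style extract-and-expand step (RFC 5869, single output block) built on HMAC-SHA-512. The PUF is replaced by a random 256-bit value, and the class name, method names, and sample app ID are hypothetical. It runs in an ordinary JVM rather than an SGX enclave.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class PufKeyDerivation {
    /** Stand-in for the hardware PUF: a random 256-bit value (assumption). */
    static byte[] simulatedPufSecret() {
        byte[] secret = new byte[32];
        new SecureRandom().nextBytes(secret);
        return secret;
    }

    private static byte[] hmacSha512(byte[] key, byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(key, "HmacSHA512"));
        return mac.doFinal(data);
    }

    /**
     * SHA-512(PUF secret) -> HKDF-style extract with the salt -> expand with the
     * app ID -> first 32 bytes as an AES-256 key. The salt must be non-empty.
     */
    static byte[] deriveAesKey(byte[] pufSecret, String appId, byte[] salt) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512").digest(pufSecret);
            byte[] prk = hmacSha512(salt, digest);               // extract step
            byte[] id = appId.getBytes(StandardCharsets.UTF_8);
            byte[] info = Arrays.copyOf(id, id.length + 1);
            info[id.length] = 0x01;                              // HKDF block counter
            return Arrays.copyOf(hmacSha512(prk, info), 32);     // expand, keep 256 bits
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key derivation failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        byte[] key = deriveAesKey(simulatedPufSecret(), "health-app-1", salt);
        System.out.println("derived key length: " + key.length * 8 + " bits");
    }
}
```

Because the derivation is deterministic, the same PUF secret, app ID, and salt reproduce the same key for decryption, while a different app ID or salt yields an unrelated key.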
Because the PUF utilizes its hardware structure to generate the secret, an alteration to the PUF would result in a significant change to the digital fingerprint it provides. Existing protocols for PUF error correction are able to significantly minimize this, resulting in negligible risk posed toward the PUF-based key derivation [17]. As discussed in the system architecture, even if the PUF breaks or the phone is lost (hence losing the encryption keys forever), there is a cloud-based system in place to ensure the records are still accessible from the user's new device.
Listed below is an overview of the encryption/decryption steps for the EHRs.  In the event that an adversary manages to corrupt the OS or implant malware on the device, they will find that they cannot breach the key from the software since the key's entire life-cycle is restricted to the CPU-protected SGX enclave.
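As a rough illustration of an encrypt/decrypt round trip, the sketch below uses AES-256-GCM through the standard JCE `Cipher` API. The class name `EhrCipher`, the 12-byte random IV, and the `IV || ciphertext || tag` layout are assumptions rather than details taken from the prototype. The code runs in an ordinary JVM, so it does not reproduce the enclave confinement of the key, and whether AES-NI is used is left to the JVM.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EhrCipher {
    private static final int IV_BYTES = 12;   // assumed IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    /** Encrypts with AES-GCM and returns IV || ciphertext || tag. */
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);  // a fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plaintext);
            byte[] out = Arrays.copyOf(iv, IV_BYTES + sealed.length);
            System.arraycopy(sealed, 0, out, IV_BYTES, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Decrypts IV || ciphertext || tag; throws if the tag does not verify. */
    static byte[] decrypt(byte[] key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("authentication or decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32];             // placeholder; a derived key would go here
        new SecureRandom().nextBytes(key);
        byte[] sealed = encrypt(key, "heart_rate=72".getBytes());
        System.out.println(new String(decrypt(key, sealed)));
    }
}
```

Because GCM is an authenticated mode, any modification of the stored ciphertext causes decryption to fail instead of returning corrupted records.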

Methods
Because CPU-embedded PUFs are not commercially available, the PUF secret was simulated by generating a random 256-bit value. Using Oracle JRE and Eclipse IDE, I developed a prototype that followed the same structural format as the system architecture. I also created a set of Java APIs that enable software applications to interact with and use hardware PUFs for secure key derivation, encryption, and hash operations. These APIs are directly executed in a trusted enclave for enhanced security.
To determine which encryption algorithms (and modes, if applicable) and hash functions were most computationally optimal for implementation in my design, I tested various cryptographic algorithms on my prototype using the Java Cryptography Extension (JCE) and OpenSSL.
EHR data sizes ranging from 16 bytes to 8k bytes were supplied to each of these algorithms to test them under real-world conditions, and the throughput was calculated for each trial. In addition to their performance, the encryption and hash algorithms were also evaluated against a security rubric.
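A throughput measurement of this kind can be sketched as a simple timing loop. This is not the benchmark harness used in the study: the class name `HashThroughput`, the intermediate block sizes, the iteration count, and the single warm-up pass are all assumptions, and 8k bytes is taken to mean 8192 bytes here. Results will vary with the JVM and hardware.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashThroughput {
    /** Hashes a blockSize-byte buffer `iterations` times and returns MB/s (10^6 bytes). */
    static double megabytesPerSecond(String algorithm, int blockSize, int iterations) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] block = new byte[blockSize];
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                md.update(block);
                md.digest(); // completes the hash and resets the instance
            }
            long elapsedNanos = Math.max(1, System.nanoTime() - start);
            double bytes = (double) blockSize * iterations;
            return bytes / elapsedNanos * 1_000.0; // bytes per ns -> MB per s
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = {16, 64, 256, 1024, 8192}; // intermediate sizes are illustrative
        int iterations = 50_000;
        for (int size : sizes) {
            megabytesPerSecond("SHA-512", size, iterations); // warm-up pass for the JIT
            double rate = megabytesPerSecond("SHA-512", size, iterations);
            System.out.printf("SHA-512, %5d-byte blocks: %.1f MB/s%n", size, rate);
        }
    }
}
```

The same loop structure applies to encryption by swapping the digest call for a cipher operation.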

Data and Analysis
I conducted three different cryptography comparisons and performed two primary analyses (performance and security) on each comparison.

Software Encryption Algorithms
In this particular evaluation, I conducted performance and security tests on different software encryption algorithms. This largely included subtypes of the predominant AES scheme with varying modes and key sizes.
AES-GCM had much higher throughput than AES-CBC and ChaCha20-Poly1305 for all data input sizes.
The data show that at the 8k-byte EHR size, AES-GCM had a throughput about 15 times that of AES-CBC. In addition to its high level of performance, AES-GCM offers a reliable mode of authenticated encryption using GHASH. For algorithms such as AES that are semantically secure, it is important to analyze their security against exhaustive key searches: brute-force attacks that attempt billions of different keys at a rapid pace to find the correct key. Strong algorithms have a large keyspace, and the number of possible keys grows exponentially with key length. For example, 256-bit key lengths (such as AES-256) have so many possible key combinations that the count is often compared to the total number of atoms in the universe. Due to the additional security it provides and in accordance with the US NIST Post-Quantum Guidelines, a key length of 256 bits was preferred over that of 128 bits [18]. The need for a 256-bit key length, high performance, and authenticated encryption makes AES-256-GCM an attractive candidate for design implementation.

Software vs. Hardware Encryption
To determine the feasibility of hardware encryption, I evaluated the performance and security of hardware encryption for AES-GCM.

Hash Functions
There are no existing hardware hash instruction sets, which is why my hash function analysis had to be conducted at the software level. Primitives from the SHA-2 family were chosen along with other algorithms like GHASH, all of which were evaluated on their throughput and collision resistance for performance and security, respectively. One algorithm was eliminated in accordance with its current NIST classification, which states that it is considered deprecated for all practical implementations [19].
Even though it consistently had throughput 2-3 times the other hash algorithms, GHASH was also eliminated due to its relatively small digest size of 128 bits. This was expected since its most prevalent function is in AES-GCM for authentication.
Since the remaining hash algorithms differed only negligibly from one another in throughput across all input sizes, SHA-512 was selected for its strong collision resistance (its 512-bit output is larger than the other digests).
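The link between digest size and collision resistance follows from the birthday bound: a generic collision search on an n-bit digest needs roughly 2^(n/2) hash evaluations. The sketch below reads the digest length of several SHA-2 functions from the JCE and reports the corresponding bound. The class and method names are hypothetical, and the 128-bit case is computed directly because GHASH is not exposed as a `MessageDigest`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestStrength {
    /** Digest length in bits, as reported by the JCE provider. */
    static int digestBits(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available in this JVM", e);
        }
    }

    /** Birthday bound: generic collision search costs about 2^(n/2) evaluations. */
    static int collisionSecurityBits(int digestBits) {
        return digestBits / 2;
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"SHA-256", "SHA-384", "SHA-512"}) {
            int bits = digestBits(algorithm);
            System.out.println(algorithm + ": " + bits + "-bit digest, about 2^"
                    + collisionSecurityBits(bits) + " work to find a collision");
        }
        // A 128-bit digest, the size noted for GHASH, gives only about 2^64.
        System.out.println("128-bit digest: about 2^" + collisionSecurityBits(128));
    }
}
```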

Discussion
My data indicated that for software encryption, the most performant schemes were AES-256-GCM and AES-