Secure usage of the Cryptographic Library

A quality version of this page, approved on 20 May 2021, was based off this revision.

This page gives some hints, tips and methods for the use of the Cryptographic Library in a more secure manner.

1 Cache-timing attack resistance

Timing attacks

Timing attacks specifically leverage differences in computational timings, in order to retrieve information on the private data used during the execution a cryptographic algorithm. Vulnerability to this class of attack is mainly due to:

Conditional branches in the code, involving secret data
Low-level machine instruction usage, the computational time of which depends on input data
Compiler optimizations

Regular code execution

The solutions to avoid timing attacks are as follows:

Develop regular code, running for the same time duration, independently from the input secret data.
Break the correlation between secret input data and computational timing by randomizing the secret information, such that an attacker cannot recover the secret data from the retrieved randomized data.

Cache-timing attack (CTA)

Data leakage might occur even in a regular implementation, because of the presence of caches between the CPU and memory.

Caches are fast memories that are used to store frequently-used data; data loaded from the memory is stored in the cache before being passed to the CPU, and kept in the cache so that it can be quickly reloaded the next time it is requested.

Therefore, if data is in cache, the load is faster, whereas if the data is not in cache, the loading time is longer because information must be retrieved from main memory.

CTA exploits this property to steal secret data from regular algorithm implementations, where secret data is used as an index to access a memory location (for example a table).

Countermeasures against CTA

To avoid CTA, a real constant-time implementation must be developed.

In addition to regular code, correct management of the cache flow must always be considered.

Full parsing of tables

A possible countermeasure against these attacks is to avoid directly accessing the table using the secret data, but instead perform a loop-and-load on every single memory location in the table, and conditionally keep only the desired data.

The conditional test to check whether the current iteration is the desired one must be performed with regular code in order to avoid branches on secret data.

In this way, the attacker cannot determine the secret index, because every possible index is used and every memory location is loaded, leading to a constant computation time.

The side effect of this countermeasure is a slight increase in computation time, due to the need of load every element of the table, instead of only the required value.

Use non-cacheable memories

Another solution to address caches vulnerability is to avoid the use of caches; often only Flash memory is affected by cache. Sometimes however, faster memory than FLASH/SRAM is present in microcontroller, and this is not served by cache.

If these memories are big enough to keep the implementation tables, there is a double advantage in using:

no vulnerability to CTA;
faster memory loading, resulting in faster computational timing.

How to enable CTA resistance in the cryptographic library

The cryptographic library addresses CTA with constant-time code and uses non-cacheable memories.

The library already has linker labels associated to CTA-sensitive internal tables (usually residing in Flash memory); the user only needs to map objects labeled with the label “CMOX_CTA_PROTECTED_DATA” onto the desired non-cacheable memory.

The example below shows how to create a linker script:

IAR Embedded Workbench: (Linker configuration file, .icf)

initialize by copy { readwrite, section CMOX_CTA_PROTECTED_DATA};
...
define region SAFEMEM_region  = mem:[from 0x20030000   to 0x20034000];
...
place in SAFEMEM_region { section CMOX_CTA_PROTECTED_DATA };

Arm® KEIL µVision® IDE (Scatter file, .sct):
LR_IROM1 0x08000000 0x00200000 {   ; load region size_region
  ...
  SAFEMEM 0x20030000 0x00004000 {
    *(CMOX_CTA_PROTECTED_DATA)
  } 
  ...
}
Information

Note: Arm® is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

STM32CubeIDE (Linker script, .ld)

SECTIONS
{
  ...
  user_SAFEMEM 0x20030000:
  {
    . = ALIGN(8);
    *(CMOX_CTA_PROTECTED_DATA)
    . = ALIGN(8);
  }
  ...
}

In the linker file, add the highlighted lines, checking in your microcontroller manual for the correct non-cacheable memory start address and size. Note that the label SAFEMEM is a simple string decided by the user, and is not a specific value selected among a set of predefined values.

Depending on the microcontroller used, two scenarios could arise:

Only Flash memory is affected by cache (for example ART caches): in this case no further operations are required to secure the implementation against CTA;
SRAM is also affected by cache (for example L1 cache on STM32F7/H7). In this case, for ECC/RSA modules only where a temporary buffer is requested for internal computations, the user must also declare the temporary buffer as follows:

  CMOX_CTA_RESISTANT uint8_t membuf[3000];

instead of the standard way:

  uint8_t membuf[3000];

In order to use this macro, it is necessary to include the cmox_cta.h file, located in the root of the include folder.

2 Robust comparison (against simple faults)

Fault attacks

Faults events can occur on electronic devices. These have several possible causes, such as too over/under temperature, unsupported supply voltage or current, strong electric or magnetic fields, and so on.

Fault attacks cause intentional faults on the device in order to influence the operation of the processor, with the purpose of retrieving secret information involved in the computation.

Simple faults

Simple fault attacks are those where the attack scenario comprises a single fault, with limited possibilities:

jump one or more consecutive instructions
set a register (or a memory location) value to 0 or 0xFF…FFF;
flip every (or some) bits in a register (or a memory location).

What is robust comparison

Robust comparison is a strong way to check whether or not two buffers are equal, returning the appropriate comparison result (success or failure).

For example, even though a signature verification might be securely implemented, an attacker could simply fault the final buffer comparison (or the comparison final result) in order to change the verification response to success.

How to enable robust comparison in the Cryptographic Library

To circumvent the above, a robust comparison is implemented, and every verification function in the library has a specific parameter (called P_pFaultCheck) to manage this.

The parameter is an optional pointer which can be NULL, if robustness against simple faults is not needed in the current scenario.

If instead, the parameter is provided to the verification function, the routine robustly checks against simple faults, returning not only the standard return value, but also filling the parameter variable with a second return value. This second return value has to be checked against the main return value:

If the main return value is FAIL, this means that the function correctly failed the verification, therefore there is no need to check the second return value.
If the main return value is SUCCESS, this it means that the function correctly verified the inputs, or it did not detect a fault in the verification process. For this reason, it is necessary to check the second return value (the one inside the P_pFaultCheck parameter), by comparing it against the first return value:
1. If it also has SUCCESS value, this means that the verification was really successful.
2. otherwise, if it is different from the SUCCESS value, a fault was detected and the verification failed. Note that in the case of a fault, the second return value could be a dirty value, therefore the user must consider every value different from SUCCESS as a failure.

This robust comparison alone, however, is not sufficient to protect against simple faults.

Take, for example, the code below:

…
retval = cmox_rsa_pkcs1v15_verify(&rsa_ctx,
                                  pub_key,
                                  digest,
                                  CMOX_RSA_PKCS1V15_HASH_SHA256,
                                  signature,
                                  sizeof(signature),
                                  &fault_check);
if ((retval == CMOX_RSA_SUCCESS) && (fault_check == CMOX_RSA_SUCCESS))
{
  return OK;
}
else
{
  return FAIL;
}

Here, an attacker could simply fault the conditional branch after the verification, in order to not take the failure branch, but instead take the success branch.

For this reason it’s important to securely check the two results of the robust verification process, as for example in the code below:

…
retval = cmox_rsa_pkcs1v15_verify(&rsa_ctx,
                                  pub_key,
                                  digest,
                                  CMOX_RSA_PKCS1V15_HASH_SHA256,
                                  signature,
                                  sizeof(signature),
                                  &fault_check);
if (retval == CMOX_RSA_SUCCESS)
{
  /* TODO: perform operations necessary to the final result of the developing functions */
  …
  if (fault_check == CMOX_RSA_SUCCESS)
  {
    return OK;
  }
  else
  {
     /* TODO: rollback previous operations computed after the first successful branch */
    …
    return FAIL;
  }
}
else
{
  return FAIL;
}

In this way an attacker could try to fault the program in order to directly jump to the OK return, but this avoids the execution of the necessary operations (marked by the comment), and therefore the computation results in some error or segmentation fault, thwarting the successful attack.

With this new code, the attacker is forced to fault the computation twice, in the exact positions corresponding to the two branches, but this is beyond the scope of simple faults protections.

3 RSA countermeasures for Bellcore attack

Bellcore attack

The Bellcore attack is a fault attack that aims to recover the RSA private exponent by faulting one or more RSA computations (more specifically, the internal modular exponentiation), depending on the algorithm used.

To recover the private exponent of an RSA using CRT (Chinese remainder theorem) the attacker needs to fault just one operation, while for a standard RSA implementation, the attacker needs more faulted operations, all performed with certain criteria.

Therefore, in the cryptographic library it has been decided to protect the CRT implementation only.

Private operations in RSA are used both for decryption and signature creation. Since the Bellcore attack needs the output of the faulted operation, and a faulted decryption results in wrong padding causing the output to be filled with zeros, the RSA decryption processes do not need to be protected as the attacker is unable to get the faulted output.

Moreover, because of the algorithm itself, RSA PKCS#1 v2.2 is not vulnerable against the Bellcore attack, therefore only the RSA PKCS#1 v1.5 signature generation routine needs to be protected.

Countermeasure

For signature operations the countermeasure is simple; only verification of the signature validity is required before outputting it.

Signature verification (through the public key) is a very cheap operation with respect to the signature generation process, which therefore results in only a slight increase in the computation time.

For decryption, there is no high-level countermeasure, as the re-encryption of the decrypted message is non-deterministic (new random is used) and the values differ. However, as previously mentioned, since a faulted decryption would probably not have valid padding, the faulted decrypted message is not available to the attacker, thus negating the attack.

How to enable Bellcore attack countermeasures in the cryptographic library

To enable the countermeasure, instead of calling the standard way to set the CRT key:

cmox_rsa_retval_t cmox_rsa_setKeyCRT(cmox_rsa_key_t *P_pPrivKey,
                                     size_t         P_ModulusBitLen,
                                     const uint8_t  *P_pExpP,
                                     size_t         P_ExpPLen,
                                     const uint8_t  *P_pExpQ,
                                     size_t         P_ExpQLen,
                                     const uint8_t  *P_pP,
                                     size_t         P_PLen,
                                     const uint8_t  *P_pQ,
                                     size_t         P_QLen,
                                     const uint8_t  *P_pIq,
                                     size_t         P_IqLen);

the user needs to call the following function:

cmox_rsa_retval_t cmox_rsa_setKeyCRTwithFACM(cmox_rsa_key_t *P_pPrivKey,
                                             size_t         P_ModulusBitLen,
                                             const uint8_t  *P_pExpP,
                                             size_t         P_ExpPLen,
                                             const uint8_t  *P_pExpQ,
                                             size_t         P_ExpQLen,
                                             const uint8_t  *P_pP,
                                             size_t         P_PLen,
                                             const uint8_t  *P_pQ,
                                             size_t         P_QLen,
                                             const uint8_t  *P_pIq,
                                             size_t         P_IqLen,
                                             const uint8_t  *P_pPubKey,
                                             size_t         P_PubKeyLen);

(FACM stands for Fault attack countermeasures.) This requests the user to also provide the public key, in addition to the CRT private fields.

This call enables the countermeasures during the subsequent RSA computations, whose function calls remain the same.

4 Secure usage of random material

Why random material is so important

The strength of the random number generator used by a security system is rarely communicated, power consumption and bit generation speed being considered more important than the randomness of the generated bits.

This information is important however, considering that in most cryptographic systems, the quality of the random numbers used directly determines the security strength of the system.

The Kerckhoff principle states that the security of the system must depend solely on the key material, and not on the design of the system. This means that all modern security algorithms and protocols have their cryptographic strength expressed in the number of (key) bits that an attacker needs to guess before they can break the system (the actual definition of cryptographic strength is a measure of the complexity of a computational attack against the algorithm, often expressed as the log₂ of the number of times attacker must invoke the algorithm before learning the value of the key used).

However, even if the key material for a cryptographic algorithm is not directly exposed, its generation from a bad source of randomness can compromise the overall security of the system. If there exists some sort of correlation among the “random” generated bits, an attacker can use statistical methods for deriving part or the whole key.

Some organizations, such as NIST, have defined statistical tests to be run on the output of a random number generator. Those tests should provide answers to some important problems, like determining how much unpredictability the random source generates, or how this unpredictability can be translated into random bits. The challenge is, however, that statistical tests cannot really prove that a source is random; at best, these tests can determine that a source is not, or is only partially, unpredictable.

It is in fact highly possible to create a module that creates a bit string that passes all the available statistical tests, but still generates a completely predictable bit stream, once the internal state of the module is known.

For this reason, standardization and certification organizations tend to require that a random number generator design is not tested by passing its output through a statistical test; rather, these organizations require that the design of the generator is reviewed, and its randomness generation properties explained and proven.

TRNG vs DRBG

A true random number generator (TRNG) is a device that generates random numbers from a physical process, and not by means of an algorithm. These devices are based on microscopic phenomena generating low-level, statistically random signals, such as thermal noise, photoelectric effect and other quantum phenomena. These stochastic processes are, in theory, completely unpredictable, and the theory's assertions of unpredictability are subject to experimental test.

A deterministic random bit generator (DRBG), or pseudorandom number generator (PRNG), is an algorithm that generates a sequence of numbers whose properties approximate those of sequences of true random numbers. The DRBG-generated sequence is not truly random, because it is completely determined by an initial value, called the DRBG's seed or entropy seed.

Although TRNGs are more secure than DRBGs, the latter are very important for several reasons:

As long as a DRBG’s internal state is unknown to an attacker, the output it generates is unpredictable by the attacker.
A well-constructed DRBG only leaks a limited amount of data about its internal state through the output it produces. Therefore, as long as the DRBG’s internal state is refreshed by new unpredictable data (that is, the DRBG is reseeded) often enough, its output remains unpredictable.
Once a DRBG is properly seeded, it is extremely fast in terms of unpredictable output generation, compared to a TRNG.

How to get entropy from the STM32

TRNG peripheral

Most STM32 microcontrollers include a TRNG peripheral, and the HAL/LL drivers provide an API to generate true random numbers, 32 bits at a time. Please refer to the driver documentation.

Entropy without a TRNG

When dealing with security in microcontrollers, it is always preferable to choose a device that includes a TRNG peripheral. However, not all the available STM32 MCUs provide such peripherals and, in this case, it is possible to exploit some sources of randomness from other components of the microcontroller.

As an example, all STM32 MCUs provide two internal oscillators that can be used as clock sources: the HSI (high- speed internal) and the LSI (low-speed internal) oscillators. These two clock sources are not as accurate as crystal oscillators, in particular at startup time. Furthermore, the two sources are independent of each other and have different frequencies (for example in the STM32F401 the HSI frequency is ~16 MHz, while the LSI frequency is ~32 kHz, varying from 17 to 47). This can be exploited for generating some sort of randomness that can be used to seed a DRBG.

In the example shown above, a timer peripheral is configured with the HSI as clock source, and it is used to measure the LSI periods. In this way it is possible to check how many times an HSI period is included into an LSI one, and since both provide some uncertainty, the least significant bits of each measure are different to each other. For best results follow these recommendartions:

Restart the LSI oscillator at each measurement. This is because after the startup phase the oscillator becomes stable at a certain frequency. Restarting it changes the new stable frequency.
Increase the timer clock source using the PLL together with the HSI oscillator. In this way the randomness measure from the LSI is more precise.
It is possible to increase the timer prescaler in order to generate better random bits, at the cost of increasing the required time. In the above picture, the prescaler is set to 8, meaning that any measure takes 8 LSI periods.
The best result is reached if, from any measure, only the least significant bit (LSB) is extracted. This means that in order to generate N random bits, N measurements are needed.

Information

Note: The proposed solution is not theoretically sound and its usage is intended only in cases where the TRNG is not available in the MCU. Furthermore, the usage of this solution is discouraged for critical security or cryptographic purposes.

How to securely initialize the DRBG

Deterministic random bit generators (DRBGs) are based on cryptographic operations; for example, the ctrDRBG algorithm is based on the AES block cipher.

Before the DRBG is used for generate random material, it must be initialized with a seed. This operation determines the initial state of the DRBG.

It is also possible to periodically reseed the DRBG in order to make the random material generation stronger over time.

The initial seed is comprises three fields:

An entropy input that must be generated by a reliable source of randomness (for example TRNG), and must be kept secret, otherwise the security of the DRBG is compromised. The security of the DRBG is highly dependent on the security strength of the randomness source used for the entropy generation.
A nonce, the value of which is not expected to repeat during new instantiations. It does not need to be secret.
A personalization string that is optional but recommended, which is used to integrate some additional information into the instantiation of the DRBG. It does not need to be secret.

Some DRBG implementations provide a derivation function that is used internally to derive a more secure seed from the given seeding materials. This means that with these implementations, it is possible to provide a non full-entropy input that is larger than the requested security strength, without compromising the overall security of the DRBG.

The seed used for reseeding is based on:

the current internal state value of the DRBG
a new entropy input
an optional additional input

The cryptographic library provides an implementation of the ctrDRBG algorithm with a derivation function, which is specified in the NIST Special Publication 800-90A, together with other DRBG algorithms.

How to manage random material for ECC/RSA

The cryptographic library ECC/RSA APIs place on the user the responsibility for generating random bytes and providing them to the function call.

The random data may be generated not only with the library DRBG, but with a TRNG or other source preferred by the user.

Therefore, the cryptographic library functions ask the user for random bytes, and return a specific error if the provided data does not fulfil the requirements.

An effective way to manage random usage in the cryptographic library might be to implement a loop in which random data is generated and the cryptographic function invoked, continuing iterating until no more random-related errors are returned by the cryptographic function.

Example with RSA:

cmox_rsa_retval_t rv; /* CLv4 return value */
uint8_t random_data[256]; /* array that will be filled with random bytes */
/* generate random until it's valid for RSA encryption function */
do
{
  /* ToDo: fill 'random_data' with bytes generated by a specific source (DRBG, TRNG, etc.) */
  ...
  
  /* call the RSA encryption function */
  rv = cmox_rsa_pkcs1v15_encrypt(..., random_data, sizeof(random_data), ...);
}
while (rv == CMOX_RSA_ERR_WRONG_RANDOM);

Necessary random material for ECC API

Some ECC functionalities need random material in order to process.

In particular, the following services need a random-bytes vector as input, and this byte array needs to be at least long as the chosen curve N parameter byte length (for example, 32 bytes for SECP256R1, 66 bytes for SECP521R1):

ECDSA key generation
ECDSA signature generation
SM2 key generation
SM2 signature generation

Even if the random array has the correct length, a failure could occur with return value CMOX_ECC_ERR_WRONG_RANDOM. A security check is done inside the ECC functions, assuring not only that the byte length is correct, but also that the value is compatible with the chosen curve N parameter.

For NIST curves, that have the N parameter starting with 0xFFFFFFFF…, this scenario is highly improbable, while for other curves (for example brainpoolP384r1) the probability can almost reach 50%.

The solution to this is to generate a new random byte array and recall the cryptographic API.

Necessary random material for RSA API

Some RSA functionalities need random material in order to process.

In particular, the following services need a random byte vector as input:

RSA PKCS#1 v1.5 Encryption: The array must contain at least (ModulusLength - InputLength - 3) not NULL bytes (that is, 0x00).; Only that number of bytes is used if no NULL bytes (that is, 0x00) are in the sequence, otherwise more bytes are necessary to replace the NULL bytes.
RSA PKCS#1 v2.2 Signature Generation: The array length must be ≤ ModulusLength - HashLength - 2, where HashLength is the byte length of the digest generated by the chosen hash function.; All the given bytes are processed and used inside an internal hashing process.
RSA PKCS#1 v2.2 Encryption: The array length must be ≥ HashLength, where HashLength is the byte length of the digest generated by the chosen hash function.; Only HashLength bytes are used, while the others are discarded.

5 References

[TIMING-KOC96] P. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, CRYPTO 1996

[FAULT-BDL97] D. Boneh, R. DeMillo, R. Lipton, “On the importance of checking cryptographic protocols for faults”, CRYPTO 1997

[BELLCORE-BDL01] D. Boneh, R. DeMillo, R. Lipton, “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology 14(2), 2001

[RND-RFC4086] Randomness Requirements for Security

[RND-NIST.SP.800-90B] Recommendation for the Entropy Sources Used for Random Bit Generation

User

Wiki

Categories

General Books

Secure usage of the Cryptographic Library

Contents

1 Cache-timing attack resistance

2 Robust comparison (against simple faults)

3 RSA countermeasures for Bellcore attack

4 Secure usage of random material

5 References

Manage page

Toolbox

QR code

Mode

Categories