Public key cryptosystems, although highly secure, share a common drawback:
they require heavy computational effort. This is due to their reliance on
modular multiplication of large operands (1024 bits or more). The same
problem arises in data encryption/decryption and digital signature schemes.
Examples of such cryptosystems are RSA, DSA, and ECC. On embedded platforms
for smart card and smart token applications, the cipher system becomes very
slow because of the limited computational power of the embedded processors.
This paper introduces an
enhanced architecture for computing the modular multiplication of large operands
in ECC and the modular exponentiation in RSA. The proposed design can act as a
co-processor for embedded general purpose CPUs. It is tested with ATMEL and
NIOS microcontrollers, with and without the proposed accelerator. Another test
is performed using dual core architectures to speed up the system.
One common drawback of all public key cryptographic algorithms is the
heavy computation involved in key generation, digital signature, and data
encryption/decryption schemes. This complexity stems from the use of modular
exponentiation in most of the above schemes, where operands are at least
1024 bits long (except for Elliptic Curve Cryptography (ECC), which uses
modular multiplication of numbers up to 521 bits long). Modular
multiplication is the heart of modular exponentiation. Accelerating modular
multiplication will raise the efficiency of the whole public key cryptosystem.
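
To illustrate why modular multiplication dominates the cost of modular exponentiation, the following is a minimal sketch of the textbook left-to-right square-and-multiply method. It is not taken from the proposed design; the function name `mod_exp` and the multiplication counter are assumptions introduced here for illustration only.

```python
def mod_exp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation.

    Returns (result, mul_count), where mul_count is the number of modular
    multiplications performed. Illustrative sketch, not the paper's design.
    """
    if modulus == 1:
        return 0, 0
    result = 1
    base %= modulus
    mul_count = 0
    # Scan the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        # One modular squaring per exponent bit.
        result = (result * result) % modulus
        mul_count += 1
        if bit == "1":
            # One extra modular multiplication per set bit.
            result = (result * base) % modulus
            mul_count += 1
    return result, mul_count
```

For a k-bit exponent, this sketch performs between k and 2k modular multiplications, so a 1024-bit exponent costs between 1024 and 2048 of them, each on operands as wide as the modulus.
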
General purpose processors consume thousands of cycles to finish a single
operation using classical methods. Many algorithms have been developed to
efficiently perform the computation (X × Y mod M) without doing the ordinary
pencil-and-paper steps. Examples are Montgomery modular multiplication and
interleaved modular multiplication. In embedded environments, such as FPGAs
or ASICs, higher efficiency can be achieved by running one of the above
algorithms on a special purpose hardware architecture. In this
work, an enhanced architecture for an interleaved modular multiplier is
introduced. An implementation for Montgomery modular multiplication is
introduced as well. Another important component of any cryptographic
accelerator is a random number generator, which is used to generate private
keys for PKI algorithms. In the Softlock cryptographic SDK, AES is the main
operation in the ANSI X9.31-based random number generator, so we also
implement an AES accelerator in the proposed core.
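
The two algorithms named above can be sketched in software as follows. This is a plain reference model under stated assumptions, not the hardware architecture proposed in this paper: the function names are hypothetical, the interleaved version assumes 0 <= x, y < m, and the Montgomery version assumes an odd modulus and Python 3.8 or later (for `pow(m, -1, r)`).

```python
def interleaved_mod_mul(x, y, m):
    """Compute (x * y) mod m with shift, add, and conditional subtract only.

    Scans x from its most significant bit, interleaving the reduction with
    the accumulation. Assumes 0 <= x, y < m.
    """
    if not (0 <= x < m and 0 <= y < m):
        raise ValueError("operands must satisfy 0 <= x, y < m")
    p = 0
    for i in range(x.bit_length() - 1, -1, -1):
        p <<= 1                 # p < 2m after doubling
        if (x >> i) & 1:
            p += y              # p < 3m after adding y
        # At most two subtractions bring p back below m.
        if p >= m:
            p -= m
        if p >= m:
            p -= m
    return p


def montgomery_mod_mul(x, y, m):
    """Compute (x * y) mod m using Montgomery reduction with R = 2**k.

    Assumes m is odd and x, y are non-negative. The conversions into
    Montgomery form use an ordinary reduction, for clarity only.
    """
    if m <= 0 or m % 2 == 0:
        raise ValueError("modulus must be a positive odd integer")
    k = m.bit_length()
    r = 1 << k                          # R > m, and gcd(R, m) = 1
    m_neg_inv = (-pow(m, -1, r)) % r    # -m^(-1) mod R

    def redc(t):
        # Returns t * R^(-1) mod m for 0 <= t < m * R, without dividing by m.
        q = ((t & (r - 1)) * m_neg_inv) & (r - 1)
        u = (t + q * m) >> k
        return u - m if u >= m else u

    x_bar = (x << k) % m                # x * R mod m
    y_bar = (y << k) % m                # y * R mod m
    # First redc yields x * y * R mod m, second removes the remaining R.
    return redc(redc(x_bar * y_bar))
```

Both functions should agree with `(x * y) % m`, for example with the 521-bit modulus `2**521 - 1`. The interleaved loop needs only a shifter, an adder, and a comparator per bit, while the Montgomery reduction replaces division by m with masking and shifting by a power of two.
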
The main parts in this paper can be classified as follows:
- Core accelerator interface.
- Performance results of the accelerator.
- Enhancement achieved in dual core processor architecture with our