Charlotte Times


Researchers build an encrypted routing layer for private AI inference

Apr 21, 2026  Twila Rosenbaum
Organizations in sectors such as healthcare and finance increasingly want to use large AI models without exposing sensitive data to the cloud servers that run them. One promising approach is a cryptographic technique known as secure multi-party computation (MPC), which splits data into encrypted shares distributed across multiple servers. No single server can reconstruct the raw data, yet together the servers can still compute the AI model's result.
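The core idea of MPC can be illustrated with additive secret sharing. The sketch below is a minimal plaintext illustration of the principle, not SecureRouter's actual protocol; the field modulus and share counts are illustrative choices.

```python
import secrets

# Additive secret sharing over a prime field (illustrative modulus).
P = 2**61 - 1

def share(x, n_servers=2):
    """Split x into n_servers random shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Only the combination of all shares reveals the secret."""
    return sum(shares) % P

s = share(1234)
assert reconstruct(s) == 1234

# Each server can add its shares of two secrets locally; the sums still
# reconstruct to the plaintext sum, enabling computation on hidden data.
a, b = share(10), share(32)
summed = [(x + y) % P for x, y in zip(a, b)]
assert reconstruct(summed) == 42
```

Any single share is a uniformly random field element and reveals nothing about the secret on its own, which is why no individual server learns the input.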

However, MPC carries a significant speed penalty. A mid-sized language model that normally responds in under a second can take more than 60 seconds when run under MPC, due to the substantial cryptographic overhead involved.

Limitations of Existing Solutions

Previous advancements in private inference have largely focused on modifying AI models to reduce the computational costs associated with encryption. While these modifications have been beneficial, they share a common limitation: every query, no matter its complexity, incurs the same processing cost through the model.

In traditional AI applications, a common strategy is to direct simpler queries to smaller, faster models, reserving larger, more resource-intensive models for complex queries. This routing practice is standard in plaintext systems, but it is difficult to replicate in encrypted settings: deciding where to send a query normally requires inspecting the query itself, which is precisely the data that encryption is meant to hide.

Introducing SecureRouter

Researchers from the University of Central Florida have developed a system called SecureRouter, which introduces input-adaptive routing for encrypted AI inference. This system maintains a diverse pool of models ranging from a compact model with approximately 4.4 million parameters to a larger one with around 340 million parameters. A lightweight routing component evaluates each incoming encrypted query and selects the most appropriate model to handle it, all while keeping the entire process encrypted. The routing decision remains confidential and is never revealed in plaintext.
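In MPC systems, keeping a choice secret typically relies on data-oblivious selection: the program's control flow never branches on secret values. The plaintext analogue below shows one-hot masked selection, a standard building block for this; it is illustrative only, since the article does not specify SecureRouter's exact mechanism.

```python
def oblivious_select(candidates, onehot):
    """Return the candidate picked by a one-hot selector, computed as a
    weighted sum so that no branch depends on which entry is hot.
    Under MPC, both the selector bits and this arithmetic would be
    carried out on secret shares."""
    assert sum(onehot) == 1 and all(b in (0, 1) for b in onehot)
    return sum(c * b for c, b in zip(candidates, onehot))

# Selecting among three values without an if-statement:
outputs = [17, 42, 99]
assert oblivious_select(outputs, [0, 1, 0]) == 42
```

Because the same arithmetic runs regardless of which selector bit is set, an observer watching the computation learns nothing about the choice that was made.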

The routing mechanism is designed to balance accuracy and computational cost, with cost defined in terms of encrypted execution time instead of the parameter counts typically referenced in non-encrypted systems. The system also employs a load-balancing strategy to ensure that no single model is overwhelmed by requests.
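A plaintext sketch of such a cost-aware routing rule follows. The scoring formula, weights, model names, and accuracy estimates are all illustrative assumptions, not the paper's actual method; in SecureRouter this decision would be computed under encryption.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    enc_seconds: float    # cost = measured encrypted execution time
    active_requests: int  # current load on this model

def route(models, est_acc, lam=0.005, load_w=0.01):
    """Pick the model maximizing predicted accuracy minus a weighted
    encrypted-runtime cost and a load-balancing penalty."""
    return max(models, key=lambda m: est_acc[m.name]
               - lam * m.enc_seconds
               - load_w * m.active_requests)

pool = [Model("tiny-4.4M", 4.0, 0), Model("large-340M", 60.0, 0)]

# An easy query the small model likely answers well:
easy = {"tiny-4.4M": 0.78, "large-340M": 0.91}
assert route(pool, easy).name == "tiny-4.4M"   # 0.76 > 0.61

# A hard query where the small model's predicted accuracy collapses:
hard = {"tiny-4.4M": 0.40, "large-340M": 0.91}
assert route(pool, hard).name == "large-340M"  # 0.38 < 0.61
```

Defining cost as measured encrypted runtime rather than parameter count matters because MPC overhead does not scale linearly with model size, so the runtime ratio between models can differ from their parameter ratio.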

Performance Improvements

When tested against SecFormer, a private inference system that relies on a single fixed large model, SecureRouter ran 1.95 times faster on average across five language understanding tasks. The speedups ranged from 1.83x on the most challenging task to 2.19x on the simplest, showing the router's ability to match model size to query complexity.

Compared with running the large model for every query, SecureRouter delivered a 1.53x average speedup across eight benchmark tasks, with accuracy on most tasks staying within a narrow margin of the large-model baseline. One task involving grammatical analysis, however, saw a more substantial accuracy drop, indicating that some specialized tasks are more sensitive to model size.

Minimal Overhead

Adding a routing layer to an encrypted inference system could itself become a bottleneck, but in practice the routing component requires approximately 39 MB of memory in a two-server configuration, comparable to the 38 MB needed for the smallest model in the pool; the largest model demands around 3,100 MB. The router adds an estimated 4 seconds to inference time and contributes 1.86 GB of network communication, figures similar to running the smallest model on its own.

Practical Implications

Importantly, SecureRouter does not necessitate any overhaul of existing infrastructure. It operates on top of current MPC frameworks and utilizes standard language model architectures that are readily available through widely used libraries. Simple queries are processed rapidly using smaller models, while more complex queries are escalated to larger models. Clients submitting queries receive only the final results without any insight into which model processed their request.


Source: Help Net Security News

