Using the FHE Application Roadmap: How We Built an Encrypted Anomaly Detector

How Niobium built a fully encrypted network intrusion detector on the Fog. A step-by-step walkthrough of applying the FHE application checklist to a real, production-style workload.

Published on Apr 29, 2026

By David Archer, PhD

In our previous post, we laid out a checklist for building an application that uses fully homomorphic encryption (FHE). In this post, we describe how we used that same checklist ourselves on a real application.

As we build our template applications for the Fog™, Niobium’s encrypted cloud platform, we want these examples to do two things at once. First, to prove that FHE can carry a meaningful, production-style application end-to-end. Second, to give others a concrete starting point. Something to fork, to adapt, and to use with their own data, so that they don’t need to build from scratch.

Anomaly detection is an obvious place to start. Structurally, it is an ideal FHE application: feed-forward, dominated by multiply-add, almost entirely free of data-dependent branching, and naturally batch-parallel across very large numbers of independent samples. And it shows up everywhere in practice—fraud scoring, industrial fault detection, medical outlier analysis, insider-threat monitoring, supply-chain signals, customer churn, and plenty more. We picked network intrusion detection as the specific instance to build because Kitsune[1], the reference model for this application, is well-studied, well-understood, and public. The same approach generalizes: if your business can benefit from finding the unusual needle in a very large haystack of sensitive observations, the shape of what we built here is likely close to the shape of what you would need.

The result is Niobium’s network intrusion detection template application for the Fog. It reproduces the Kitsune anomaly detection framework end-to-end under encryption. What follows is how we walked the checklist to get there.

Start with the Right Architecture

First we verified that the application had the right shape. FHE is fundamentally a client-to-“blind server” architecture. The client encrypts and sends to the server, the server computes on ciphertexts without ever seeing a plaintext, and the result comes back to the client to be decrypted. Anomaly detection fits that mold perfectly. It uses strictly feed-forward arithmetic: packet features flow into the front and anomaly scores fall out the back.

Next we considered how to train the model. Anomaly detection only works if you have a model of “normal” to compare against, and that model has to be on the server before any analysis can happen. We settled on the cleanest privacy story available: the client trains the model in the clear on its own premises, and then ships it to the server. From that point on, every packet feature batch is encrypted. The server never sees any network traffic, and neither does the attacker who may have compromised that server. Only the client, with the secret key, can ever see the input traffic or interpret an anomaly score.

This application is built to run on the Fog. The reasons are the ones that apply to any managed cloud platform: reliability, elastic capacity that scales with analysis volume, and offloading infrastructure management and production DevOps from the customer to the platform. The main reason, though, is custom hardware. For the moment, the Fog is how Niobium's FHE accelerators reach customers, and that acceleration is what takes the application from CPU-bound throughput to production-grade performance.

Get It Working in Plaintext First

The next step was unglamorous but non-negotiable: get the whole thing working without any encryption at all. Kitsune’s stock implementation runs training directly into inference, with everything held in memory, and that does not match the shape of an FHE deployment. We restructured the application so that training produced a stored model file, client-side feature extraction was cleanly separated from server-side detection, and the anomaly detector operated on batches of data in the same order of operations the FHE version would later require.

This plaintext rewrite became our ground truth. Every subsequent transformation was tested against it. Without a trusted reference to compare against at each step, it is effectively impossible to tell a restructuring bug from an FHE bug once you’re working with computed ciphertexts.

Remove Data-Dependent Control Flow

FHE forbids data-dependent control flow (e.g., "if x > 5 then …") but fully supports data-dependent data flow (e.g., "y = x[i]"). The sequence of instructions that the server executes has to be identical for every possible input, because that sequence is observable. It is only the data flowing through those instructions that is hidden.
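
The standard rewrite is to turn a branch into arithmetic that always executes both sides. A minimal plaintext sketch of the pattern (under encryption, the gate value would itself be computed homomorphically, e.g., via a polynomial approximation of a step function):

```python
def branchy(x, a, b):
    # Forbidden under FHE: which instructions run depends on x.
    return a if x > 5 else b

def branchless(x, a, b):
    # Allowed under FHE: the same instructions run for every input;
    # only the data flowing through them differs.
    gate = 1.0 if x > 5 else 0.0   # stands in for an encrypted comparison
    return gate * a + (1.0 - gate) * b

assert branchless(7, 10, 20) == branchy(7, 10, 20)
assert branchless(3, 10, 20) == branchy(3, 10, 20)
```

The cost is that both arms of the former branch are always evaluated, which is exactly why branch-light workloads like this one suit FHE so well.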

Kitsune is an ensemble of autoencoders, followed by a final anomaly detector. The core computation is linear combinations through autoencoder weights, followed by activation functions, followed by a reconstruction error calculation. Almost no branching appears anywhere. The one place we had to intervene was Kitsune’s “learned” architecture. By default, the Kitsune model chooses how many autoencoders to run and how features get grouped among them based on the training data. We changed the architecture to always use five autoencoders, each with a predetermined assignment of 10 features, for a total of 50 features. The server application now runs exactly the same computation regardless of input. This also makes the application run meaningfully faster than the dynamic version it replaces, which is a valuable bonus for any FHE application.
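
The frozen architecture reduces to a static configuration. A sketch (the contiguous block partition is an assumption for illustration; any fixed, input-independent assignment of features to autoencoders works):

```python
NUM_AUTOENCODERS = 5
FEATURES_PER_AE = 10

# Feature indices 0..49, partitioned into fixed blocks of 10. In stock
# Kitsune this grouping is learned from training data; freezing it makes
# the server's instruction sequence identical for every input.
FEATURE_MAP = [list(range(i * FEATURES_PER_AE, (i + 1) * FEATURES_PER_AE))
               for i in range(NUM_AUTOENCODERS)]

assert sum(len(group) for group in FEATURE_MAP) == 50
assert FEATURE_MAP[0] == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```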

Understand Multiplicative Depth

Every multiplication between ciphertexts consumes a level of the noise budget baked into each ciphertext at encryption time. Chain enough multiplications together, and the noise overwhelms the signal, making the decrypted result unintelligible. Worse, FHE gives no runtime indication that this has happened. You only find out when you compare decrypted output against a known reference and the numbers don’t make sense.

We counted carefully. Each autoencoder pass contributes two activation function evaluations: one at the hidden layer, one at the reconstruction. The final anomaly detector adds its own activations plus a squaring step to compute the mean squared error. Each polynomial approximation of an activation function costs several levels of depth by itself. Our estimate came in at around 22 levels of multiplicative depth end-to-end. This meant that we could comfortably avoid bootstrapping, the expensive operation that refreshes the noise budget in-flight. Bootstrapping is sometimes unavoidable, particularly in deep neural networks, but for this application we were able to design around it entirely.
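
A back-of-envelope tally of one such count. The per-stage level costs below are illustrative assumptions (e.g., a degree-5 polynomial activation charged at roughly 4 levels, and one rescale after each plaintext-weight multiply); the five autoencoders run in parallel, so depth is counted along a single path, not summed across the ensemble:

```python
# Illustrative depth budget: (stage, assumed levels consumed).
STAGES = [
    ("AE hidden-layer weights (plaintext mult)", 1),
    ("AE hidden-layer activation (deg-5 poly)",  4),
    ("AE output-layer weights (plaintext mult)", 1),
    ("AE output activation (deg-5 poly)",        4),
    ("AE reconstruction-error squaring",         1),
    ("detector input weights (plaintext mult)",  1),
    ("detector hidden activation (deg-5 poly)",  4),
    ("detector output weights (plaintext mult)", 1),
    ("detector output activation (deg-5 poly)",  4),
    ("MSE squaring",                             1),
]

total_depth = sum(cost for _, cost in STAGES)
assert total_depth == 22  # consistent with the ~22-level estimate
```

The point is the bookkeeping discipline, not the exact numbers: every candidate circuit change gets re-tallied against the level budget before it is implemented.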

Approximate Your Non-Linear Functions

Kitsune’s autoencoders use sigmoid functions internally, and the final anomaly detector uses the tanh function. Neither has a direct FHE equivalent, since FHE natively supports only addition and multiplication. We replaced both with Chebyshev polynomial approximations, chosen over Taylor series approximations because Chebyshev behaves much more consistently and uniformly as long as the inputs stay within a known range.

Accuracy only has to be good enough to preserve the qualitative meaning of the anomaly scores. Analysis of Kitsune on representative datasets showed that the inputs to the autoencoder sigmoid functions stay reliably within ±5, and the inputs to the anomaly detector's tanh stay within ±2. A fifth-order Chebyshev fit handles both ranges well. We stored the approximation coefficients in the model file alongside the weights, so the server application could evaluate them directly.
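
One standard way to obtain such coefficients is sampling at Chebyshev nodes; the sketch below fits a degree-5 series to sigmoid on [−5, 5]. This is a textbook construction for illustration, not necessarily the exact fitting procedure used in the template application:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cheb_fit(f, lo, hi, degree, nodes=64):
    """Chebyshev series coefficients c_0..c_degree for f on [lo, hi]."""
    coeffs = []
    for k in range(degree + 1):
        s = 0.0
        for j in range(nodes):
            theta = math.pi * (j + 0.5) / nodes
            x = math.cos(theta)                    # Chebyshev node in [-1, 1]
            t = 0.5 * (x + 1.0) * (hi - lo) + lo   # mapped to [lo, hi]
            s += f(t) * math.cos(k * theta)
        coeffs.append((2.0 / nodes) * s)
    coeffs[0] *= 0.5  # standard halving of the constant term
    return coeffs

def cheb_eval(coeffs, lo, hi, t):
    """Evaluate the series with the Clenshaw recurrence."""
    x = 2.0 * (t - lo) / (hi - lo) - 1.0
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]

coeffs = cheb_fit(sigmoid, -5.0, 5.0, degree=5)
# The degree-5 fit tracks sigmoid closely across the fitted range.
for t in [-4.0, -1.0, 0.0, 2.5, 4.5]:
    assert abs(cheb_eval(coeffs, -5.0, 5.0, t) - sigmoid(t)) < 0.05
```

Under FHE, the server evaluates the resulting polynomial with only additions and multiplications on ciphertexts.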

The wrinkle with polynomial approximations is that their outputs diverge dramatically outside their valid input range. To guarantee a bounded input range for our approximation functions (between −1 and 1), we added a sigmoid-based normalization parameterized by the mean and variance of each feature measured during training.
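
One plausible form of that normalization (the exact parameterization is not spelled out in the post): standardize each feature with its training-time mean and standard deviation, then squash through a shifted sigmoid so the output always lands in (−1, 1):

```python
import math

def normalize(x, mean, std):
    """Map a raw feature value into (-1, 1) using training-time statistics."""
    z = (x - mean) / std                      # standardize
    return 2.0 / (1.0 + math.exp(-z)) - 1.0   # 2*sigmoid(z) - 1

# A value at the training mean maps to 0; extreme values saturate toward
# +/-1, so inputs can never leave the polynomial approximations' valid range.
assert normalize(100.0, 100.0, 15.0) == 0.0
assert 0.99 < normalize(200.0, 100.0, 15.0) < 1.0
```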

Constrain Data Types and Precision

FHE ultimately computes over integers, so every input feature had to be mapped from floating-point to fixed-point values. Packet lengths fit in 14 bits, and inter-packet timing differences can be represented comfortably in 12. A 16-bit fixed-point representation was a reasonable starting point for both. After normalization into the −1 to 1 range, every value in the system is a fixed-point real number.
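
A minimal sketch of one such 16-bit encoding, assuming a sign bit plus 15 fractional bits for values in [−1, 1] (the post only specifies "16-bit fixed point"; the exact layout is our assumption):

```python
SCALE = 1 << 15  # 2^15: 15 fractional bits

def encode(x):
    """Map a real in [-1, 1] to a signed 16-bit integer."""
    q = round(x * (SCALE - 1))
    return max(-SCALE, min(SCALE - 1, q))  # clamp to int16 range

def decode(q):
    """Inverse mapping back to a real value."""
    return q / (SCALE - 1)

# Round-trip error is bounded by roughly one quantization step (~3e-5),
# far below the tolerance the anomaly scores require.
for x in [-1.0, -0.3333, 0.0, 0.5, 0.999]:
    assert abs(decode(encode(x)) - x) < 1.0 / SCALE
```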

That precision analysis also confirmed the scheme choice. We're computing real-valued anomaly scores and can tolerate small approximation errors in the output, which is exactly what CKKS is designed for, making it the natural choice. If we had needed exact integer arithmetic, the kind usually required for tasks like keyword search, BFV or BGV would have been the better fit.

Select Scheme, Parameters, and SIMD Strategy Together

CKKS’s ciphertext packing is a significant advantage for this application. A single ciphertext can hold thousands of values in its slots, and a single homomorphic operation on that ciphertext processes all of them in parallel at no additional cost – what’s known as single instruction multiple data (SIMD). A production network generates a very large number of independent packets, which is a perfect SIMD scenario.

We selected a ring dimension of 2¹⁶ (64K), giving us 32K usable CKKS slots. That means we can process up to 32,768 packets per batch, which corresponds to roughly 20 minutes of a typical user's network activity. The packing strategy we chose is one feature per ciphertext: each of the 50 features gets its own ciphertext, and the slots of that ciphertext hold that feature's value across all 32K packets in the batch.
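
The feature-major layout can be sketched in plaintext (lists stand in for CKKS ciphertexts here; toy batch sizes for readability):

```python
RING_DIM = 1 << 16        # ring dimension 2^16
SLOTS = RING_DIM // 2     # CKKS yields ring_dim / 2 usable slots
NUM_FEATURES = 50

def pack(batch):
    """Transpose a [packets x features] batch into one slot-vector per feature."""
    assert len(batch) <= SLOTS
    return [[row[f] for row in batch] for f in range(NUM_FEATURES)]

# Four toy "packets", each with 50 feature values.
batch = [[float(p * NUM_FEATURES + f) for f in range(NUM_FEATURES)]
         for p in range(4)]
packed = pack(batch)

assert SLOTS == 32768
assert len(packed) == NUM_FEATURES
# Slot p of feature-"ciphertext" f holds packet p's value of feature f.
assert packed[3][2] == batch[2][3]
```

With this layout, one homomorphic operation on a feature ciphertext applies the same step of the model to all 32K packets at once.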

The parameters follow from the depth and security requirements. Supporting 22 levels of multiplicative depth at 128-bit security (the industry standard) makes each ciphertext roughly 24 megabytes. Fifty ciphertexts per batch works out to about 1.2 gigabytes of encrypted data per 20-minute window. On a server, keeping 50 feature ciphertexts live plus working storage causes memory to peak around 3 gigabytes during a batch. Neither number is small, but both are workable for a prototype, and both are natural targets for optimization for a production environment.
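
The size figures are consistent with simple RNS arithmetic. As a sketch, assume a CKKS ciphertext is two ring_dim-coefficient polynomials with roughly one 64-bit limb per remaining level plus one (a simplification; real parameterizations vary):

```python
RING_DIM = 1 << 16
LIMBS = 22 + 1            # one limb per level of depth, plus one (assumed)
BYTES_PER_COEFF = 8       # 64-bit machine words

ct_bytes = 2 * RING_DIM * LIMBS * BYTES_PER_COEFF   # two polynomials
batch_bytes = 50 * ct_bytes                         # 50 feature ciphertexts

assert round(ct_bytes / 1e6) == 24          # ~24 MB per ciphertext
assert round(batch_bytes / 1e9, 1) == 1.2   # ~1.2 GB per 20-minute batch
```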

Choose Your Library

OpenFHE was the natural choice. It supports CKKS, it is actively maintained, it is open source, and it has seen significantly more testing by the user community than any of the plausible alternatives for this scheme family. Out of the box, OpenFHE runs on CPUs, and GPU support can be added with a light additional lift. Because Niobium’s mistic™ compiler stack is integrated with OpenFHE, we also get faster-than-GPU hardware acceleration on the Fog without modifying the application at all.

Build and Test the FHE Version

With our parameters and library chosen, we were ready to walk through the cryptographic steps. The client generates the cryptographic context: the CKKS parameters, the key pair, and the evaluation key(s) that the server needs to handle ciphertext multiplications (in this case a single relinearization key). The context and the relinearization key are serialized and shipped to the server as part of an initial handshake. The relinearization key does not allow the server to decrypt anything; it can only be used to keep ciphertexts in a usable form after each multiplication.

At runtime, the client samples network traffic just like any other network probe. It then extracts the 50 features per packet, normalizes them, batches 32K packets, and encrypts each feature batch into its own ciphertext. The 50 ciphertexts go to the server, which runs the full KitNet ensemble under encryption: linear combinations through the autoencoder weights, Chebyshev-approximated activations, reconstruction, residuals, then the final anomaly detector with its own activations and a squaring step for mean squared error. The output is a single score ciphertext with one anomaly score for each of the 32K packets in the batch. That ciphertext is returned to the client, which decrypts it and interprets the scores.
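
The server-side computation is, structurally, just linear algebra plus polynomial activations, i.e., nothing but additions and multiplications. A plaintext mock of a single autoencoder pass in that restricted form (toy weights and dimensions, not Kitsune's):

```python
def poly_act(x, coeffs):
    """Polynomial activation via Horner's rule: only adds and multiplies."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def autoencoder_residual(features, w_hid, w_out, act):
    """Encode, decode, and return the squared reconstruction error."""
    hidden = [act(sum(w * f for w, f in zip(row, features))) for row in w_hid]
    recon  = [act(sum(w * h for w, h in zip(row, hidden)))   for row in w_out]
    return sum((r - f) ** 2 for r, f in zip(recon, features))

# Toy degree-3 sigmoid-like activation: 0.5 + x/4 - x^3/48.
act = lambda x: poly_act(x, [0.5, 0.25, 0.0, -1.0 / 48])

feats = [0.1, -0.2, 0.3]
w_hid = [[0.5, 0.1, -0.2], [0.0, 0.3, 0.4]]       # 3 inputs -> 2 hidden
w_out = [[0.2, -0.1], [0.3, 0.5], [-0.4, 0.2]]    # 2 hidden -> 3 outputs
score = autoencoder_residual(feats, w_hid, w_out, act)
assert score > 0.0  # a scalar anomaly contribution per sample
```

Under encryption, the same arithmetic runs once per ciphertext and is applied to all 32K packed packets simultaneously.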

Validation is a direct comparison of the decrypted FHE scores with the plaintext reference scores from Step 2. Discrepancies usually mean one of three things: the noise budget is exhausted (multiplicative depth is too high), the precision is insufficient (the approximation error is too high), or a non-linear operation was missed somewhere in the application. Debugging encrypted programs is genuinely harder than debugging plaintext ones, because every intermediate value is unreadable by design. The practical approach is to output and decrypt intermediate values and compare them to the corresponding values in the plaintext reference. This is tedious but unavoidable. This comparative debugging is exactly why we went through the trouble of building the application in plaintext first.
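
That comparative loop is mechanical enough to automate. A sketch of a harness that diffs decrypted intermediates against the plaintext reference at named checkpoints (`first_divergence` and the checkpoint names are hypothetical; in the real application the decrypted values would come from tapping off and decrypting intermediate ciphertexts):

```python
TOLERANCE = 1e-2  # CKKS is approximate, so compare within a tolerance

def first_divergence(reference, decrypted, tol=TOLERANCE):
    """Return the first checkpoint where FHE and plaintext disagree, else None."""
    for name, ref_vals in reference:          # checkpoints in pipeline order
        for r, d in zip(ref_vals, decrypted[name]):
            if abs(r - d) > tol:
                return name
    return None

reference = [("hidden", [0.31, -0.12]),
             ("recon",  [0.10,  0.22]),
             ("score",  [0.04])]
decrypted = {"hidden": [0.310, -0.121],
             "recon":  [0.550,  0.220],   # garbage appears here...
             "score":  [0.900]}

# ...so the bug (noise exhaustion, precision loss, or a missed non-linearity)
# was introduced between the hidden layer and the reconstruction.
assert first_divergence(reference, decrypted) == "recon"
```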

Profile and Iterate

The first working encrypted version is a milestone, not the finish line. We measured wall-clock runtime and peak memory, and the numbers look like FHE numbers on CPU and GPU: memory footprints in gigabytes, runtimes orders of magnitude slower than the plaintext reference. That is normal, and it is where the iteration loop starts: can we shave a level of depth, pack data more densely, reduce precision, or rearrange the circuit to shorten the critical multiplicative path?

The other half of the performance story is hardware. The overheads that look startling on CPU and GPU become practical when running on Niobium’s FHE accelerators and in the Fog.

Network intrusion detection is one of the template applications coming to the Fog™, Niobium's encrypted cloud platform. Sign up for early access or contact us if you'd like our help adapting your own application to FHE.

References

  [1] Mirsky, Y., Doitshman, T., Elovici, Y., & Shabtai, A. (2018). Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. Network and Distributed System Security Symposium (NDSS) 2018. arXiv:1802.09089.
David Archer, PhD

Dr. Archer is a Principal Scientist leading Cryptography & Multiparty Computation for Galois, Inc., with customers including DARPA, the NSA, IARPA, and the Department of Homeland Security. Dr. Archer has over 30 years of R&D experience in complex ASICs, system hardware, software architectures, secure computation, and cryptography. Dave holds a PhD in Computer Science from Portland State University, an MS in Electrical Engineering, and a BS in Computer Engineering from the University of Illinois at Urbana-Champaign.
