Private AI on Oracle OCI: How to Stop Sending Company Data to Public LLMs

There is something that bothers me every time I see a company adopting AI tools without thinking too much about what happens to their data. You open ChatGPT, you paste some internal document, you ask for a summary or a rewrite, and that’s it. The data left the building. Where it goes, who reads it, whether it is used to train future models: these are questions that most people don’t ask until it becomes a problem.

I work a lot with cloud infrastructure, and in the last year or so AI has become a big part of the conversations I have with customers and colleagues. Everyone wants AI, everyone wants LLMs, but not everyone thinks about the privacy side of things. So I want to talk about that.


The Real Problem with Public AI Services

Let me be clear: I am not saying ChatGPT or similar tools are evil. They are fantastic products and I use them myself for personal things. But there is a difference between using a public AI for writing a birthday message and using it inside a company workflow.

When you use a public AI service, your prompts and data go somewhere. Depending on how the service is configured and what plan you have, your data might be used to improve or train future models. Even if you trust the vendor today, you have no real control. You are sending potentially confidential information: customer data, contracts, financial reports, internal strategy documents, and so on.

For many companies, especially those in regulated industries, this is simply not acceptable. GDPR in Europe makes it even more complicated: if personal data is involved, sending it to a third-party AI service without a proper data processing agreement is already a violation.


What Is the Alternative?

The good news is that you don’t have to choose between “use public AI and lose control” or “don’t use AI at all.” There is a middle path, and Oracle OCI Generative AI is one of the better options I have seen.

Oracle provides ready-to-use LLM models directly on their cloud infrastructure. You consume them through API calls, on-demand, without needing to build or manage your own GPU cluster. But the key point, and this is what makes it interesting for privacy, is that your data does not leave your tenancy, and Oracle does not use your prompts to train their models.

This is a very different situation from using a consumer AI product. You are essentially renting compute time on top of foundation models, but the inference happens in an isolated environment that belongs to your OCI account. No one else sees your data.

Oracle is quite explicit about this. The service is designed for enterprise use, and “zero data retention” endpoints are part of the offering, meaning the data processed during inference is not stored or reused.


The Models Available on OCI Today

One thing I appreciate about OCI Generative AI is the variety of models available. You are not locked into one vendor or one model family. At the time I write this, the lineup includes models from several well-known providers, all accessible through the same API:

Cohere models

  • Cohere Command A, the current flagship, very good for enterprise tasks, agents, and RAG workflows. 256K context window.
  • Cohere Command A Reasoning, same family but focused on multi-step reasoning and complex analysis
  • Cohere Command A Vision, multimodal, understands images and documents together with text
  • Cohere Command R+, a solid choice for RAG and question answering

Meta Llama models

  • Llama 4 Maverick and Llama 4 Scout, the newest generation, Mixture of Experts architecture, multimodal
  • Llama 3.3 70B, very capable, good balance between quality and cost, available on-demand
  • Llama 3.2 90B Vision and 11B Vision, multimodal, text plus images
  • Llama 3.1 405B, the big one, best for complex tasks

OpenAI gpt-oss models

  • gpt-oss-120b, the model I used in my previous post about OpenClaw. Hosted by Oracle, managed, on-demand.
  • gpt-oss-20b, lighter version, faster and cheaper

Google Gemini models

  • Gemini 2.5 Pro, advanced reasoning, multimodal
  • Gemini 2.5 Flash, faster, good balance
  • Gemini 2.5 Flash-Lite, optimized for high-volume, cost-sensitive workloads

xAI Grok models

  • Grok 4.1 Fast
  • Grok Code Fast 1, specifically focused on coding tasks and agentic coding workflows

This is a lot of choice. On-demand models require no provisioning, you just call the API and pay per token. Dedicated clusters are also available if you need guaranteed capacity or want to fine-tune a model with your own data.


How the Architecture Works and How to Build It

In a previous post I described how to connect OpenClaw to OCI LLMs using LiteLLM (also available as a self-hosted container) as a proxy layer. I later used the exact same approach to connect Lobechat to OCI, and the logic is identical in both cases.

The idea is simple: you have a frontend chat application that only knows how to speak the OpenAI-compatible API. You don’t want to touch the frontend code or write custom integrations for every possible AI provider, so you put LiteLLM in the middle. LiteLLM is an open source proxy that translates between the OpenAI API format and whatever backend you are actually using, in this case Oracle OCI.

The flow looks like this:

Lobechat → LiteLLM Proxy → Oracle OCI Generative AI

In practical terms:

  • OpenClaw or Lobechat sends a standard OpenAI-style request to LiteLLM, running on a VM inside your infrastructure
  • LiteLLM picks up the request, authenticates against OCI using the credentials in ~/.oci/config, and forwards the request to the right model endpoint
  • OCI processes the inference and sends back the response
  • LiteLLM passes it back to the frontend

From the frontend’s perspective, it is just talking to a local OpenAI endpoint. It doesn’t know or care that Oracle is behind it. This abstraction is very useful because you can switch models, add new ones, or mix different providers without touching your frontend configuration.
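To make this concrete, here is a minimal sketch of the kind of request a frontend produces. The endpoint URL and model alias are assumptions for illustration; the payload shape is the standard OpenAI chat completions format that LiteLLM accepts.

```python
import json

# Hypothetical local LiteLLM endpoint -- adjust host/port to your VM.
LITELLM_URL = "http://localhost:4000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion body.

    The frontend (Lobechat, OpenClaw, ...) produces exactly this shape;
    LiteLLM translates it to the OCI Generative AI backend behind the scenes.
    """
    return {
        "model": model,  # a LiteLLM alias, e.g. "llama-3.3-70b" (illustrative)
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("llama-3.3-70b", "Summarize this document.")
print(json.dumps(body))
```

POSTing `body` as JSON to the LiteLLM URL (with an `Authorization: Bearer <key>` header) is all a frontend ever does; everything OCI-specific stays behind the proxy.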

LiteLLM also gives you API key management, which means you can create different keys for different users or teams, set rate limits, track usage, and enable or disable access to specific models. This is very handy when you are deploying this inside a company.
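As a sketch, a minimal LiteLLM `config.yaml` for this setup might look like the following. The OCI model identifier and parameter names are illustrative assumptions, not verified syntax; check the LiteLLM documentation for the exact OCI provider options.

```yaml
# Hypothetical LiteLLM config.yaml -- model IDs and OCI parameters are
# illustrative; verify against the LiteLLM docs for the OCI provider.
model_list:
  - model_name: llama-3.3-70b            # alias the frontend sees
    litellm_params:
      model: oci/meta.llama-3.3-70b-instruct
      oci_region: eu-frankfurt-1
      oci_compartment_id: ocid1.compartment.oc1..example

general_settings:
  master_key: sk-replace-me              # used to mint per-team virtual keys
```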


A Real Company Scenario

Let me describe a concrete example of how this can work inside an organization.

Imagine a company with 50 employees. Some of them use AI daily: marketing for writing copy, developers for code review, HR for drafting job descriptions, finance for summarizing reports. Today, many of them probably have personal ChatGPT accounts, or maybe the company pays for a team subscription to some public service.

Now think about what happens in that scenario. An HR manager pastes a CV with personal data into ChatGPT. A developer pastes internal code that has customer logic inside. A finance person sends a draft contract for review. Every single one of these actions is a potential data leak, and most of the time nobody even realizes it.

With the architecture I described, you set up:

  1. A small VM in OCI with LiteLLM running as a service
  2. OCI credentials configured for access to the Generative AI service
  3. A chat frontend. Lobechat (self-hosted) works very well: it has a clean interface and supports custom API endpoints out of the box
  4. Lobechat configured to point to your LiteLLM proxy instead of OpenAI
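The four steps above can be sketched as a single compose file. Image tags, ports, and environment variable names here are assumptions to be checked against the LiteLLM and Lobechat documentation:

```yaml
# Hypothetical docker-compose sketch -- verify images, ports and env vars.
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
      - ~/.oci:/root/.oci:ro              # OCI credentials for the proxy
    command: ["--config", "/app/config.yaml"]
    ports:
      - "4000:4000"

  lobechat:
    image: lobehub/lobe-chat
    environment:
      OPENAI_PROXY_URL: http://litellm:4000/v1   # point Lobechat at the proxy
      OPENAI_API_KEY: sk-replace-me              # a LiteLLM virtual key
    ports:
      - "3210:3210"
```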

Now every employee in your company gets access to a chat interface that looks and feels like ChatGPT but connects to OCI behind the scenes. You can expose it on the internal network only, or put it behind a VPN. The data never leaves your OCI tenancy. You choose which models employees can access. You can keep logs internally for audit purposes if needed.

The HR manager, the developer, the finance person, they all get the same AI experience they are used to, but now the data stays inside your controlled environment.

For the employee, the experience is basically identical. They open a browser, they talk to an AI, they get answers. They don’t need to know or care about LiteLLM or OCI API authentication. That complexity is handled at the infrastructure level by whoever set it up.


When On-Demand Is Not Enough: Dedicated AI Clusters

On-demand models are perfect for most situations, but there are companies where the requirements go a bit further. Maybe you need guaranteed throughput because you have hundreds of concurrent users. Maybe you want to fine-tune a base model with your own data to make it more specialized for your domain. Maybe you don’t like the model offered by OCI on-demand. Or maybe your security policy simply does not allow shared infrastructure, even if logically isolated.

For these cases, OCI Generative AI also offers dedicated AI clusters. The concept is straightforward: instead of sharing the underlying GPU infrastructure with other tenants on-demand, Oracle provisions a dedicated cluster exclusively for your tenancy. The model runs on hardware that belongs to your environment, not a shared pool.

This has a few important implications. First, the isolation is not just logical, it is physical: your inference workloads are not sharing GPUs with anyone else. Second, you get predictable performance, with no latency spikes because some other tenant is running a heavy batch job on the same hardware. Third, dedicated clusters support fine-tuning, which means you can take a base model like Llama 3.3 70B or Cohere Command A and train it further on your own datasets: internal documentation, support tickets, product manuals, legal templates, whatever makes sense for your business.

The result is a model that not only stays private but actually knows your company. It has seen your internal language, your processes, your products. That is a significant step up from using a generic public model, and it is something that only a dedicated setup can give you.

The trade-off is cost. Dedicated clusters are more expensive than on-demand, and they come with a minimum commitment. But for companies at a certain scale, or in industries where data sovereignty is a hard requirement, this is often the right choice. You are not renting time on a shared service anymore. You are running your own private AI, managed by Oracle, on infrastructure that is completely yours.

And the good news is that from the architecture perspective, nothing changes. LiteLLM connects to a dedicated cluster endpoint exactly the same way it connects to on-demand models. You change one configuration line, and your users don’t notice anything different.
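To illustrate the point, the switch could be as small as this in the LiteLLM configuration. The parameter names are hypothetical; what matters is that the alias users see stays the same:

```yaml
# Illustrative only -- parameter names vary by LiteLLM version.
model_list:
  - model_name: llama-3.3-70b                  # unchanged alias for users
    litellm_params:
      model: oci/meta.llama-3.3-70b-instruct
      # on-demand:  no endpoint needed
      # dedicated:  point at your cluster's hosted endpoint OCID instead
      oci_endpoint_id: ocid1.generativeaiendpoint.oc1..example
```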


The Cost Side

I know what you are thinking: this sounds like a lot of infrastructure to manage just to use an LLM. But it is less complicated than it looks.

The VM running LiteLLM can be very small. A 2 OCPU, 8GB RAM shape in OCI is enough. LiteLLM itself is lightweight. The heavy computation happens on Oracle’s side, in the Generative AI service. You don’t need GPUs, you don’t need specialized hardware.

The on-demand models on OCI are priced per million tokens, which is the same pricing model as most public AI services. For a company of 50 people using AI for office tasks, the cost is very reasonable. And you get rid of the risk of accidental data exposure, which in regulated industries can cost you much more than a few months of API calls.
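As a back-of-envelope illustration (the usage numbers and the price per million tokens below are assumptions for the sake of the arithmetic, not OCI list prices), the math for a 50-person company looks like this:

```python
# Back-of-envelope cost estimate. All numbers are assumptions, not list prices.
EMPLOYEES = 50
TOKENS_PER_EMPLOYEE_PER_DAY = 20_000   # prompts + completions, assumed
WORKDAYS_PER_MONTH = 21
PRICE_PER_MILLION_TOKENS = 1.0         # USD, assumed

monthly_tokens = EMPLOYEES * TOKENS_PER_EMPLOYEE_PER_DAY * WORKDAYS_PER_MONTH
monthly_cost = monthly_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"{monthly_tokens:,} tokens/month -> ${monthly_cost:.2f}/month")
# prints: 21,000,000 tokens/month -> $21.00/month
```

Even if the real numbers are several times higher, the order of magnitude stays in office-software territory rather than infrastructure-project territory.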

There is also a practical benefit from a management perspective: you have one single control point. Want to switch from Llama 3.3 to Llama 4 Maverick? Change one line in LiteLLM configuration. Want to disable access for a user who left the company? Revoke their LiteLLM API key. Want to add a new model for the development team? Add it to LiteLLM and expose it only to their key.


Why This Matters More Than People Think

I want to end with something a bit more philosophical, because I think the technical details sometimes distract from the real point.

AI is becoming a standard tool in companies, like email or spreadsheets. But unlike email, AI tools are actively reading and processing your most sensitive content. The model on the other side is learning the shape of your business, what you write about, what documents you care about, what language you use internally.

When you use a private deployment or a cloud service with zero data retention like OCI, you keep that intelligence inside your company. The patterns in your data, the knowledge embedded in your documents, the specific language of your business, none of that leaks to a shared training dataset.

For most companies today, this is already a competitive advantage. For regulated industries like healthcare, legal, finance, it is simply a requirement.

The technology to do this properly is available, it is not that expensive, and it is not that complicated to set up if you have someone who knows infrastructure. I hope this post gives you a useful starting point.

If you want to see the technical details of how to configure LiteLLM with OCI and connect it to a chat frontend, you can read my previous post where I covered the exact steps for OpenClaw; the same logic applies to Lobechat and most other OpenAI-compatible clients.

At the time of publishing this post, I am an Oracle employee, but the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
