The 60-Second Version
Large Language Models like GPT-4 and Claude are powerful, but they don't run themselves. Somewhere between "get an API key" and "build your own GPU cluster" lies a spectrum of deployment options, each with real tradeoffs in privacy, capability, cost, and complexity.
For Australian organisations handling sensitive data, this decision matters more than most teams realise. Data sovereignty, the US CLOUD Act, industry regulation, and commercial pragmatism all shape which deployment model actually fits. Yet most teams default to the easiest option and move on without asking whether it's the right one.
Every LLM deployment falls into one of four categories. Understanding what each gives you (and what it costs you) is the first step toward making the right call.
How They Compare
Before diving into each option, here's the high-level picture. We score each category across five dimensions that matter most in enterprise decisions: privacy, accuracy, speed, cost, and setup complexity.
| Category | Privacy | Accuracy | Speed | Cost | Setup |
|---|---|---|---|---|---|
| Public API | Low | Best in class | Fast | Scales with usage | Trivial |
| Managed Private | High | Best in class | Fast | Per-token + infra | Low to moderate |
| AU Sovereign | Maximum | Strong | Fast | Moderate | Low |
| Self-Hosted | Maximum | Varies by model | Varies by hardware | High upfront, no per-token fees | Significant |
There is no single best option. The right choice depends on your data sensitivity, your existing cloud infrastructure, your budget, and whether you need frontier model quality or can work with capable open-source alternatives. Most production deployments end up using more than one category for different workloads.
Public API
This is the simplest path to LLM capability. Generate an API key, install the SDK, and you're running in under an hour. You get immediate access to the most capable models available: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro. No infrastructure to manage, no hardware to provision, no deployment pipeline to build.
The tradeoff is stark: all your data leaves your environment. Every prompt, every document you send for analysis, every piece of context travels to vendor servers. The major providers offer zero-retention policies and contractual commitments not to train on your data, but the data still transits through their infrastructure and is processed on their hardware.
For development work, prototyping, and non-sensitive use cases, this is the obvious starting point. The model quality is the best available, the documentation is mature, and you can be productive in minutes.
For production workloads involving client data, intellectual property, or regulated information? That's where you need to look further down the spectrum.
Pay-per-token pricing is easy to start with but hard to predict at scale. What costs $50 per month during prototyping can become $5,000 per month in production. Factor in your projected volume before committing to this model long term.
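To make that scaling concrete, here's a minimal cost estimator. All prices and volumes below are illustrative placeholders, not any vendor's actual rates; plug in your own numbers from your provider's pricing page.

```python
def monthly_api_cost(requests_per_day: int,
                     input_tokens: int,
                     output_tokens: int,
                     input_price_per_m: float,
                     output_price_per_m: float,
                     days: int = 30) -> float:
    """Estimate monthly spend under pay-per-token pricing.

    Prices are per million tokens. All figures used here are
    placeholder assumptions, not real vendor rates.
    """
    per_request = (input_tokens * input_price_per_m +
                   output_tokens * output_price_per_m) / 1_000_000
    return per_request * requests_per_day * days

# Prototype: 100 requests/day with small prompts
proto = monthly_api_cost(100, 1_000, 500, 3.0, 15.0)

# Production: 10,000 requests/day with larger context windows
prod = monthly_api_cost(10_000, 4_000, 1_000, 3.0, 15.0)

print(f"prototype ~ ${proto:,.0f}/month, production ~ ${prod:,.0f}/month")
```

With these placeholder numbers, the same workload shape goes from roughly $30 a month to several thousand; volume, not unit price, is what dominates the bill.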
Managed Private API
Think of this as the best of both worlds: frontier model quality deployed inside your own cloud environment. The major cloud providers offer this through services that run the same models you'd access via a public API, but within your VNet or VPC.
Your data stays in your chosen region. Neither the cloud provider nor the model vendor uses your data for training. The infrastructure is HIPAA and ISO 27001 eligible. If your organisation (or your client) is already running workloads on a major cloud platform, this is often the path of least resistance.
Setup typically takes one to two days of infrastructure configuration: VNet setup, IAM policies, model deployment, and endpoint testing. Once it's running, the experience is nearly identical to the public API, with minimal latency overhead.
The major cloud platforms are US-owned companies. Under the US CLOUD Act, US authorities can compel access to data held by US companies regardless of where that data is physically stored. If your client's data is already hosted on one of these platforms, this risk is already accepted and a managed private API doesn't meaningfully increase it. But if CLOUD Act exposure is explicitly unacceptable, you'll need to look at the next two options.
Australian Sovereign API
This is the option that didn't exist two years ago, and it changes the calculus significantly for Australian organisations.
A growing number of Australian providers now offer LLM inference on Australian-owned infrastructure, under Australian law, operated by Australian staff. No US parent company. No CLOUD Act exposure. Your data stays in Australia under Australian jurisdiction, full stop.
These platforms typically provide OpenAI-compatible APIs, which means minimal code changes for teams already building on the public API. In many cases, it's as simple as swapping the base URL and API key. Your existing integration, prompts, and tooling carry over.
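Because the request shape is identical across OpenAI-compatible providers, switching mostly means changing two values. A stdlib-only sketch of what that looks like; the sovereign endpoint URL and model name below are hypothetical placeholders, not a real provider:

```python
import json
import urllib.request

def chat_request(base_url: str, api_key: str,
                 model: str, prompt: str) -> urllib.request.Request:
    """Build (but don't send) an OpenAI-compatible chat completions
    request. Only base_url and api_key differ between providers;
    the payload shape stays the same.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Same integration code, different endpoint: only two values change.
public = chat_request("https://api.openai.com/v1",
                      "PUBLIC_KEY", "gpt-4o", "Hello")
sovereign = chat_request("https://llm.example.au/v1",  # hypothetical AU endpoint
                         "SOVEREIGN_KEY", "llama-3-70b", "Hello")
```

In practice you'd make the same swap through your SDK's base-URL setting rather than raw HTTP, but the principle is the same: the migration cost is configuration, not code.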
The tradeoff is model capability. Sovereign providers run strong open-source foundation models rather than the proprietary frontier models you'd get from the major vendors. For many structured enterprise tasks (extraction, classification, summarisation, document analysis), the quality difference is small and shrinking rapidly. For tasks requiring the very best complex reasoning available today, the gap is real but narrowing with each model generation.
Pricing is typically per-token in AUD, which eliminates foreign exchange risk and simplifies procurement for Australian enterprise clients.
For government, regulated industries (APRA, IRAP), and any organisation where CLOUD Act exposure is explicitly unacceptable, the Australian sovereign option is the strongest managed deployment available. It's also the simplest migration path from a public API prototype.
Self-Hosted Open Source
This is the other end of the spectrum: maximum privacy, maximum control, maximum effort. Self-hosted models run entirely on your own hardware. Fully air-gapped. No API calls. No internet connection required. Your data never leaves the machine.
The open-source model ecosystem has matured dramatically. Models ranging from lightweight (a few billion parameters, suitable for narrow tasks) through to very large (tens of billions of parameters, approaching frontier quality) are freely available. The right model size depends on your use case, your hardware budget, and how much operational complexity your team can absorb.
At the smaller end, local deployment tools have made getting started remarkably simple. A single command can download and run a model on a consumer GPU. But there's a significant gap between "it runs on my machine" and "it's production-ready." As you move toward larger, more capable models, the complexity grows substantially: GPU procurement, memory management, model serving infrastructure, performance tuning, and ongoing operational maintenance.
Self-hosting makes the most sense in three scenarios: when you need a fully air-gapped environment with no external connectivity, when you're processing enough volume that per-token API costs become prohibitive, or when you need to fine-tune a model on proprietary data. Outside of these, the operational overhead is hard to justify compared to managed alternatives.
The cost model is the inverse of everything else on this list. High upfront hardware investment, no per-token cost after that, though power, maintenance, and staff time remain real ongoing expenses. Whether this works out cheaper depends entirely on your usage volume and time horizon. At high sustained volume, self-hosted can deliver significant savings. At low or unpredictable volume, you're paying for expensive hardware that sits underutilised.
What Most Teams Get Wrong
After working across multiple LLM deployments in Australian enterprise environments, these are the patterns we see most often:
The public API is the fastest way to get started, and that makes it the default. But "fastest to prototype" and "right for production" are different questions. We regularly see teams build entire workflows on a public API, only to discover during compliance review that the data can't leave the country. Choosing the right deployment model early saves a painful migration later.
The CLOUD Act conversation isn't simply "US cloud is bad." If your client is already running production workloads on a US-owned cloud platform, the jurisdictional risk is already present in the environment. Adding a managed private LLM endpoint within that same environment doesn't meaningfully increase the exposure. The sovereign option matters most when the client is not already on US cloud, or when they have explicit contractual or regulatory requirements to avoid it.
We see teams invest months building GPU infrastructure and model serving pipelines for workloads that would be perfectly served by a sovereign or managed API. Self-hosted is the right call for specific requirements (air-gapping, fine-tuning, extreme volume), not as a default "we want control" decision. Start with a managed option and only move to self-hosted when you have a concrete requirement that the managed options can't meet.
Most mature deployments don't use a single option. They use public APIs for development, a managed or sovereign service for production workloads with client data, and sometimes a small self-hosted model for high-volume, low-complexity tasks like classification or extraction. The deployment decision isn't "pick one," it's "pick the right one for each workload."
How to Choose
Rather than comparing every option against every other, start with two questions.
Question 1: How sensitive is the data?
If you're working with non-sensitive data, sample data, or internal tools with no client information, the public API gives you the best models with the least friction. Use it freely for development and prototyping.
If you're handling client data, regulated information, or intellectual property, you need to move right on the spectrum. How far right depends on the next question.
Question 2: What infrastructure does the organisation already use?
If the organisation is already on a major US cloud platform, the managed private option is the path of least resistance. The CLOUD Act risk is already accepted within the environment. You get frontier model quality with strong privacy controls and minimal setup.
If the organisation is not on US cloud, or has explicit requirements to avoid US jurisdiction for AI workloads, the Australian sovereign option gives you managed convenience without the jurisdictional exposure. It's often the strongest choice for government, financial services, and healthcare clients.
Self-hosted is the right answer when you have a specific, concrete requirement that no managed option can satisfy: air-gapping, fine-tuning on proprietary data, or extreme processing volume where per-token costs dominate.
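The two-question flow above can be sketched as a small helper. This is a deliberate simplification for illustration; the category names mirror the comparison table, and real decisions also weigh budget, model quality needs, and contractual constraints.

```python
def recommend_deployment(sensitive_data: bool,
                         on_us_cloud: bool,
                         needs_airgap_finetune_or_extreme_volume: bool = False) -> str:
    """Rough encoding of the two-question decision flow.

    An illustrative simplification, not a substitute for the
    full assessment described in the article.
    """
    # Specific hard requirements trump everything else
    if needs_airgap_finetune_or_extreme_volume:
        return "Self-Hosted"
    # Question 1: how sensitive is the data?
    if not sensitive_data:
        return "Public API"
    # Question 2: what infrastructure is already in use?
    if on_us_cloud:
        return "Managed Private"
    return "AU Sovereign"

print(recommend_deployment(sensitive_data=False, on_us_cloud=False))  # Public API
print(recommend_deployment(sensitive_data=True, on_us_cloud=True))    # Managed Private
print(recommend_deployment(sensitive_data=True, on_us_cloud=False))   # AU Sovereign
```

Note the order: hard requirements first, then sensitivity, then existing infrastructure. Reversing that order is how teams end up self-hosting by default.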
Where to Start
If you're evaluating LLM deployment for your organisation, here's a practical sequence that works well.
Prototype on the public API. It's the fastest way to validate whether an LLM approach solves your problem. Use sample or synthetic data. Build your prompts, test your workflows, measure quality. Don't over-invest in infrastructure until you've proven the approach works.
Choose your production deployment based on data sensitivity and existing infrastructure. Once you've validated the approach, migrate to the deployment model that matches your regulatory and privacy requirements. This is where the real architectural decisions happen: which model, which provider, how to handle authentication, monitoring, cost management, and failover.
Plan for a multi-model future. The organisations getting the most value from LLMs aren't locked into a single deployment model. They use different options for different workloads, optimising for the right balance of capability, privacy, and cost in each case.
The deployment decision isn't permanent, and it isn't one-size-fits-all. The technology landscape is shifting quickly. Australian sovereign options in particular are maturing fast and closing the gap with frontier models. The right strategy starts with understanding your requirements and choosing the simplest option that meets them, with a clear path to evolve as your needs change.