AI Data Security & Compliance in Indonesia (UU PDP)

Raymond Chin, Founder, Genesis — Venture House
Published 10 min read

TL;DR

  • Data you send to third-party LLMs may be used for model training, depending on the product tier and its default settings, unless your contract or account settings explicitly rule it out.
  • Indonesia's UU PDP (Law No. 27/2022) requires consent, purpose limitation, and data subject rights — these apply to AI deployments too.
  • Cloud and on-premise hosting each carry distinct control, cost, and cross-border compliance trade-offs with no universal winner.
  • Run a pre-deployment data checklist to identify PII, classify risk, and audit contract clauses before any vendor goes to production.

AI data security begins with a question that's routinely skipped: where does your business data actually go when you press "send" in a third-party LLM interface? This article answers that operationally — not theoretically — and connects it to obligations that already exist under Indonesian law.

Important note: this article is general guidance to help businesses start the conversation about AI data security and compliance. It is not legal advice. For your specific needs and contractual decisions, consult a qualified legal professional.

If you're evaluating or already running AI solutions from external vendors, start by seeing what's available at /marketplace to understand the landscape of local providers whose regulatory context is closer to Indonesia's.

Where your data goes when you use a third-party LLM

When your team feeds a contract document, customer data, or internal notes into an LLM prompt — whether that's GPT, Gemini, Claude, or another model — the data passes through several layers, each with its own security implications.

First, data is sent over HTTPS to the vendor's servers outside Indonesia. Most major LLM providers operate from data centres in the United States or Europe — meaning there is a cross-border data transfer that, under the UU PDP framework, requires attention because it involves the personal data of Indonesian nationals crossing into another jurisdiction.

Second, data is processed on the vendor's infrastructure. What happens here depends entirely on the service agreement. Defaults differ by vendor and tier: consumer chat products and some free tiers may use conversations to improve their models unless you opt out, while paid API and enterprise tiers often exclude customer data from training. Without explicit contractual provisions, you cannot assume your business data stays out of the training dataset for the next version.

Third, outputs are returned and may be retained on the vendor's side for audit purposes, safety monitoring, or service improvement — for periods that vary by policy.

This is not to say third-party LLMs are inherently dangerous. Many offer enterprise tiers with guarantees against training on your data, strong security SLAs, and encryption in transit and at rest. But the defaults are not always safe, and without reading the data policy and adding the right contractual clauses, you don't know which standard applies to your account.

What UU PDP requires from businesses using AI

Law No. 27/2022 on Personal Data Protection is Indonesia's first comprehensive data protection framework. While its full implementation is still evolving, its core principles are clear and directly relevant to AI deployment.

Consent and purpose limitation. Personal data may only be processed on a valid legal basis — typically consent from the data subject or the performance of a contract. Critically, data may not be used for purposes beyond what was communicated at the time of collection. If you collect customer data for service delivery, you cannot simply feed it into an AI pipeline for a different purpose without appropriate transparency and legal basis.

Data subject rights. UU PDP grants individuals the right to access their data, request corrections, and in certain conditions request deletion. In an AI context, this means your system must be able to answer: where is a person's data stored, who is processing it, and how can it be deleted on request.
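To make those three questions answerable in practice, a business needs some form of data inventory. The sketch below is a minimal illustration, not a prescribed UU PDP mechanism; the class names, fields, and system names such as `crm_db` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingRecord:
    subject_id: str   # internal identifier for the data subject
    system: str       # where the data is stored (hypothetical system name)
    processor: str    # who processes it: your own team or a vendor
    purpose: str      # purpose communicated at the time of collection


@dataclass
class DataInventory:
    records: list[ProcessingRecord] = field(default_factory=list)

    def register(self, record: ProcessingRecord) -> None:
        self.records.append(record)

    def locate(self, subject_id: str) -> list[ProcessingRecord]:
        """Answer: where is this person's data, and who is processing it?"""
        return [r for r in self.records if r.subject_id == subject_id]

    def erase(self, subject_id: str) -> list[str]:
        """Drop the inventory entries and return the systems that need a real deletion job."""
        systems = sorted({r.system for r in self.records if r.subject_id == subject_id})
        self.records = [r for r in self.records if r.subject_id != subject_id]
        return systems


inventory = DataInventory()
inventory.register(ProcessingRecord("cust-001", "crm_db", "internal", "service delivery"))
inventory.register(ProcessingRecord("cust-001", "vendor_llm_logs", "llm-vendor", "support summarisation"))
inventory.register(ProcessingRecord("cust-002", "crm_db", "internal", "service delivery"))

for record in inventory.locate("cust-001"):
    print(record.system, record.processor, record.purpose)
```

The inventory only tracks where data lives. Actual deletion still has to be carried out in each system that `erase` returns, including on the vendor's side.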

Cross-border data transfer. Sending personal data outside Indonesia is only permissible if the destination country provides equivalent data protection, or if adequate contractual guarantees are in place. This directly implicates the use of cloud AI whose servers are abroad.
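One way to keep this visible is a simple transfer register that flags any flow abroad with neither condition documented. This is a rough sketch under stated assumptions: the field names and vendor entries are hypothetical, and whether a country offers equivalent protection is a legal assessment that code cannot make.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transfer:
    vendor: str
    destination: str                 # ISO country code of the processing location
    equivalent_protection: bool      # outcome of your legal team's assessment
    contractual_safeguard: Optional[str] = None  # e.g. reference to a signed DPA


def needs_review(transfer: Transfer) -> bool:
    """Flag transfers leaving Indonesia with no documented basis."""
    if transfer.destination == "ID":
        return False
    return not (transfer.equivalent_protection or transfer.contractual_safeguard)


register = [
    Transfer("hypothetical-llm-api", "US", equivalent_protection=False),
    Transfer("hypothetical-ocr-service", "SG", equivalent_protection=False,
             contractual_safeguard="DPA signed, ref. 2024-017"),
    Transfer("local-hosting", "ID", equivalent_protection=True),
]

flagged = [t.vendor for t in register if needs_review(t)]
print(flagged)
```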

Controller and processor obligations. UU PDP distinguishes between the data controller (who determines the purpose of processing) and the processor (who processes data on the controller's instructions). In a typical deployment, your third-party AI vendor acts as a processor, and you, as the business, remain responsible as the controller.

Again: this is an interpretation of general principles for operational guidance purposes. For application specific to your industry and situation, consult a legal professional.

Cloud vs on-premise: the real trade-off comparison

There is no universally better choice. What exists is a fit between architectural choice and your business's risk profile and capacity.

| Dimension | Cloud AI (SaaS/API) | On-premise / Self-hosted |
|---|---|---|
| Data control | Limited: data travels to vendor servers | Full: data stays on your infrastructure |
| Upfront cost | Low: pay per use | High: servers, licences, setup |
| Long-term cost | Can increase significantly at scale | More stable, but ongoing maintenance costs |
| Deployment speed | Fast: days or weeks | Slow: can take months |
| Cross-border compliance | Requires vendor audit and contract clauses | No data transfer outside, so simpler |
| Required team capacity | Minimal: vendor manages infrastructure | Significant: needs infra and ML ops team |
| Best for | Businesses starting fast, moderate volume, non-sensitive or pseudonymised data | Businesses in heavily regulated sectors (finance, healthcare), very high volume, or data that cannot leave |

For most mid-sized Indonesian businesses, cloud AI with tight enterprise-tier settings is a realistic starting point. On-premise makes sense when sector regulations prohibit data from leaving (banking or medical records, for example), or when usage volume is large enough that cloud cost exceeds the infrastructure investment.
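The volume argument can be made concrete with a rough break-even calculation. The sketch below assumes a simple model (linear pay-per-use pricing on the cloud side, upfront cost amortised evenly on the on-premise side), and every figure in it is hypothetical.

```python
def monthly_cloud_cost(requests_per_month: float, cost_per_request: float) -> float:
    """Pay-per-use: cost scales linearly with volume."""
    return requests_per_month * cost_per_request


def monthly_onprem_cost(upfront: float, amortisation_months: int, monthly_ops: float) -> float:
    """Upfront spend spread over its useful life, plus ongoing maintenance."""
    return upfront / amortisation_months + monthly_ops


def break_even_requests(cost_per_request: float, upfront: float,
                        amortisation_months: int, monthly_ops: float) -> float:
    """Monthly volume above which cloud spend exceeds the on-premise monthly cost."""
    return monthly_onprem_cost(upfront, amortisation_months, monthly_ops) / cost_per_request


# Hypothetical figures in USD, chosen only to make the arithmetic easy to follow.
threshold = break_even_requests(cost_per_request=0.02, upfront=72_000,
                                amortisation_months=36, monthly_ops=4_000)
print(f"Cloud becomes more expensive above ~{threshold:,.0f} requests per month")
```

A real comparison would also need to account for staffing, hardware refresh cycles, and tiered vendor pricing, which this model leaves out.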

There is also a hybrid deployment model that's gaining traction: the model runs in the cloud, but sensitive data is processed locally before it's sent — only embeddings or anonymised representations leave your infrastructure to the vendor. This can be a solid middle ground for businesses that need LLM capability but can't allow raw data to leave Indonesia.

What must be in your AI vendor contract

A weak contract is the most preventable data risk you face. Here are the clauses that must exist before you sign anything with an AI vendor:

1. Prohibition on using your data for model training. State explicitly that data you send — including prompts, documents, and outputs — may not be used to train, fine-tune, or improve the vendor's model without your written permission. This is the clause most commonly absent and most consequential.

2. Data and output ownership. Establish from the start: data you input is yours. Output generated from that data is also yours. The vendor holds no rights to either after the contract relationship ends.

3. Transparent sub-processor list. Large AI vendors typically use sub-processors — cloud providers, storage services, and other partners. You are entitled to know who they are, where they operate, and what security standards they apply.

4. Data retention and deletion policy. How long does the vendor retain your data? What is the process for deleting it after the contract ends? Require a specific deadline, not "data will be deleted in a reasonable timeframe."

5. Breach notification. If a data breach occurs, within how many hours must the vendor notify you? A common benchmark is 72 hours, the window GDPR gives controllers to report a breach to the regulator and a figure widely used as a reference. Without this clause, you might learn about the breach from the news before you hear from the vendor.

6. Audit and compliance. For vendors handling sensitive data, request evidence of security certification (ISO 27001, SOC 2 Type II) and the right to conduct or receive periodic audit reports.
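The six clauses above can be tracked as a simple review record per vendor. This is only an illustrative sketch: the clause keys are hypothetical labels, and deciding whether a clause is truly present and adequate remains a job for whoever reviews the contract.

```python
REQUIRED_CLAUSES = (
    "no_training_on_customer_data",
    "data_and_output_ownership",
    "sub_processor_list",
    "retention_and_deletion",
    "breach_notification",
    "audit_and_compliance",
)


def missing_clauses(contract_review: dict[str, bool]) -> list[str]:
    """Return required clauses the reviewer has not confirmed as present."""
    return [clause for clause in REQUIRED_CLAUSES if not contract_review.get(clause, False)]


# Hypothetical review result for one vendor's service agreement.
review = {
    "no_training_on_customer_data": True,
    "data_and_output_ownership": True,
    "sub_processor_list": False,
    "retention_and_deletion": True,
    "breach_notification": True,
}

gaps = missing_clauses(review)
print("Do not sign yet, missing:" if gaps else "All six clauses confirmed.", gaps)
```

A clause that was never reviewed is treated the same as one that is absent, which is the safer default.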

A 12-point pre-deployment data checklist

Run this checklist before any AI system touches real data:

  1. Data classification — identify all types of data that will enter the system. Which are PII? Which are commercially sensitive?
  2. Data minimisation — remove or pseudonymise all data not specifically needed for the targeted AI function.
  3. Legal basis for processing — document the legal basis for each category of data processed (consent, contract performance, or legitimate interest).
  4. Vendor audit — request and read the vendor's data policy, DPA (Data Processing Agreement), and security certifications.
  5. Contract clauses — confirm the six clauses above are present in the service agreement.
  6. Cross-border transfer — document which countries data flows to and what protection mechanisms apply.
  7. Access and permissions — who on your team can access the AI system and the data inside it? Apply the principle of least privilege.
  8. Encryption — confirm data is encrypted both in transit (HTTPS/TLS) and at rest.
  9. Logging and audit trail — logs recording who accessed what and when are critical for incident investigation.
  10. Incident response plan — what is the first step if a breach occurs? Who is contacted, and within how many hours?
  11. Team training — does the team using the system understand what data can and cannot be entered?
  12. Periodic review — schedule a compliance audit at least annually, or whenever there is a significant change to the system or regulation.

This checklist is not a formality — each point represents a real risk vector that has been the source of data breaches at other organisations.
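Points 7 and 9 (least privilege and an audit trail) are easy to combine in one place. The following is a minimal sketch, assuming a hypothetical role table and an in-memory log; a production system would write to tamper-resistant storage instead.

```python
import json
from datetime import datetime, timezone

# Hypothetical roles and the actions each one is allowed to perform.
ROLE_PERMISSIONS = {
    "support_agent": {"query"},
    "ml_engineer": {"query", "upload_documents"},
    "admin": {"query", "upload_documents", "export_logs"},
}

audit_log: list[str] = []  # in-memory stand-in for durable log storage


def authorise_and_log(user: str, role: str, action: str, resource: str) -> bool:
    """Check the action against the role's permissions and record the attempt either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    }
    audit_log.append(json.dumps(entry))
    return allowed


first_attempt = authorise_and_log("dewi", "support_agent", "query", "support_kb")
second_attempt = authorise_and_log("dewi", "support_agent", "export_logs", "support_kb")
print(first_attempt, second_attempt, len(audit_log))
```

Denied attempts are logged as well, since those are often the entries that matter most during an incident investigation.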

PII and AI: a practical guide for deployment

Personally Identifiable Information (PII) in an AI context covers more than names and ID numbers. In an LLM or RAG pipeline, PII can surface from unexpected directions.

Watch for potential PII in these places: contract documents (names, signatures, account numbers), customer support tickets (names, complaints, purchase history), internal emails (names, job titles, sensitive business discussions), system logs (IP addresses, user IDs, behavioural patterns), and conversation recordings (voice transcripts).

Recommended approach before data enters an AI pipeline:

  • Automated redaction: use NER (Named Entity Recognition) tools to detect and mask PII before data is sent to an LLM.
  • Pseudonymisation: replace direct identifiers with random tokens that can only be reversed by your internal systems, not by the vendor.
  • Data segregation: keep sensitive data in a separate database that is not directly connected to the AI pipeline; only aggregates or anonymous representations are processed by the model.
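The first two approaches can be sketched together. Two assumptions are worth stating loudly: simple regular expressions stand in here for a real NER tool, and the tokens are keyed HMAC digests rather than purely random values, so the same input always maps to the same token. The class and pattern names are hypothetical.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; they will miss names, addresses, and other free-text PII.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
NIK_RE = re.compile(r"\b\d{16}\b")  # Indonesian national ID numbers (NIK) have 16 digits


class Pseudonymiser:
    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._vault: dict[str, str] = {}  # token -> original value, kept on your side only

    def _token(self, kind: str, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        token = f"<{kind}_{digest}>"
        self._vault[token] = value
        return token

    def mask(self, text: str) -> str:
        """Replace detected identifiers with tokens before the text leaves your systems."""
        text = EMAIL_RE.sub(lambda m: self._token("EMAIL", m.group()), text)
        text = NIK_RE.sub(lambda m: self._token("NIK", m.group()), text)
        return text

    def unmask(self, text: str) -> str:
        """Restore original values in a response, using the local vault."""
        for token, original in self._vault.items():
            text = text.replace(token, original)
        return text


pseudonymiser = Pseudonymiser(secret_key=b"replace-with-a-managed-secret")
raw = "Contact budi@example.co.id, NIK 3174012345678901."
masked = pseudonymiser.mask(raw)
print(masked)
```

Only `masked` would be sent to the LLM. The vault and the key never leave your infrastructure, so the vendor cannot reverse the tokens.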

For RAG systems specifically — where an LLM accesses an internal knowledge base — ensure that access controls on the knowledge base itself are correctly set. If certain documents should only be accessible to certain user groups, those permissions must be enforced at the retrieval layer, not just at the interface layer. This is a common gap that gets missed in RAG implementations. You can read more about secure RAG architecture in the sibling article RAG Knowledge Base for Enterprises.
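Enforcing permissions at the retrieval layer means filtering before ranking, so unauthorised documents never reach the prompt. The sketch below assumes a hypothetical in-memory corpus and uses naive keyword overlap in place of a real vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def retrieve(query: str, user_groups: set[str], corpus: list[Document],
             top_k: int = 3) -> list[Document]:
    """Return the best-matching documents the user is permitted to see."""
    # Permission filter comes first: restricted documents are never scored or returned.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:top_k]]


corpus = [
    Document("hr-1", "salary bands for 2024", frozenset({"hr"})),
    Document("pub-1", "leave policy and salary payment dates", frozenset({"hr", "staff"})),
]

staff_results = [d.doc_id for d in retrieve("salary policy", {"staff"}, corpus)]
hr_results = [d.doc_id for d in retrieve("salary policy", {"hr"}, corpus)]
print(staff_results, hr_results)
```

With a vector database, the equivalent would be a metadata filter applied inside the search query itself rather than a post-filter in the application or interface.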

Choosing an AI vendor with the right security posture

When evaluating AI vendors — whether for local implementation or cloud solutions — data security must be part of the selection criteria, not an afterthought after the contract is signed.

Questions to put to every candidate vendor:

  • What security certifications do you hold (ISO 27001, SOC 2 Type II)?
  • Where is customer data stored physically?
  • Is my data used to train your model? How do I opt out?
  • Who are your sub-processors?
  • How long is my data retained after the contract ends?
  • What is your breach notification procedure?

A vendor that cannot answer these questions clearly — or responds defensively — signals an immature security posture. To see providers who have been verified and respond to these criteria, explore /marketplace.

A full guide on how to evaluate and select AI vendors in general is available in the sibling article How to Choose an AI Service Provider in Indonesia.

Conclusion

AI data security is not something that can wait for an incident to happen. With UU PDP in force and the AI ecosystem growing rapidly, Indonesian businesses using third-party LLMs need to understand where their data goes, what their legal obligations are, and how to negotiate vendor contracts that provide real protection.

Start with what you can control directly right now: audit what data is entering your AI systems today, check your vendor contracts for missing clauses, and run the pre-deployment checklist before your next expansion.

Concrete next step: find AI vendors verified for transparency on data security at /marketplace. Providers who want to be listed can register at /marketplace/daftar. And if you want to measure how ready your team is to adopt AI responsibly, start with /pari.

More than half of data breach incidents at Southeast Asian organisations involve data shared with third parties, including SaaS vendors and cloud APIs. (Source: IBM Cost of a Data Breach Report, 2024)

The average cost of a data breach in Southeast Asia reached approximately USD 2.6 million per incident, up from the prior year. (Source: IBM Cost of a Data Breach Report, 2024)

Frequently asked questions

Does Indonesia's UU PDP apply to companies using AI?

Yes. Law No. 27/2022 on Personal Data Protection applies to anyone processing the personal data of Indonesian nationals, including through AI systems. Key principles include consent, purpose limitation, data minimisation, and data subject rights. This is general guidance — consult a legal professional for your specific situation.

Is my data being used to train third-party LLM models?

It depends on the vendor, the product tier, and your account settings. Consumer chat products and some free tiers may use conversations to improve their models unless you opt out, while paid API and enterprise tiers often exclude customer data from training. Read your vendor's data policy and contract clauses before sending any sensitive data.

Which is safer for Indonesian businesses: cloud AI or on-premise?

There's no absolute answer — it depends on your data type, team capacity, and sector regulations. Cloud is faster and cheaper to start, but data crosses to servers outside the jurisdiction. On-premise gives full control but requires infrastructure investment and a team that can manage it.

What are the most important contract clauses when using a third-party AI vendor?

Six critical clauses: (1) prohibition on using your data for model training, (2) ownership of data and model outputs, (3) transparent list of sub-processors, (4) data retention and deletion policy, (5) breach notification within a specified timeframe, and (6) audit and compliance evidence, such as security certifications and periodic audit reports. Without these, you have little contractual protection if something goes wrong.

How do I identify PII in data that will flow into an AI system?

Start with data classification: names, ID numbers, contact details, location, financial data, and health data are clear PII. Behavioural data and logs that can be combined to identify a person also fall into this category. Audit your data pipeline before deployment, not after.

