Computer vision is the branch of AI that turns images and video into structured data — identifying defects on a production line, reading text on a scanned invoice, counting parcels on a conveyor belt, or verifying that a retail shelf is stocked correctly. The technology is not new, but the cost of deployment has dropped sharply over the last three years, putting it within reach of mid-size Indonesian manufacturers and logistics operators — not just multinationals.
This article cuts through the pitch decks and covers what computer vision (CV) actually does in Indonesian industrial settings, what it costs to build and run, what accuracy you can honestly expect, and, critically, when it is not the right tool for the job. If you are evaluating providers, you can compare verified computer vision vendors at /marketplace.
Manufacturing: defect detection and quality control
The most commercially mature CV use case in Indonesian industry is visual quality control on production lines. Instead of a human inspector eyeballing every unit, a camera captures images at speed and a model classifies each item as pass, fail, or uncertain — routing fails and uncertain cases to a human re-check station.
Common targets: surface scratches on electronics and automotive parts, colour deviations in garments and packaging, dimensional checks for precision components, foreign-object detection in food processing. Each requires a model trained on images of your specific product and your specific defect types — a textile factory's defect taxonomy looks nothing like a circuit-board inspection system.
Realistic expectations for a well-scoped QC project:
| Parameter | Typical range |
|---|---|
| Accuracy (pass/fail classification) | 90–97% on held-out test set |
| False-positive rate | 2–8% (good units rejected) |
| False-negative rate | 1–5% (defective units passed) |
| Labelled images needed per defect class | 500–5,000 depending on visual complexity |
| Camera + edge hardware setup | Rp 30–80 million per inspection station |
| Model training + integration (one-time) | Rp 80–250 million depending on scope |
The 90–97% figure sounds high, but volume changes the picture. On a line producing 10,000 units per shift, even 97% accuracy (a 3% error rate) means about 300 misclassifications per shift, and 90% accuracy means about 1,000. That is why human review of uncertain predictions is not optional. Design the system with three buckets (pass, fail, review), not two.
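The three-bucket design can be sketched in a few lines. This is a minimal illustration, not a production rule: the two thresholds are hypothetical values that would need tuning on your own validation data, and the model is assumed to output a defect probability between 0 and 1.

```python
# Hypothetical thresholds; tune them on a held-out validation set.
FAIL_THRESHOLD = 0.90  # defect probability at or above this is rejected
PASS_THRESHOLD = 0.10  # defect probability at or below this is accepted


def route(defect_probability: float) -> str:
    """Map a model's defect probability to pass, fail, or review."""
    if not 0.0 <= defect_probability <= 1.0:
        raise ValueError("defect_probability must be between 0 and 1")
    if defect_probability >= FAIL_THRESHOLD:
        return "fail"
    if defect_probability <= PASS_THRESHOLD:
        return "pass"
    # Everything in between goes to the human re-check station.
    return "review"


def expected_misclassifications(units_per_shift: int, accuracy: float) -> int:
    """Rough count of wrong calls per shift at a given accuracy."""
    return round(units_per_shift * (1.0 - accuracy))
```

Widening the gap between the two thresholds sends more units to review and fewer wrong calls downstream, so the thresholds are effectively a staffing decision as much as a modelling one.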
Lighting and camera positioning are non-negotiable. The most common reason CV projects underperform in the field is not the model — it is inconsistent ambient lighting or camera vibration from the machinery. Budget for industrial LED ring lights and vibration damping before budgeting for GPU compute.
Logistics: parcel sorting, inventory counting, and label reading
Indonesian logistics players — from third-party logistics operators to e-commerce fulfilment centres — face two recurring problems: reading handwritten or poorly printed shipping labels at speed, and verifying that the right items were picked for an order.
Parcel OCR and label reading. Modern OCR options (open-source engines such as PaddleOCR and Tesseract 5, or cloud APIs such as Google Cloud Vision and Amazon Textract) handle printed text with high reliability when image quality is controlled. The hard part is semi-structured and handwritten text: Indonesian addresses with non-standard abbreviations, mixed Bahasa Indonesia and regional-language content, and handwriting variation. Fine-tuning on Indonesia-specific address formats usually requires 2,000–10,000 labelled examples to push accuracy meaningfully above the out-of-the-box baseline.
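Part of the abbreviation problem can be handled after OCR with plain text normalisation. The sketch below assumes the OCR step has already returned a string. The abbreviation map is illustrative and far from complete, and the function name is hypothetical.

```python
# Illustrative map of common Indonesian address abbreviations.
# A real deployment would need a much longer, validated list.
ABBREVIATIONS = {
    "jl": "Jalan",
    "jln": "Jalan",
    "gg": "Gang",
    "kel": "Kelurahan",
    "kec": "Kecamatan",
    "kab": "Kabupaten",
    "no": "Nomor",
}


def normalise_address(raw: str) -> str:
    """Expand known abbreviations in a whitespace-separated address string."""
    words = []
    for token in raw.split():
        # Compare without the trailing full stop and without regard to case.
        key = token.rstrip(".").lower()
        words.append(ABBREVIATIONS.get(key, token))
    return " ".join(words)
```

This only handles abbreviations that appear as separate tokens. Text such as "Jl.Merdeka" with no space, or handwriting errors inside a word, still needs a fine-tuned model or fuzzy matching.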
Inventory counting with object detection. Counting items on shelves or pallets using object detection (YOLO-family models are standard) works well for uniform products in controlled environments. Stacked, overlapping, or visually similar items remain challenging. Hybrid approaches — computer vision for gross counts, barcode scanning for per-unit verification — outperform pure CV in most warehouse settings.
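One way to picture the hybrid approach is a reconciliation step that compares the CV gross count with the barcode count per SKU and flags disagreements for a manual check. This is a sketch under assumptions: both counts are taken to be available as dictionaries keyed by SKU, and the function name and tolerance parameter are hypothetical.

```python
def reconcile(
    cv_counts: dict[str, int],
    scanned_counts: dict[str, int],
    tolerance: int = 0,
) -> dict[str, int]:
    """Return SKUs where the CV count and barcode count disagree.

    The value is (CV count - scanned count); SKUs within the
    tolerance are omitted.
    """
    mismatches = {}
    for sku in sorted(set(cv_counts) | set(scanned_counts)):
        diff = cv_counts.get(sku, 0) - scanned_counts.get(sku, 0)
        if abs(diff) > tolerance:
            mismatches[sku] = diff
    return mismatches
```

A non-zero tolerance is useful for stacked or overlapping items, where the camera is expected to miss a unit or two.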
Condition and damage detection. Checking whether a delivered parcel is visibly damaged at intake or dispatch — dented, wet, torn — is a genuine CV use case gaining traction at larger Indonesian fulfilment centres.
Retail: shelf monitoring and out-of-stock detection
Retail shelf monitoring uses a camera (often mounted on a store trolley or fixed above the aisle) to detect empty shelf spaces, misplaced products, or planogram compliance violations. The commercial pitch is reducing out-of-stocks without relying on manual store audits.
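Once a detector has assigned a SKU (or nothing) to each shelf slot, the compliance check itself is a simple comparison against the planogram. The sketch below assumes that mapping from image to slot has already been done upstream. The slot identifiers, issue labels, and function name are hypothetical.

```python
from typing import Optional


def planogram_violations(
    planogram: dict[str, str],
    detected: dict[str, Optional[str]],
) -> list[tuple[str, str]]:
    """Compare the expected SKU per shelf slot with the detected SKU.

    Returns (slot, issue) pairs, where issue is "out_of_stock" when
    nothing was detected and "misplaced" when a different SKU was seen.
    """
    issues = []
    for slot, expected_sku in planogram.items():
        seen = detected.get(slot)
        if seen is None:
            issues.append((slot, "out_of_stock"))
        elif seen != expected_sku:
            issues.append((slot, "misplaced"))
    return issues
```

The hard part in practice is the detection step that fills in the `detected` mapping, not this comparison.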
The economics work best in high-SKU convenience formats and supermarkets where shelf turnover is fast and out-of-stock events are frequent. A mid-size Indonesian minimarket chain running 200+ stores can realistically justify a shelf-monitoring pilot — a single-location operation cannot.
Key dependency: you need a clean product image catalogue per SKU. If your product database does not have consistent, high-resolution reference images, the model cannot reliably distinguish between brands or variants. This is often the real project blocker, not the CV model itself. Explore how vendors on /marketplace handle this in their implementation process.
Document processing: invoice OCR and KTP verification
OCR for structured documents — invoices, purchase orders, delivery receipts — is one of the highest-ROI CV applications for Indonesian businesses today. The task is narrowly defined (extract specific fields from a predictable document layout), pre-trained models are widely available, and the downstream automation (routing to accounting, updating ERP) is straightforward.
Indonesian-language document OCR has two specific challenges:
- KTP and BPJS cards have varying print quality across issuance batches; older formats differ from newer Dukcapil templates.
- Commercial invoices from SME suppliers often use non-standard layouts, so the extraction model must handle semi-structured extraction, not just fixed-field parsing.
For structured invoice templates, out-of-the-box cloud APIs (Google Document AI, Amazon Textract, Azure AI Document Intelligence, formerly Form Recognizer) often achieve 94–98% field-level accuracy with minimal setup cost. For KTP verification in onboarding flows (e-KYC), several Indonesian providers offer pre-built integrations with the Dukcapil API that combine OCR with liveness detection. This is a mature, low-customisation use case.
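Vendors do not always define "field-level accuracy" the same way, so it is worth measuring it yourself on a sample of your own invoices. The sketch below uses one plausible definition, assumed here for illustration: the share of ground-truth fields whose extracted value matches exactly after trimming whitespace and ignoring case. The field names in the usage example are hypothetical.

```python
def field_level_accuracy(
    predictions: list[dict[str, str]],
    ground_truth: list[dict[str, str]],
) -> float:
    """Share of ground-truth fields that were extracted correctly."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must align one to one")
    total = 0
    correct = 0
    for predicted, expected in zip(predictions, ground_truth):
        for field, expected_value in expected.items():
            total += 1
            # A missing field counts as an error.
            got = predicted.get(field, "")
            if got.strip().casefold() == expected_value.strip().casefold():
                correct += 1
    if total == 0:
        raise ValueError("ground_truth contains no fields")
    return correct / total
```

For example, if one of four fields across two invoices is wrong (a total read as "25000" instead of "250000"), the function returns 0.75. Exact matching is strict on purpose: a nearly correct amount is still a wrong amount once it reaches the ERP.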
Attendance and access control: face recognition
Face recognition for factory and office attendance is the most widely deployed CV system in Indonesian workplaces today. Hardware bundles (face recognition terminal + attendance software) have dropped to commodity pricing, and the technology is well understood.
Implementation notes that matter in an Indonesian context:
- Accuracy with head and face coverings. Models trained primarily on uncovered-face datasets have lower accuracy on users wearing hijab, safety helmets, or dust masks, all of which are common in manufacturing. Test explicitly on your workforce before committing to a vendor.
- Data privacy. Biometric data falls under "specific personal data", the sensitive category in Indonesia's Personal Data Protection Law (UU PDP, Law No. 27 of 2022). Collection requires explicit consent, a clear retention policy, and appropriate security controls. Any vendor that does not raise this proactively is a concern.
- Liveness detection. Anti-spoofing (rejecting a printed photo or video replay) is a baseline requirement for access control. Confirm the vendor's liveness detection method before purchase.
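Testing on your own workforce can be as simple as logging genuine attempts during a pilot and comparing rejection rates between groups. The sketch below assumes each logged attempt is a (group, accepted) pair for an enrolled employee. The group labels and function name are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def false_rejection_rate_by_group(
    attempts: Iterable[tuple[str, bool]],
) -> dict[str, float]:
    """Share of genuine attempts that the terminal rejected, per group."""
    totals: dict[str, int] = defaultdict(int)
    rejected: dict[str, int] = defaultdict(int)
    for group, accepted in attempts:
        totals[group] += 1
        if not accepted:
            rejected[group] += 1
    return {group: rejected[group] / totals[group] for group in totals}
```

A large gap between groups in a pilot is a reason to ask the vendor how the model was trained and whether it can be re-enrolled or fine-tuned for your site.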
Data requirements and labelling realities
Every custom CV model requires labelled training data — images annotated with bounding boxes, segmentation masks, or class labels depending on the task. This is where most project timelines slip.
| Task type | Minimum viable dataset | Realistic labelling effort |
|---|---|---|
| Binary defect detection (pass/fail) | 500–2,000 images per class | 2–6 weeks with a small labelling team |
| Multi-class defect classification | 1,000–5,000 per defect type | 4–12 weeks |
| Object detection (counting/locating) | 500–3,000 annotated images | 2–8 weeks |
| Document OCR (structured layout) | Pre-trained; 200–1,000 fine-tuning examples | 1–3 weeks |
| Face recognition | Pre-trained; enrolment photos per person | Days |
Data quality beats data quantity. A thousand clean, consistently lit, correctly labelled images outperform ten thousand rushed ones. Budget time and internal resource for this — the labelling phase is usually invisible in vendor proposals and is the number-one source of deadline overruns.
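A back-of-envelope estimate makes the labelling phase visible before a vendor proposal hides it. Every input below is an assumption to replace with your own figures: seconds per image, productive hours per week, and the overhead for review and rework are all hypothetical defaults.

```python
def labelling_weeks(
    images: int,
    seconds_per_image: float,
    labellers: int,
    productive_hours_per_week: float = 30.0,  # assumed, per labeller
    review_overhead: float = 0.25,  # assumed extra time for QA and rework
) -> float:
    """Estimate calendar weeks needed to label a dataset."""
    if labellers <= 0 or productive_hours_per_week <= 0:
        raise ValueError("labellers and hours per week must be positive")
    total_hours = images * seconds_per_image / 3600 * (1 + review_overhead)
    return total_hours / (labellers * productive_hours_per_week)
```

As an illustration, 15,000 images at 60 seconds each with two labellers comes to a little over five weeks under these defaults, which sits inside the multi-class range in the table above.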
Edge vs cloud: making the right call
Edge deployment (running the model on an on-site GPU, a Jetson device, or an AI-enabled industrial camera) is appropriate when:
- The application requires sub-100 ms inference (real-time QC, access control)
- Continuous video streaming makes cloud data-transfer costs prohibitive
- Network connectivity at the site is unreliable
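The streaming point is easy to quantify before asking for a cloud quote. The sketch below converts a video bitrate into daily upload volume. The bitrate in the usage note is an assumed example, not a measurement, and the function name is hypothetical.

```python
def daily_upload_gb(bitrate_mbps: float, cameras: int, hours_per_day: float) -> float:
    """Gigabytes uploaded per day for continuous streaming (decimal GB)."""
    seconds = hours_per_day * 3600
    megabits = bitrate_mbps * cameras * seconds
    # 8 bits per byte, 1,000 megabytes per gigabyte.
    return megabits / 8 / 1000
```

A single camera streaming at an assumed 4 Mbps around the clock uploads roughly 43 GB per day, and the figure scales linearly with the number of cameras.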
Cloud deployment is better when:
- Inference volume is variable (document OCR batches, periodic retail audits)
- You want to avoid capital expenditure on inference hardware
- The use case is not latency-sensitive
Hybrid architectures — lightweight edge model for immediate pass/fail, cloud model for borderline cases — are increasingly common in Indonesian manufacturing deployments and give a reasonable balance between latency and cost.
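A hybrid setup can be sketched as a small router: trust the edge model when it is confident, escalate to the cloud when it is not, and fall back to human review if the site is offline. This is an illustration under assumptions, not a reference design. The two models are passed in as plain callables returning a (label, confidence) pair, and the confidence floor is a hypothetical value.

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence between 0 and 1)


def classify_hybrid(
    image: bytes,
    edge_model: Callable[[bytes], Prediction],
    cloud_model: Callable[[bytes], Prediction],
    edge_confidence_floor: float = 0.9,  # hypothetical; tune per line
) -> Tuple[str, str]:
    """Return (label, source), where source is edge, cloud, or fallback."""
    label, confidence = edge_model(image)
    if confidence >= edge_confidence_floor:
        return label, "edge"
    try:
        cloud_label, _ = cloud_model(image)
    except ConnectionError:
        # Unreliable connectivity: send the borderline case to a person.
        return "review", "fallback"
    return cloud_label, "cloud"
```

Only the borderline images leave the site, which keeps latency low for the bulk of units and limits data-transfer costs.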
When computer vision is overkill
Not every visual problem needs a model. Consider whether a simpler solution already solves it:
- A barcode scanner reads product IDs faster and more reliably than object detection in a picking operation.
- A weight sensor catches underfilling in packaging without a camera.
- A template-matching rule flags document format errors without a trained neural network.
- If volume is below a few hundred inspections per day and accuracy requirements are modest, a part-time human checker may have lower total cost than a CV implementation project.
The right question is not "can computer vision do this?" — it almost certainly can — but "does computer vision produce better outcomes per rupiah spent than the alternatives?" The answer is often yes for high-volume, repetitive, visual tasks and often no for everything else.
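The outcomes-per-rupiah question can be roughed out with a simple monthly comparison. The sketch below is deliberately crude, and every number passed in is an assumption: the amortisation period, running cost, checker capacity, and wage are hypothetical inputs. It also ignores the human reviewer a CV system still needs for its review bucket.

```python
import math


def cv_monthly_cost(
    one_time_rp: float,
    amortisation_months: int,
    running_rp_per_month: float,
) -> float:
    """Monthly cost of a CV station with the one-time spend spread evenly."""
    return one_time_rp / amortisation_months + running_rp_per_month


def manual_monthly_cost(
    inspections_per_day: int,
    capacity_per_checker_per_day: int,
    wage_rp_per_month: float,
) -> float:
    """Monthly cost of staffing enough human checkers for the volume."""
    checkers = math.ceil(inspections_per_day / capacity_per_checker_per_day)
    return checkers * wage_rp_per_month
```

With an assumed Rp 200 million one-time spend over 36 months plus Rp 5 million per month to run, the CV station costs about Rp 10.6 million per month. Against an assumed Rp 6 million checker who can handle 2,000 units a day, the human is cheaper at 300 inspections per day (Rp 6 million) and far more expensive at 10,000 (Rp 30 million).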
Conclusion
Computer vision has moved from research curiosity to operational reality in Indonesian manufacturing, logistics, retail, and document processing. The technology works — with honest accuracy expectations, proper data preparation, and infrastructure that fits the use case. The projects that fail usually fail not because the AI is wrong but because the implementation was underprepared: poor lighting, insufficient labelled data, no fallback queue, or a vendor that promised the moon.
If you are ready to scope a CV project or compare providers, start at /marketplace — filter by Computer Vision to see verified Indonesian vendors. Before briefing any vendor, consider completing a /pari assessment to clarify where your organisation sits on the AI-readiness curve — it will sharpen your brief and reduce the risk of a mismatched scope.
For related reading, see our guide to choosing an AI service provider in Indonesia and our AI analytics dashboard implementation guide.