English {#english}

RAG AI Edge Architecture — Full Edge-Computing & Cloudflare Partner Capabilities

Document version: 1.1
Purpose: Define the architecture, feature modules, and phased roadmap for Full Edge-Computing RAG AI HR capabilities, assuming a Cloudflare-native stack with Zuplo as the gateway. Planning and design scope only; no implementation code is delivered here.

References: saas-unified-architecture-hetzner-cloudflare-zuplo-plan.md, pregoi-infrastructure-r2-api-update-plan.md. Worker spec: worker-ai-implementation-plan.md — Ingestion, Query, Compliance, PII, API, schema, edge constraints.

1. Architecture Principles: Full Edge-Computing

Prego RAG AI adopts a Full Edge-Computing model with no dependency on external regional servers. Like Zuplo, it runs auth, quota, vector search, LLM inference, and audit on the Cloudflare global edge.

Principle	Description
Edge-first	API gateway, business logic, AI inference, and vector search all run inside the Cloudflare network.
Cloudflare-native	Workers, Workers AI, Vectorize, D1, R2, Zuplo (gateway) as first-class. Minimize external LLM API dependency (Phase 1 goal: remove).
Multi-tenancy	Tenant Key Management, Usage Quota, and namespace isolation enforced at the edge.
Compliance & security	Data retention, deletion, and audit design aligned to Singapore PDPA, SOC2, etc.

2. Core Component Design

2.1 ① Edge Orchestration (Zuplo + Workers)

Zuplo: API Gateway

Role	Detail
Tenant auth	Key Management — issue and validate API keys per tenant/consumer. Extend existing Prego–Zuplo integration (Frappe API Key registration) pattern.
AI usage limits	Usage Quota enforced at the edge. Daily/monthly AI call limits per tenant/plan; 429 or soft warning on excess. Zuplo Policy uses request count and KV or D1 aggregates.
Routing	Route RAG/AI endpoints (`/ai/query`, `/ai/embed`, `/ai/guardrail`) to Worker backends.

Workers: Business logic and AI pipeline

Role	Detail
HR workflow	Integrate with existing HR APIs (leave, attendance, policy). Fetch context (user role, department, permissions) from D1 for RAG responses.
AI prompt chaining	User query → (optional) Guardrail check → vector search for context → assemble prompt for LLM → (optional) Guardrail re-check → audit log. Multiple steps in one Worker or sub-requests.
Model calls	Workers AI through the Worker `AI` binding (`env.AI.run(...)`) for Llama 3, Mistral, etc. No external LLM API (Phase 1 goal).

Data flow summary

[Client] → Zuplo (auth, Quota) → Worker (RAG/Guardrail/Prompt Chaining) → Workers AI / Vectorize / D1 / R2
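
The prompt-chaining order described above can be sketched as follows. This is a minimal illustration, not the planned implementation: the `PipelineDeps` step signatures, `AuditEntry`, and the prompt wording are hypothetical, and real steps would call Workers AI, Vectorize, D1, and R2.

```typescript
// Hypothetical step signatures; each would wrap a Cloudflare binding in a real Worker.
interface PipelineDeps {
  guardrail: (text: string) => Promise<{ allowed: boolean }>;
  retrieve: (tenantId: string, query: string) => Promise<string[]>;
  generate: (prompt: string) => Promise<string>;
  audit: (entry: AuditEntry) => Promise<void>;
}

interface AuditEntry {
  tenantId: string;
  query: string;
  answer: string | null;
  blockedBy: "input-guardrail" | "output-guardrail" | null;
}

// Assemble the LLM prompt from retrieved context chunks and the user query.
function assemblePrompt(query: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}

// Guardrail check, vector search, LLM call, guardrail re-check, then audit log.
async function runRagPipeline(
  deps: PipelineDeps,
  tenantId: string,
  query: string,
): Promise<AuditEntry> {
  const entry: AuditEntry = { tenantId, query, answer: null, blockedBy: null };

  const inputCheck = await deps.guardrail(query);
  if (!inputCheck.allowed) {
    entry.blockedBy = "input-guardrail";
  } else {
    const chunks = await deps.retrieve(tenantId, query);
    const answer = await deps.generate(assemblePrompt(query, chunks));
    const outputCheck = await deps.guardrail(answer);
    if (outputCheck.allowed) {
      entry.answer = answer;
    } else {
      entry.blockedBy = "output-guardrail";
    }
  }

  // Every outcome is logged, including blocked requests.
  await deps.audit(entry);
  return entry;
}
```

Passing the steps in as dependencies keeps the ordering testable without any Cloudflare bindings, and lets the guardrail steps be made optional per tenant setting.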

2.2 ② Hybrid Vector Search (Vectorize multi-tenancy)

Namespace isolation

Item	Design
Single index, multiple namespaces	One Vectorize index with per-tenant namespace. Queries always scoped to that tenant namespace.
Security	Logical isolation by namespace without separate physical index per tenant. No cross-tenant vector/metadata access.
Operations	Shared index schema (dimensions, metadata); namespace = tenant_id (or tenant_id + document_set_id).
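
One way to make "queries always scoped to that tenant namespace" hard to bypass is to hand Worker code a wrapper that overwrites the namespace on every call. The sketch below declares a minimal `VectorIndexLike` interface locally so it is self-contained; the `tenant_id:document_set_id` separator and the 64-character cap are assumptions to verify against current Vectorize limits.

```typescript
// Minimal local subset of a Vectorize-style index (assumption: upsert/query accept a namespace).
interface VectorRecord { id: string; values: number[]; namespace?: string }
interface QueryOptions { topK?: number; namespace?: string }
interface VectorIndexLike {
  upsert(records: VectorRecord[]): Promise<unknown>;
  query(vector: number[], options?: QueryOptions): Promise<unknown>;
}

const MAX_NAMESPACE_LENGTH = 64; // assumed limit; check the current Vectorize documentation
const NAMESPACE_PART = /^[A-Za-z0-9_-]+$/;

// namespace = tenant_id, or tenant_id + document_set_id joined with ":".
function tenantNamespace(tenantId: string, documentSetId?: string): string {
  const parts = documentSetId === undefined ? [tenantId] : [tenantId, documentSetId];
  if (!parts.every((p) => NAMESPACE_PART.test(p))) {
    throw new Error("namespace parts must be non-empty and use only letters, digits, _ or -");
  }
  const namespace = parts.join(":");
  if (namespace.length > MAX_NAMESPACE_LENGTH) {
    throw new Error("namespace too long");
  }
  return namespace;
}

// Returns a view of the index that forces the given namespace on every call,
// even if the caller passes a different one.
function scopeToTenant(index: VectorIndexLike, namespace: string): VectorIndexLike {
  return {
    upsert: (records) => index.upsert(records.map((r) => ({ ...r, namespace }))),
    query: (vector, options) => index.query(vector, { ...options, namespace }),
  };
}
```

Because ":" is rejected inside each part, tenant `a` with document set `b` cannot collide with a tenant literally named `a:b`.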

Global sync & latency

Item	Design
Global network	Vectorize keeps index synced to the edge. Same index used from Singapore, US, Europe, etc.
Target	RAG response (query → vector search → LLM → response) P95 < 50ms. Edge proximity and Workers AI low latency.
Embedding	At ingestion, Workers AI Embedding (or same edge model) → vectorize → upsert to Vectorize with namespace.
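
The ingestion step in the Embedding row (split, embed, upsert with namespace) might look like the sketch below. Chunk size, overlap, the `documentId#index` ID scheme, and the `UpsertRecord` shape are illustrative assumptions; the embeddings themselves are taken as input here, since in a Worker they would come from a Workers AI embedding model.

```typescript
// Split text into fixed-size character chunks with overlap between neighbours.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end of the text
  }
  return chunks;
}

// Hypothetical record shape for a namespaced upsert.
interface UpsertRecord {
  id: string;
  values: number[];
  namespace: string;
  metadata: { documentId: string; chunkIndex: number };
}

// Pair each chunk with its embedding and tag it with the tenant namespace.
function buildUpsertRecords(
  namespace: string,
  documentId: string,
  chunks: string[],
  embeddings: number[][],
): UpsertRecord[] {
  if (chunks.length !== embeddings.length) {
    throw new Error("one embedding per chunk is required");
  }
  return embeddings.map((values, i) => ({
    id: `${documentId}#${i}`,
    values,
    namespace,
    metadata: { documentId, chunkIndex: i },
  }));
}
```

Deterministic IDs make re-ingestion of the same document an overwrite rather than a duplicate, and they give the deletion procedure a known set of vector IDs per document.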

2.3 ③ Trusted Data (D1 & R2)

D1: Tenant settings and permissions

Use	Content
Tenant settings	RAG on/off, allowed document sets, Guardrail on/off, languages.
Role profile	User role, department, permissions. Filter “which documents can this user access?” for RAG.
Existing	Integrate with tenants_master, tenant_runtime. Extend with RAG tables (tenant_ai_settings, role_profiles, etc.).
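
The "which documents can this user access?" filter can be expressed as a pure function over rows read from D1. The `RoleProfile` and `DocumentSetRule` shapes below are hypothetical, as is the convention that an empty department list means "any department".

```typescript
// Hypothetical row shapes; the real columns would come from role_profiles and tenant_ai_settings.
interface RoleProfile { role: string; department: string }
interface DocumentSetRule { documentSetId: string; roles: string[]; departments: string[] }

// Returns the document sets a user may query: enabled for the tenant,
// open to the user's role, and open to the user's department.
function accessibleDocumentSets(
  profile: RoleProfile,
  tenantAllowedSets: string[],
  rules: DocumentSetRule[],
): string[] {
  return rules
    .filter((r) => tenantAllowedSets.includes(r.documentSetId))
    .filter((r) => r.roles.includes(profile.role))
    // Assumption: an empty departments list means the set is not department-restricted.
    .filter((r) => r.departments.length === 0 || r.departments.includes(profile.department))
    .map((r) => r.documentSetId);
}
```

The resulting list can then restrict the vector search, for example by querying only the namespaces or metadata values that correspond to those document sets.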

R2: RAG source documents and lifecycle

Use	Content
Source storage	Uploaded policies, manuals (PDF/Office). Path e.g. `rag-docs/{tenant_id}/{document_set_id}/{object_key}`.
Lifecycle rules	R2 bucket Lifecycle for auto deletion. e.g. delete or archive after N days. PDPA retention compliance.
Access	Workers only. No direct client access. Tenant prefix isolation.
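
Tenant prefix isolation depends on keys being built in one place and never from unchecked input. A minimal sketch of a key builder for the `rag-docs/{tenant_id}/{document_set_id}/{object_key}` layout follows; the allowed character set is an assumption.

```typescript
const KEY_SEGMENT = /^[A-Za-z0-9._-]+$/; // assumed safe character set for each path segment

// Reject empty segments, path separators, and "." / ".." so a key cannot escape its prefix.
function assertSafeSegment(segment: string): void {
  if (!KEY_SEGMENT.test(segment) || segment === "." || segment === "..") {
    throw new Error(`unsafe key segment: ${JSON.stringify(segment)}`);
  }
}

// Prefix that contains every RAG object of one tenant (used for listing and deletion).
function tenantPrefix(tenantId: string): string {
  assertSafeSegment(tenantId);
  return `rag-docs/${tenantId}/`;
}

// Full object key: rag-docs/{tenant_id}/{document_set_id}/{object_key}.
function ragObjectKey(tenantId: string, documentSetId: string, objectKey: string): string {
  assertSafeSegment(documentSetId);
  assertSafeSegment(objectKey);
  return `${tenantPrefix(tenantId)}${documentSetId}/${objectKey}`;
}
```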

PDPA / compliance

Design includes procedures: on retention expiry or consent withdrawal, delete R2 objects, clear Vectorize namespace, delete D1 metadata for that tenant/document set. Align with existing “delete after transfer” and “Fail Sanely” policies.
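
The deletion procedure can be modelled as an ordered plan that a Worker or runbook then executes. The sketch below assumes vectors are removed by ID (so vector IDs must be tracked per document in D1) and that D1 metadata is deleted last, so a failed run can be retried from the surviving metadata. The step shapes and the batch size are illustrative.

```typescript
// Hypothetical per-document metadata, as it might be stored in D1.
interface DocumentRecord { objectKey: string; vectorIds: string[] }

type DeletionStep =
  | { store: "vectorize"; action: "deleteByIds"; ids: string[] }
  | { store: "r2"; action: "delete"; keys: string[] }
  | { store: "d1"; action: "deleteMetadata"; tenantId: string; documentSetId: string };

// Build the ordered steps for removing one tenant/document set:
// vectors first (batched), then source objects, then the D1 metadata.
function buildDeletionPlan(
  tenantId: string,
  documentSetId: string,
  docs: DocumentRecord[],
  batchSize = 100,
): DeletionStep[] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const steps: DeletionStep[] = [];

  const allVectorIds = docs.flatMap((d) => d.vectorIds);
  for (let i = 0; i < allVectorIds.length; i += batchSize) {
    steps.push({ store: "vectorize", action: "deleteByIds", ids: allVectorIds.slice(i, i + batchSize) });
  }

  if (docs.length > 0) {
    steps.push({ store: "r2", action: "delete", keys: docs.map((d) => d.objectKey) });
  }

  // Metadata goes last: until then, it still lists what remains to be deleted.
  steps.push({ store: "d1", action: "deleteMetadata", tenantId, documentSetId });
  return steps;
}
```

Keeping the plan as data also gives the audit log a concrete record of what was deleted and in which order.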

3. Zuplo-Based Usage Limits and Metering & Billing

Use Zuplo to measure Cloudflare RAG AI usage in real time and report it to Stripe for billing.

3.1 Principle: Edge-based usage tracking

Compute and record token usage at the gateway so every request is counted, without extra latency. No separate metering service; Zuplo Inbound/Outbound policies handle it.

3.2 Flow

Step	Owner	Content
1. Auth and quota (Pre-proc)	Zuplo	On request, get `tenant_id` from API Key/JWT. Look up remaining quota in D1 (or KV). Return 429 if over to avoid AI cost.
2. AI request/response	Workers AI	RAG answer. Response header or body includes token count from Workers AI (e.g. `x-ai-usage-tokens` or metadata).
3. Record usage (Post-proc)	Zuplo Custom Policy	Outbound policy intercepts response, extracts tokens. Async update D1 usage table and Stripe Usage-based Billing (`subscription_items/.../usage_records`). Use `waitUntil` (or Zuplo equivalent) so recording does not delay response.
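
Steps 1 and 2 of the flow reduce to two small decisions: whether to admit the request, and how many tokens the response reports. The sketch below keeps both as pure functions; the `QuotaState` shape is hypothetical, and the `x-ai-usage-tokens` header is the example name used in the table, not a confirmed Workers AI header.

```typescript
// Hypothetical quota snapshot read from D1 or KV before the AI call.
interface QuotaState { usedTokens: number; limitTokens: number; hardLimit: boolean }

interface QuotaDecision { status: 200 | 429; softWarning: boolean }

// Hard-limit plans are blocked with 429 once the allowance is used up;
// other plans pass through with a soft warning.
function checkQuota(state: QuotaState): QuotaDecision {
  const exhausted = state.usedTokens >= state.limitTokens;
  if (exhausted && state.hardLimit) return { status: 429, softWarning: false };
  return { status: 200, softWarning: exhausted };
}

// Read the token count from response headers (case-insensitive name match).
// Returns null when the header is missing or not a non-negative integer.
function extractTokenCount(headers: Record<string, string>): number | null {
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === "x-ai-usage-tokens") {
      const raw = headers[name];
      const value = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
      return Number.isInteger(value) && value >= 0 ? value : null;
    }
  }
  return null;
}
```

Returning `null` instead of `0` for a missing header lets the recording step distinguish "no usage" from "usage unknown", which matters for billing accuracy.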

3.3 Zuplo Custom Policy (implementation)

Place: Zuplo Inbound for tenant ID and quota check; Outbound for post-response usage recording. Or single Outbound: next() → get response → extract tokens → async record.
Input: request.user.data.tenantId (or Consumer metadata).
Record: tenant_id, token count, timestamp. D1: daily/monthly aggregate or new row; Stripe: quantity=tokens, timestamp, action=increment.
Secrets: D1 binding/URL, Stripe Secret Key, Stripe Subscription Item ID (per tenant/plan).
Code: This doc is planning only; implement using Zuplo Policy API, next(), waitUntil per official docs.
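
For orientation only, the post-response recording step could be shaped roughly as below. This is an illustrative sketch and not the policy itself: `PolicyContext` is a local stand-in for the gateway context (only `requestId` and `waitUntil` are assumed), `send` is an injected hypothetical function that would write to D1 and call Stripe, and the Stripe request is built as plain data without the Authorization header.

```typescript
// Local stand-in for the gateway context; only these two members are assumed.
interface PolicyContext {
  requestId: string;
  waitUntil(promise: Promise<unknown>): void;
}

interface UsageEvent {
  tenantId: string;
  tokens: number;
  timestamp: number; // Unix seconds
  idempotencyKey: string;
}

// Describe the Stripe usage_records call as data (quantity, timestamp, action=increment).
function stripeUsageRequest(subscriptionItemId: string, event: UsageEvent) {
  return {
    url: `https://api.stripe.com/v1/subscription_items/${encodeURIComponent(subscriptionItemId)}/usage_records`,
    headers: {
      "Idempotency-Key": event.idempotencyKey,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: `quantity=${event.tokens}&timestamp=${event.timestamp}&action=increment`,
  };
}

// Hand the recording work to waitUntil so the response is not delayed.
function recordUsageInBackground(
  context: PolicyContext,
  tenantId: string,
  tokens: number,
  nowSeconds: number,
  send: (event: UsageEvent) => Promise<void>,
): UsageEvent {
  const event: UsageEvent = {
    tenantId,
    tokens,
    timestamp: nowSeconds,
    // Stable across retries of the same request.
    idempotencyKey: `${tenantId}:${context.requestId}`,
  };
  // A real policy would log or queue failures instead of discarding them.
  context.waitUntil(send(event).catch(() => undefined));
  return event;
}
```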

3.4 Billing scenarios and pricing

Tier	Billing model	Zuplo enforcement
Free / Basic	Hard quota (free allowance)	Zuplo returns 429 when over; block AI request.
Premium	Tiered pricing	Included usage up to limit; overage at $X per 1k tokens. Stripe Metered Billing via usage_records.
Enterprise	Volume discount	Lower unit at scale. Optional dedicated gateway or separate quota and Stripe Product/Price mapping.
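
The tier table implies a simple overage calculation. The sketch below works in integer cents and rounds overage up to whole 1k-token blocks; that rounding rule, the `Plan` shape, and the sample prices in the usage note are assumptions, since the document leaves $X open.

```typescript
// Hypothetical plan definition; prices are in cents to avoid floating-point money.
interface Plan {
  includedTokens: number;
  overageCentsPer1k: number;
  hardQuota: boolean; // Free / Basic: requests are blocked instead of billed
}

// Overage charge in cents for a billing period.
function overageChargeCents(plan: Plan, usedTokens: number): number {
  if (plan.hardQuota) return 0; // the gateway returns 429, so nothing is billed
  const overageTokens = Math.max(0, usedTokens - plan.includedTokens);
  // Assumption: overage is billed per started block of 1,000 tokens.
  return Math.ceil(overageTokens / 1000) * plan.overageCentsPer1k;
}
```

For example, with 1,000,000 included tokens and a hypothetical 20 cents per 1k, using 1,002,500 tokens gives 2,500 overage tokens, billed as 3 blocks, or 60 cents. An Enterprise volume discount would be the same function with a lower `overageCentsPer1k`.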

3.5 Design notes (metering & billing)

Note	Content
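Idempotency	Send an `Idempotency-Key` header when reporting usage to Stripe so that a retry after a network error is not double-counted. Derive the key from values that stay the same across retries (e.g. `tenant_id + request_id`, or the Zuplo request ID); a timestamp taken at send time changes on retry and defeats the purpose.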
Soft limit alert	At 80% / 90% usage, Zuplo Policy or separate Worker sends email/Slack. Advance warning to avoid surprise bills.
Caching	Use Zuplo Cache for same/similar questions to skip AI compute. Lower customer cost and improve Prego margin. Cache key includes tenant_id and question hash.
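
Two of the notes above translate directly into small helpers: a tenant-scoped cache key and a check for newly crossed alert thresholds. In this sketch the FNV-1a hash is a dependency-free stand-in (a real deployment would more likely use SHA-256 to make collisions negligible), and the `ai-cache:` key prefix and normalization rules are assumptions. It only covers exact matches after normalization, not "similar" questions.

```typescript
// 32-bit FNV-1a hash, returned as 8 hex characters. Stand-in for a cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Cache key includes tenant_id so one tenant can never read another tenant's cached answer.
function cacheKey(tenantId: string, question: string): string {
  const normalized = question.trim().toLowerCase().replace(/\s+/g, " ");
  return `ai-cache:${tenantId}:${fnv1a(normalized)}`;
}

// Thresholds (in percent) crossed by this update only, so each alert is sent once.
// Integer comparison avoids floating-point edge cases at exactly 80% or 90%.
function newlyCrossedThresholds(
  previousUsed: number,
  currentUsed: number,
  limit: number,
  percents: number[] = [80, 90],
): number[] {
  return percents.filter(
    (p) => previousUsed * 100 < p * limit && currentUsed * 100 >= p * limit,
  );
}
```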

4. Vectorize & D1 Provisioning (Pulumi)

For Automated Provisioning (Cloudflare partnership): create Vectorize index, D1 database, and service token for Workers in the customer’s Cloudflare account in one go.

4.1 Goals and scope

Executor: Pulumi (TypeScript or Python). Input: customer Cloudflare Account ID; create resources in that account.
Multi-tenancy: Single index with namespace isolation, or (option) per-tenant index policy.
Outputs: D1 database_id, Vectorize index name, API Token (Secret) — for Zuplo or control module.

4.2 Resources to provision

Resource	Design
D1	Name e.g. `prego_hr_master_db`. Tenant metadata, AI settings, Role Profile. Integrate with existing Prego D1 schema or separate per customer.
Vectorize	Name e.g. `prego-hr-knowledge-base`. Preset: a Workers AI embedding model (e.g. `@cf/baai/bge-small-en-v1.5`). Dimensions follow the preset (384 for bge-small, 768 for bge-base). Use namespaces for Statutory, Company, etc. in one index.
API token (service account)	For Workers to access D1, R2, Vectorize. Scopes: Workers R2 Storage Write, Workers Scripts Write, D1 Write (or minimal for those resources). Token value as Secret only (Pulumi `pulumi.secret` or equivalent).
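
Before any Pulumi program runs, the inputs can be validated and expanded into a plain provisioning plan. The sketch below is independent of Pulumi: `ProvisionPlan` and `buildProvisionPlan` are hypothetical, the resource names and token scopes are the examples from the table, and the cosine metric is an assumption. The preset-to-dimension map reflects the published sizes of the BGE embedding models.

```typescript
// Output dimensions of the Workers AI BGE embedding presets.
const PRESET_DIMENSIONS: Record<string, number> = {
  "@cf/baai/bge-small-en-v1.5": 384,
  "@cf/baai/bge-base-en-v1.5": 768,
  "@cf/baai/bge-large-en-v1.5": 1024,
};

// Hypothetical description of what the provisioning run should create.
interface ProvisionPlan {
  accountId: string;
  d1: { name: string };
  vectorize: { name: string; preset: string; dimensions: number; metric: "cosine" };
  tokenScopes: string[];
}

function buildProvisionPlan(accountId: string, preset: string): ProvisionPlan {
  if (accountId.trim() === "") throw new Error("accountId is required");
  const dimensions = PRESET_DIMENSIONS[preset];
  if (dimensions === undefined) throw new Error(`unknown embedding preset: ${preset}`);
  return {
    accountId,
    d1: { name: "prego_hr_master_db" },
    // The index dimensions must match the embedding model used at ingestion and query time.
    vectorize: { name: "prego-hr-knowledge-base", preset, dimensions, metric: "cosine" },
    tokenScopes: ["Workers R2 Storage Write", "Workers Scripts Write", "D1 Write"],
  };
}
```

Deriving the dimension from the preset in one place avoids the most common provisioning mistake: an index created with a dimension that does not match the model that later produces the vectors.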

4.3 Namespace isolation

Recommended: One Vectorize index with per-tenant namespace. Always pass tenant_id (namespace) on query/upsert for logical isolation without separate indexes.
Alternative: If enterprise demands a fully separate index per tenant, design option for Pulumi to create per-tenant Vectorize index; document cost/ops tradeoff.

4.4 Implementation and partner strategy

Item	Content
Zero-touch deployment	Customer enters Account ID (and if needed API Token) in Prego settings; Pulumi runs and within ~1 minute AI infra (D1, Vectorize, token) is set up in their account. Core to strong partner integration.
Privacy isolation	Vectorize by namespace (logical) or per-tenant index (physical); D1 by row-level isolation (logical) or per-tenant DB (physical). Customer data is always logically separated, and physically separated where the per-tenant option is chosen.
Cost transparency	Resources live in customer’s Cloudflare account; Vectorize/D1/R2 appear on customer’s Cloudflare bill. Prego charges SaaS fee only.

4.5 Zuplo and Worker integration

Store provisioning outputs (D1 database_id, Vectorize index name) in Zuplo env or Control Plane D1. Worker reads them per request.
API Token injected at deploy via Wrangler Secret or Cloudflare env. Zuplo handles auth; Worker accesses D1/Vectorize directly.

5. RAG AI Enterprise Feature Modules

Package differentiated AI HR capabilities as a Cloudflare partner.

Module	Detail (enterprise grade)	Cloudflare tech
Statutory Guardrail	AI checks real-time for labour-law violations by country; flags user/admin or AI output.	Workers AI (Llama 3 / Mistral): classify and explain using regulation text + user input.
Document Intelligence	Answer from large policy docs immediately. Natural-language question → relevant chunks → summary/answer.	Vectorize (namespace=tenant): embed and similarity search. Workers AI: answer from context + query.
Privacy Masking	Before AI: de-identify NRIC/FIN etc. at the edge. Only masked text to AI and vector store.	Workers: regex/pattern for NRIC/FIN → mask (e.g. `***1234`) before Workers AI/Vectorize.
Audit Logging	All AI Q&A and Guardrail results in immutable logs. Audit and compliance.	R2 (log prefix or bucket): JSON logs. Optional Logpush for long-term retention/analytics.

All modules run on edge orchestration: Zuplo auth and Quota → Worker pipeline: Guardrail / Document Intelligence / Privacy Masking / Audit.
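
The Privacy Masking step can be sketched as a single regex pass applied before text reaches Workers AI or Vectorize. The pattern below matches the general NRIC/FIN shape (a prefix letter from S, T, F, G, M, seven digits, one checksum letter) without validating the checksum, and the choice to keep the last four characters is an assumption about the masking policy.

```typescript
// NRIC/FIN shape: prefix letter, 7 digits, checksum letter. The checksum is not verified here.
const NRIC_FIN_PATTERN = /\b[STFGM]\d{7}[A-Z]\b/gi;

// Replace each match with asterisks, keeping only the last four characters.
function maskNricFin(text: string): string {
  return text.replace(NRIC_FIN_PATTERN, (match) => "*".repeat(match.length - 4) + match.slice(-4));
}
```

For example, `maskNricFin("NRIC S1234567D on file")` returns `"NRIC *****567D on file"`. Because the pattern is shape-based, it may also mask unrelated identifiers with the same shape, which errs on the side of privacy.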

6. Phased Roadmap (Phase 1–4)

6.1 Phase 1: Build (Wrangler & dev)

Goal	Content
Deploy standard	All RAG/AI code as modular architecture deployable with Wrangler. Workers, Workers AI, Vectorize, D1, R2 in Wrangler config.
Remove LLM dependency	Use Workers AI (through the Worker `AI` binding) only; no external LLM API. Llama 3, Mistral, etc.
Outputs	RAG Worker(s), Vectorize index and namespace policy, D1 schema extension, R2 bucket and Lifecycle draft, local/staging Wrangler dev. Details: rag-ai-phase1-implementation-plan.md.

6.2 Phase 2: Security & compliance

Goal	Content
PDPA, SOC2	Verify data deletion, access control, audit logic for Singapore PDPA and SOC2. Document and verify R2 Lifecycle, D1 delete, Vectorize namespace cleanup.
Architecture review	Review with Cloudflare security. Align edge data handling, key management, log retention.
Outputs	Compliance checklist, deletion/audit runbook, (optional) Cloudflare security review summary.

6.3 Phase 3: Zuplo integration (Gateway-as-a-Service)

Goal	Content
Customer control	Connector so customers control Prego API via Zuplo from their Cloudflare account.
Features	Per-tenant routing, rate limit, API key, quota in Zuplo dashboard or Config as Code. Prego RAG AI endpoints behind Zuplo.
Outputs	Zuplo connector/template, customer onboarding guide, Prego API spec (OAS).

6.4 Phase 4: Marketplace launch

Goal	Content
Cloudflare Apps	Launch “Prego HR AI” on Cloudflare Apps. Install and configure as app.
Co-selling	Co-selling with Cloudflare Enterprise. Partner program, sales materials, support process.
Outputs	Marketplace listing, pricing/packages, Co-selling playbook.

7. Integration with Existing Prego

Existing	RAG AI integration
Zuplo	Add AI Usage Quota to existing tenant Key Management and Rate Limit. Same Zuplo instance for `/ai/*` routing and Quota.
D1	Integrate with tenants_master, tenant_runtime. Add RAG tables (tenant_ai_settings, role_profiles, ai_audit_log_meta); manage via migrations.
R2	RAG-only bucket (or prefix) and Lifecycle Rules, separate from existing static assets and Usage Raw.
client-web	RAG query UI, Guardrail warnings, Document Intelligence results via Zuplo API from client-web or separate app.
PDPA, NRIC	Align with existing onboarding and masking. RAG path: Privacy Masking fixed at “before AI input”.

8. Non-functional Requirements

Item	Target
Latency	RAG response (edge-to-edge) P95 < 50ms.
Isolation	Per-tenant Vectorize namespace, R2 prefix, D1 row-level.
Compliance	PDPA retention, auto deletion (R2 Lifecycle), audit logs (R2/Logpush).
Scale	Modular Wrangler deployment for adding Workers and resources by feature.
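
Checking the latency target needs an agreed definition of P95. A minimal sketch using the nearest-rank method follows; the method choice and the helper names are assumptions, and the samples would come from measured end-to-end response times in milliseconds.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1] as number;
}

// True when the measured P95 is strictly below the target (e.g. 50 ms).
function meetsP95Target(latenciesMs: number[], targetMs: number): boolean {
  return percentile(latenciesMs, 95) < targetMs;
}
```

With 100 samples of 1 ms through 100 ms, the P95 is 95 ms, so a 50 ms target would not be met even though half of the requests were faster than 50 ms.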

9. References

Document	Use
saas-unified-architecture-hetzner-cloudflare-zuplo-plan.md	Unified architecture, Zuplo, PDPA, onboarding.
pregoi-infrastructure-r2-api-update-plan.md	R2, API Gateway, env vars.
provision-tenant-workflow-design.md	Tenant provisioning, Zuplo registration.
rag-ai-phase1-implementation-plan.md	Phase 1 order, Data Ingestion, Query Worker, Zuplo metering Policy, verification checklist.

This document is planning only; implementation is done per phase on request.

한국어 {#korean}

RAG AI 엣지 아키텍처 기획서 — Full Edge-Computing & Cloudflare 파트너 기능

문서 버전: 1.1
작성 목적: Zuplo·Cloudflare와 동일한 Cloudflare 친화적 스택을 전제로, Full Edge-Computing 기반 RAG AI HR 기능의 아키텍처·기능 모듈·단계별 로드맵을 정의. 코드 생성 없음 — 기획·설계 범위만 포함.

참조: saas-unified-architecture-hetzner-cloudflare-zuplo-plan.md, pregoi-infrastructure-r2-api-update-plan.md. Worker 상세 명세: worker-ai-implementation-plan.md — Ingestion·Query·Compliance·PII·API·스키마·에지 제약.

1. 아키텍처 원칙: Full Edge-Computing

Prego RAG AI는 외부 리전 전용 서버에 의존하지 않고, Zuplo와 같이 Cloudflare 글로벌 엣지에서 인증·쿼터·벡터 검색·LLM 추론·감사까지 처리하는 Full Edge-Computing 모델을 채택한다.

원칙	내용
엣지 우선	API 게이트웨이·비즈니스 로직·AI 추론·벡터 검색을 모두 Cloudflare 네트워크 내에서 실행.
Cloudflare 네이티브	Workers, Workers AI, Vectorize, D1, R2, Zuplo(게이트웨이)를 일급 시민으로 사용. 외부 LLM API 의존 최소화(Phase 1에서 제거 목표).
멀티테넌시	테넌트별 Key Management·Usage Quota·Namespace 격리를 엣지에서 보장.
규제·보안	싱가포르 PDPA·SOC2 등에 맞춘 데이터 보관·파기·감사 설계.

2. 핵심 구성요소 설계

2.1 ① 엣지 오케스트레이션 (Zuplo + Workers)

Zuplo: API 게이트웨이

역할	상세
테넌트 인증	Key Management — 테넌트/Consumer별 API Key 발급·검증. 기존 Prego Zuplo 연동(Frappe API Key 등록)과 동일 패턴 확장.
AI 사용량 제한	Usage Quota를 엣지에서 적용. 테넌트·플랜별 일/월 AI 호출 상한, 초과 시 429 또는 Soft Warning. Zuplo Policy에서 요청 카운트·KV 또는 D1 집계값 참조.
라우팅	`/ai/query`, `/ai/embed`, `/ai/guardrail` 등 RAG·AI 엔드포인트를 Workers 백엔드로 전달.

Workers: 비즈니스 로직 및 AI 파이프라인

역할	상세
HR Workflow	휴가·근태·규정 조회 등 기존 HR API와 연동. RAG 응답에 필요한 컨텍스트(사용자 역할·부서·권한)를 D1에서 조회.
AI 프롬프트 체이닝	Prompt Chaining — 사용자 질의 → (선택) Guardrail 검사 → 벡터 검색으로 컨텍스트 확보 → LLM에 프롬프트 조립 → 응답 후 (선택) Guardrail 재검사 → 감사 로그 기록. 여러 단계를 하나의 Worker 또는 서브요청으로 순차 수행.
모델 호출	Workers AI(`@cloudflare/workers-ai`)로 Llama 3, Mistral 등 호출. 외부 LLM API 비의존(Phase 1 목표).

데이터 흐름 요약

[Client] → Zuplo(인증·Quota) → Worker(RAG/Guardrail/Prompt Chaining) → Workers AI / Vectorize / D1 / R2

2.2 ② 하이브리드 벡터 검색 (Vectorize Multi-tenancy)

Namespace Isolation

항목	설계
단일 인덱스·다중 네임스페이스	하나의 Vectorize 인덱스 내에서 고객(테넌트)별 Namespace를 할당. 쿼리 시 반드시 해당 테넌트 namespace만 지정.
보안	물리적으로 인덱스를 테넌트마다 나누지 않아도, namespace 단위로 논리 격리. 다른 테넌트 벡터/메타데이터 접근 불가.
운영	인덱스 스키마(차원·메타데이터)는 공통, namespace = tenant_id(또는 tenant_id + document_set_id 등)로 구분.

Global Sync & 응답 속도

항목	설계
글로벌 네트워크	Cloudflare Vectorize가 엣지에 동기화된 인덱스를 유지. 싱가포르·미국·유럽 등 어느 지사에서 요청해도 동일 인덱스에서 검색.
목표 지표	RAG 응답(쿼리 → 벡터 검색 → LLM → 응답) 50ms 미만의 P95 목표. 엣지 근접성·Workers AI 저지연으로 달성 목표.
임베딩	문서 수집 시 Workers AI Embedding 또는 동일 엣지 임베딩 모델로 벡터화 후 Vectorize에 upsert(namespace 지정).

2.3 ③ 고신뢰 데이터 처리 (D1 & R2)

D1: 테넌트 설정 및 권한

용도	내용
테넌트 설정	RAG 사용 여부, 허용 문서 집합, Guardrail 온/오프, 지원 언어 등.
Role Profile	사용자 역할·부서·권한. RAG 응답 시 “이 사용자는 어떤 문서에 접근 가능한지” 판단 및 필터링.
기존 연동	tenants_master, tenant_runtime 등 기존 D1 스키마와 연동. RAG 전용 테이블(tenant_ai_settings, role_profiles 등) 확장.

R2: RAG 원본 문서 및 Lifecycle

용도	내용
원본 문서 보관	업로드된 사내 규정·매뉴얼 PDF/Office 파일. 경로 예: `rag-docs/{tenant_id}/{document_set_id}/{object_key}`.
Lifecycle Rules	R2 버킷 Lifecycle Rules로 자동 파기 설정. 예: 업로드 후 N일 경과 시 삭제 또는 아카이브. PDPA 보관 제한 준수.
접근 제어	Worker만 R2에 접근. 클라이언트 직접 접근 없음. 테넌트별 prefix로 격리.

PDPA·규제 연동

보관 기간 초과·동의 철회 시 해당 테넌트/문서 집합에 대한 R2 객체 삭제·Vectorize namespace 정리·D1 메타 삭제 절차를 설계에 포함.
기존 기획서의 “이관 후 삭제”“Fail Sanely” 정책과 정합성 유지.

3. Zuplo 기반 사용량 제한 및 메터링·빌링 (Metering & Billing)

Global Principal Architect 관점에서, Zuplo를 활용해 Cloudflare 기반 RAG AI 사용량을 실시간 측정하고 Stripe 청구와 연동하는 엔터프라이즈급 메터링·빌링 설계를 정의한다.

3.1 핵심 원칙: Edge-based Usage Tracking

모든 요청이 통과하는 게이트웨이 단계에서 토큰 사용량을 계산·기록함으로써, 추가 지연 없이 정확한 과금을 실현한다. 백엔드에 별도 메터링 서비스를 두지 않고 Zuplo의 Inbound/Outbound 정책으로 처리한다.

3.2 사용량 측정 플로우 (The Flow)

단계	담당	내용
1. 인증 및 할당량 확인 (Pre-proc)	Zuplo	요청 수신 시 API Key/JWT에서 `tenant_id` 추출. D1(또는 KV)에서 해당 테넌트의 남은 쿼터(Quota) 조회. 초과 시 즉시 429 반환하여 AI 호출 비용 발생 방지.
2. AI 요청 및 응답	Workers AI	RAG 답변 생성. 응답 헤더 또는 바디에 Workers AI가 제공하는 토큰 수(Token Count) 정보 포함. (예: `x-ai-usage-tokens` 또는 응답 메타데이터 필드.)
3. 사용량 기록 (Post-proc)	Zuplo Custom Policy	Outbound 정책에서 응답을 가로채, 토큰 정보 추출. 비동기로 D1 사용량 테이블 업데이트 및 Stripe Usage-based Billing API(`subscription_items/.../usage_records`)로 전송. 응답 반환 지연을 막기 위해 `waitUntil`(또는 Zuplo 동등 메커니즘)으로 기록을 백그라운드 처리.

3.3 Zuplo Custom Policy 설계 요건 (구현 시 참고)

위치: Zuplo Inbound 정책에서 테넌트 식별·쿼터 체크; Outbound 정책에서 응답 후 사용량 기록. 또는 단일 Outbound 정책에서 “next() 호출 → 응답 수신 → 토큰 추출 → 비동기 기록” 순서로 처리.
입력: 요청 시 request.user.data.tenantId(또는 Consumer 메타데이터)로 테넌트 확정.
기록 내용: tenant_id, 토큰 수, 타임스탬프. D1에는 일/월 누적 또는 row 추가; Stripe에는 quantity=tokens, timestamp, action=increment 전달.
환경/시크릿: D1 접근용 URL 또는 바인딩, Stripe Secret Key, Stripe Subscription Item ID(테넌트·플랜별로 매핑 가능).
코드: 본 문서는 기획만 포함. 실제 Policy 구현 시 Zuplo Policy API·next()·waitUntil 등 공식 문서 참조.

3.4 청구 시나리오 및 요금제 설계

요금제 단계	과금 모델 (Billing Model)	Zuplo 제어 방식 (Enforcement)
Free / Basic	Hard Quota (무료 제공량)	설정된 토큰 수 초과 시 Zuplo가 429 Too Many Requests 반환. AI 요청 자체를 차단.
Premium	Tiered Pricing	일정 사용량까지 포함, 초과분은 1,000 토큰당 $X 등 단가 과금. Stripe Metered Billing으로 usage_records 누적 후 청구서 반영.
Enterprise	Volume Discount	대량 사용 시 단가 인하. 필요 시 전용 게이트웨이 또는 별도 Quota 상한·Stripe Product/Price 매핑.

3.5 Principal Architect 제언 (메터링·빌링)

제언	내용
Idempotency (멱등성)	Stripe에 사용량을 리포트할 때 네트워크 오류로 중복 전송되지 않도록 `idempotency_key` 사용. 예: `tenant_id + request_id + timestamp` 또는 Zuplo 요청 ID.
Soft Limit Alert	사용량이 80%·90%에 도달했을 때 Zuplo Policy 또는 별도 Worker에서 이메일·Slack 알림. 고객이 예기치 않은 청구를 받지 않도록 사전 경고.
Caching	동일·유사 질문에 대해 Zuplo Cache를 활용해 AI 연산을 건너뛰도록 설계. 고객 비용 절감 및 Prego 이익률 개선. 캐시 키에 tenant_id·질문 해시 포함.

4. Vectorize·D1 환경 프로비저닝 (Pulumi)

Cloudflare 파트너십의 자동화된 프로비저닝(Automated Provisioning) 을 위해, 엔터프라이즈 고객의 Cloudflare 계정에 Vectorize 인덱스, D1 데이터베이스, 그리고 이를 제어할 Workers용 서비스 토큰을 한 번에 생성하는 설계를 정의한다.

4.1 목표 및 범위

실행 주체: Pulumi (TypeScript 또는 Python). 고객(파트너 연동 시)의 Cloudflare Account ID를 입력받아 해당 계정에 리소스 생성.
Multi-tenancy: 단일 인덱스 내 Namespace 격리 또는(선택) 테넌트별 인덱스 생성 정책을 반영.
산출물: D1 database_id, Vectorize index name, API Token(Secret) — Zuplo 또는 관리 모듈에서 참조.

4.2 프로비저닝 대상 리소스

리소스	설계 요건
D1 데이터베이스	이름 예: `prego_hr_master_db`. 테넌트 메타데이터·AI 설정·Role Profile 등 저장. 기존 Prego D1 스키마와 연동하거나 고객 전용 DB로 분리.
Vectorize 인덱스	이름 예: `prego-hr-knowledge-base`. Preset: Cloudflare Workers AI 최적화 모델(예: `cf-baai-bge-small-en-v1.5`). 차원은 preset에 따름(예: 768). Statutory(노동법)·Company(사내규정) 등은 Namespace로 구분하여 단일 인덱스에서 다중 용도 지원.
API 토큰 (서비스 계정)	Workers가 D1·R2·Vectorize에 접근하기 위한 API Token. 권한: Workers R2 Storage Write, Workers Scripts Write, D1 Write(또는 계정 내 해당 리소스에 대한 최소 권한). 생성된 토큰 값은 Secret으로만 전달(Pulumi `pulumi.secret` 또는 동등).

4.3 Namespace 격리 전략

권장: 하나의 Vectorize 인덱스에 테넌트별 namespace 할당. 쿼리·upsert 시 항상 해당 tenant_id(namespace)만 지정하여 물리적 격리 없이 논리적 완전 격리 보장.
대안: 엔터프라이즈 고객이 “완전 독립 인덱스”를 요구할 경우, 테넌트별로 별도 Vectorize 인덱스를 Pulumi로 생성하는 옵션을 설계에 포함. 비용·운영 복잡도와의 트레이드오프 문서화.

4.4 구현 가이드 및 파트너 전략 포인트

항목	내용
Zero-Touch Deployment	고객이 자신의 Account ID와(필요 시) API Token만 Prego 설정창에 입력하면, Pulumi 스크립트가 실행되어 1분 이내 AI 인프라(D1, Vectorize, 토큰)가 고객 계정에 셋업. Cloudflare 파트너가 요구하는 강한 통합 경험의 핵심.
Privacy Isolation	Vectorize는 Namespace 또는 테넌트별 인덱스로, D1은 테넌트별 DB 또는 row 수준 격리로 고객 데이터가 물리·논리적으로 분리됨을 보장. 엔터프라이즈 설득용 핵심 메시지.
Cost Transparency	자원이 고객의 Cloudflare 계정에 생성되므로, Vectorize·D1·R2 사용료는 고객의 Cloudflare 청구서에 포함. Prego는 SaaS 서비스료만 청구하여 비즈니스 구조가 투명해짐.

4.5 Zuplo·Worker와의 연동

프로비저닝 결과(D1 database_id, Vectorize index name)를 Zuplo 환경 변수 또는 Control Plane D1에 저장. Worker는 해당 값을 읽어 요청별로 올바른 D1·Vectorize를 사용.
API Token은 Worker 배포 시 Wrangler Secret 또는 Cloudflare 환경 변수로 주입. Zuplo는 인증만 담당하고, 백엔드(Worker)가 D1·Vectorize에 직접 접근.

5. RAG AI 전용 엔터프라이즈 기능 모듈

Cloudflare 파트너로서 차별화된 AI HR 기능을 패키징한다.

기능 모듈	상세 내용 (Enterprise Grade)	Cloudflare 기술 활용
Statutory Guardrail	국가별 노동법 위반 여부를 AI가 실시간 체크하여 경고. 사용자/관리자 입력 또는 AI 생성 문장에 대해 “해당 국가 법령 위반 가능성” 플래그.	Workers AI (Llama 3 / Mistral): 규정 텍스트와 사용자 입력을 함께 넣어 위반 여부 분류·설명 생성.
Document Intelligence	수천 페이지 규모의 사내 규정에서 즉각 답변 추출. “연차 휴가 상한은?” 등 자연어 질의 → 관련 청크 검색 → 요약·답변.	Vectorize (namespace=tenant): 문서 임베딩·유사도 검색. Workers AI: 검색된 컨텍스트 + 질의로 답변 생성.
Privacy Masking	AI 처리 전 NRIC/FIN 등 민감 정보를 엣지에서 비식별화. AI·벡터 저장에는 마스킹된 텍스트만 전달.	Workers: Regex/Pattern Matching으로 NRIC·FIN 등 탐지 후 마스킹(예: `***1234`). Workers AI·Vectorize 입력 직전 단계에서 적용.
Audit Logging	모든 AI 질문·답변·Guardrail 결과를 불변 로그로 저장. 감사·규제 대응.	R2 (로그 전용 prefix 또는 버킷): JSON 로그 적재. Logpush로 장기 보관·분석 연동(선택).

각 모듈은 엣지 오케스트레이션 위에서 동작: Zuplo 인증·Quota → Worker에서 Guardrail / Document Intelligence / Privacy Masking / Audit 순서로 파이프라인 구성.

6. 단계별 로드맵 (Phase 1–4)

6.1 Phase 1: Build (Wrangler & Dev)

목표	내용
배포 표준화	모든 RAG·AI 관련 코드를 Wrangler로 배포 가능한 모듈형 아키텍처로 전환. Workers·Workers AI·Vectorize·D1·R2를 Wrangler 설정으로 통합.
LLM 의존도 제거	@cloudflare/workers-ai 사용으로 외부 LLM API 의존 제거. Llama 3, Mistral 등 Cloudflare 제공 모델만 사용.
산출물	RAG Worker(또는 Worker 그룹), Vectorize 인덱스·namespace 정책, D1 스키마 확장, R2 버킷·Lifecycle 초안, 로컬/스테이징 Wrangler dev 환경. 구현 순서·산출물 상세: rag-ai-phase1-implementation-plan.md.

6.2 Phase 2: Security & Compliance

목표	내용
PDPA·SOC2	싱가포르 PDPA 및 SOC2 등 국제 보안 표준에 맞춘 데이터 파기·접근 제어·감사 로직 검증. R2 Lifecycle·D1 삭제·Vectorize namespace 정리 절차 문서화 및 구현 검증.
아키텍처 리뷰	Cloudflare 보안 팀과 아키텍처 리뷰 진행. 엣지 데이터 처리·키 관리·로그 보존 정책 합치.
산출물	Compliance 체크리스트, 파기·감사 runbook, (선택) Cloudflare 보안 리뷰 결과 요약.

6.3 Phase 3: Zuplo Integration (Gateway-as-a-Service)

목표	내용
고객 제어	고객이 자신의 Cloudflare 계정에서 Zuplo를 통해 Prego API를 제어할 수 있는 커넥터 개발.
제공 기능	테넌트(고객)별 라우팅·Rate Limit·API Key·Quota 설정을 Zuplo 대시보드 또는 Config as Code로 관리. Prego RAG AI 엔드포인트가 Zuplo 뒤에서 동작.
산출물	Zuplo 커넥터/템플릿, 고객 온보딩 가이드, Prego API 스펙(OAS) 정리.

6.4 Phase 4: Marketplace Launch

목표	내용
Cloudflare Apps	Cloudflare Apps에 “Prego HR AI” 런칭. 앱 형태로 설치·설정 가능.
Co-selling	Cloudflare Enterprise 고객 대상 Co-selling(공동 영업) 시작. 파트너 프로그램·영업 자료·기술 지원 프로세스 정리.
산출물	마켓플레이스 리스팅, 가격·패키지 정의, Co-selling 플레이북.

7. 기존 Prego 구조와의 연동

기존 요소	RAG AI 연동
Zuplo	기존 테넌트 Key Management·Rate Limit에 AI Usage Quota 정책 추가. 동일 Zuplo 인스턴스에서 `/ai/*` 라우팅·Quota 적용.
D1	tenants_master·tenant_runtime와 연동. RAG 전용 테이블(tenant_ai_settings, role_profiles, ai_audit_log_meta 등) 추가 시 마이그레이션으로 관리.
R2	기존 정적 에셋·Usage Raw 버킷과 분리된 RAG 전용 버킷(또는 prefix) 및 Lifecycle Rules.
client-web	RAG 질의 UI·Guardrail 경고 표시·Document Intelligence 결과 표시는 client-web 또는 별도 앱에서 Zuplo 경유 API 호출.
PDPA·NRIC	기존 온보딩·민감 정보 마스킹 정책과 통일. RAG 경로에서 Privacy Masking을 “AI 입력 직전” 단계로 고정.

8. 비기능 요건 요약

항목	목표
지연	RAG 응답(엣지~엣지) P95 50ms 미만 목표.
격리	테넌트별 Vectorize namespace·R2 prefix·D1 row 수준 격리.
규제	PDPA 보관 제한·자동 파기(R2 Lifecycle)·감사 로그(R2/Logpush).
확장	Wrangler 기반 모듈형 배포로 기능별 Worker·리소스 추가 용이.

9. 참조 문서

문서	용도
saas-unified-architecture-hetzner-cloudflare-zuplo-plan.md	통합 아키텍처·Zuplo·PDPA·온보딩 연동.
pregoi-infrastructure-r2-api-update-plan.md	R2·API Gateway·환경 변수.
provision-tenant-workflow-design.md	테넌트 프로비저닝·Zuplo 등록.
rag-ai-phase1-implementation-plan.md	RAG AI Phase 1 구현 순서·Data Ingestion·Query Worker·Zuplo 메터링 Policy·검증 체크리스트.

코드 생성 범위: 본 문서는 기획만 포함하며, 구현은 Phase별 별도 요청 시 진행.