
English {#english}

Zuplo AI Metering Policy (RAG Quota + Usage → D1 / Stripe)

Purpose: Check quota before RAG query (Inbound); record token usage after call (Outbound).
References: rag-ai-edge-architecture-plan.md §3, rag-ai-phase1-implementation-plan.md §2.5.


Flow

| Step | Owner | Action |
|---|---|---|
| Inbound | Zuplo Custom Policy | API Key → tenant_id (Consumer metadata). Call Control Plane GET /internal/ai-quota?tenant_id=.... If remaining_daily <= 0, return 429. |
| Backend | RAG Query Worker | Include response header x-ai-usage-tokens. |
| Outbound | Zuplo Custom Policy | After next(), read x-ai-usage-tokens from the response. Asynchronously (waitUntil etc.) call Control Plane POST /internal/ai-usage → D1 upsert + Stripe usage_records (idempotency_key). |

Control Plane endpoints (already implemented)

  • GET /internal/ai-quota?tenant_id=
    Auth: Authorization: Bearer <INTERNAL_API_KEY>. Response: { remaining_daily, remaining_monthly }. Zuplo Inbound returns 429 when remaining_daily <= 0.
  • POST /internal/ai-usage
    Auth: same. Body: { "tenant_id": "...", "tokens_used": 123, "idempotency_key": "optional" }. Updates D1 tenant_ai_usage_daily and Stripe subscription_items/.../usage_records when tenant has stripe_subscription_item_id.
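
The two endpoint contracts above can be written down as types plus small request builders. This is a minimal sketch, not the Control Plane's own code: `HttpCall`, `trimBase`, `buildQuotaCall`, `buildUsageCall`, and `parseQuota` are hypothetical names introduced here for illustration, and only the paths, header, and field names come from the descriptions above.

```typescript
// Response shape of GET /internal/ai-quota, as described above.
interface QuotaResponse {
  remaining_daily: number;
  remaining_monthly: number;
}

// Request body of POST /internal/ai-usage, as described above.
interface UsageBody {
  tenant_id: string;
  tokens_used: number;
  idempotency_key?: string;
}

// Hypothetical, framework-neutral description of one HTTP call.
interface HttpCall {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Drop trailing slashes so joining with "/internal/..." never yields "//".
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

function buildQuotaCall(baseUrl: string, apiKey: string, tenantId: string): HttpCall {
  return {
    url: `${trimBase(baseUrl)}/internal/ai-quota?tenant_id=${encodeURIComponent(tenantId)}`,
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

function buildUsageCall(baseUrl: string, apiKey: string, usage: UsageBody): HttpCall {
  return {
    url: `${trimBase(baseUrl)}/internal/ai-usage`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(usage),
  };
}

// Validate the quota payload instead of trusting its shape blindly.
function parseQuota(json: string): QuotaResponse {
  const data = JSON.parse(json) as Partial<QuotaResponse>;
  if (typeof data.remaining_daily !== "number" || typeof data.remaining_monthly !== "number") {
    throw new Error("Unexpected quota response shape");
  }
  return { remaining_daily: data.remaining_daily, remaining_monthly: data.remaining_monthly };
}
```

Keeping the builders free of any gateway types means they can be unit-tested without a Zuplo runtime; the policy would pass `url`, `method`, `headers`, and `body` to `fetch`.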

Zuplo policy example (pseudocode)

Implement the logic below as a Custom Policy in the Zuplo Dashboard; adjust the syntax to the Zuplo Policy API and your environment variables.

Inbound — Quota check: Resolve tenant_id from request.user?.data?.tenantId or Consumer metadata; 401 if missing. Fetch Control Plane /internal/ai-quota?tenant_id=...; if remaining_daily <= 0 return 429 JSON; else return next().
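
The inbound decision itself can be separated from the gateway plumbing. The sketch below is one possible shape, assuming the quota response has already been fetched and parsed; `Rejection`, `rejectIfNoTenant`, and `rejectIfOverQuota` are hypothetical names, and the plain-text 401 body is an assumption. As in the description above, only remaining_daily is checked.

```typescript
interface QuotaResponse {
  remaining_daily: number;
  remaining_monthly: number;
}

// Hypothetical description of a short-circuit response; the policy would
// turn this into a real Response object.
interface Rejection {
  status: 401 | 429;
  body: string;
  contentType: string;
}

// Step 1: without a tenant there is nothing to meter, so reject with 401.
function rejectIfNoTenant(tenantId: string | undefined | null): Rejection | null {
  if (tenantId) return null;
  return { status: 401, body: "Missing tenant", contentType: "text/plain" };
}

// Step 2: reject with a 429 JSON body when the daily quota is used up.
function rejectIfOverQuota(quota: QuotaResponse): Rejection | null {
  if (quota.remaining_daily > 0) return null;
  return {
    status: 429,
    body: JSON.stringify({ error: "AI quota exceeded" }),
    contentType: "application/json",
  };
}
```

A policy would call `rejectIfNoTenant` first, fetch the quota only if that returns `null`, then call `rejectIfOverQuota`; a `null` result from both means the request continues to the backend.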

Outbound — Usage recording: response = await next(). Read x-ai-usage-tokens from response; if tokensUsed > 0 and tenantId, call POST /internal/ai-usage asynchronously (waitUntil) with tenant_id, tokens_used, idempotency_key. Return response.
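
The header-reading step is where malformed input is most likely, so it helps to isolate it. The following is a sketch under assumptions: `parseUsageTokens` and `buildUsageBody` are hypothetical helpers, and the parser is deliberately stricter than a bare `parseInt` (it treats values such as "12abc" as 0), which is a design choice made here rather than something the plan specifies.

```typescript
// Request body of POST /internal/ai-usage, as described in this document.
interface UsageBody {
  tenant_id: string;
  tokens_used: number;
  idempotency_key?: string;
}

// Parse the x-ai-usage-tokens header. Anything that is not a plain
// non-negative integer (missing header, "abc", "12abc", "-5") counts as 0.
function parseUsageTokens(header: string | null): number {
  if (header === null) return 0;
  const trimmed = header.trim();
  if (!/^\d+$/.test(trimmed)) return 0;
  return Number.parseInt(trimmed, 10);
}

// Build the usage payload, or return null when there is nothing to record
// (no tenant, or zero tokens).
function buildUsageBody(
  tenantId: string | undefined,
  header: string | null,
  idempotencyKey?: string,
): UsageBody | null {
  const tokensUsed = parseUsageTokens(header);
  if (!tenantId || tokensUsed <= 0) return null;
  const body: UsageBody = { tenant_id: tenantId, tokens_used: tokensUsed };
  if (idempotencyKey !== undefined) body.idempotency_key = idempotencyKey;
  return body;
}
```

When `buildUsageBody` returns `null`, the policy skips the POST entirely and simply returns the response.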


Environment variables (Zuplo)

| Name | Description |
|---|---|
| CONTROL_PLANE_URL | Control Plane Worker URL (e.g. https://prego-control-plane.workers.dev) |
| INTERNAL_API_KEY | Bearer token for Control Plane /internal/* (same as Control Plane INTERNAL_API_KEY) |

Consumer (API Key) metadata must include tenantId so that the Inbound and Outbound policies can identify the tenant. Zuplo Sync already registers Consumers with metadata: { tenantId }.
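
Both prerequisites (the two variables and the tenantId in Consumer metadata) can be checked up front so that misconfiguration fails loudly. This is a sketch only: `readConfig`, `resolveTenantId`, `PolicyConfig`, and `ConsumerUser` are hypothetical names, and how the policy actually obtains its environment and user object depends on the Zuplo runtime.

```typescript
interface PolicyConfig {
  controlPlaneUrl: string;
  internalApiKey: string;
}

// Read the two variables from a plain key/value map (assumed to be supplied
// by the gateway environment) and fail fast if either is missing.
function readConfig(env: Record<string, string | undefined>): PolicyConfig {
  const controlPlaneUrl = env["CONTROL_PLANE_URL"];
  const internalApiKey = env["INTERNAL_API_KEY"];
  if (!controlPlaneUrl) throw new Error("CONTROL_PLANE_URL is not set");
  if (!internalApiKey) throw new Error("INTERNAL_API_KEY is not set");
  return { controlPlaneUrl, internalApiKey };
}

// Assumed shape of the authenticated user attached to the request.
interface ConsumerUser {
  data?: { tenantId?: unknown };
  metadata?: { tenantId?: unknown };
}

// Prefer user.data.tenantId, fall back to user.metadata.tenantId, and
// accept only non-empty strings.
function resolveTenantId(user: ConsumerUser | undefined): string | undefined {
  const value = user?.data?.tenantId ?? user?.metadata?.tenantId;
  return typeof value === "string" && value.length > 0 ? value : undefined;
}
```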


Korean {#korean}

Zuplo AI Metering Policy (RAG Quota + Usage → D1 / Stripe)

Purpose: Check the quota before a RAG Query call (Inbound); record token usage after the call (Outbound).
References: rag-ai-edge-architecture-plan.md §3, rag-ai-phase1-implementation-plan.md §2.5.


Flow

| Step | Owner | Action |
|---|---|---|
| Inbound | Zuplo Custom Policy | API Key → tenant_id (Consumer metadata). Call Control Plane GET /internal/ai-quota?tenant_id=.... If remaining_daily <= 0, return 429. |
| Backend | RAG Query Worker | Include the response header x-ai-usage-tokens. |
| Outbound | Zuplo Custom Policy | After next(), extract x-ai-usage-tokens from the response. Asynchronously (waitUntil etc.) call Control Plane POST /internal/ai-usage → D1 accumulation + Stripe usage_records (idempotency_key). |

Control Plane endpoints (already implemented)

  • GET /internal/ai-quota?tenant_id=
    • Auth: Authorization: Bearer <INTERNAL_API_KEY>
    • Response: { remaining_daily, remaining_monthly }
    • Zuplo Inbound returns 429 when remaining_daily <= 0.
  • POST /internal/ai-usage
    • Auth: Authorization: Bearer <INTERNAL_API_KEY>
    • Body: { "tenant_id": "...", "tokens_used": 123, "idempotency_key": "optional" }
    • Behavior: upserts D1 tenant_ai_usage_daily and writes Stripe subscription_items/.../usage_records (when the tenant has a stripe_subscription_item_id).

Zuplo policy example (pseudocode)

Implement the logic below as a Custom Policy in the Zuplo Dashboard.
(Adjust the actual syntax to the Zuplo Policy API and environment variables.)

Inbound: Quota check

// Pseudocode: request, next, CONTROL_PLANE_URL and INTERNAL_API_KEY come from the policy context.
// 1. Resolve tenant_id from request.user?.data?.tenantId or Consumer metadata.
const tenantId = request.user?.data?.tenantId ?? request.user?.metadata?.tenantId;
if (!tenantId) return new Response("Missing tenant", { status: 401 });
// 2. Query the Control Plane for the tenant's remaining quota.
const res = await fetch(
  `${CONTROL_PLANE_URL}/internal/ai-quota?tenant_id=${encodeURIComponent(tenantId)}`,
  { headers: { Authorization: `Bearer ${INTERNAL_API_KEY}` } }
);
const { remaining_daily } = await res.json();
// 3. Return 429 when the daily quota is exhausted.
if (remaining_daily <= 0) {
  return new Response(JSON.stringify({ error: "AI quota exceeded" }), {
    status: 429,
    headers: { "Content-Type": "application/json" },
  });
}
return next();

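Outbound: Usage recording

// Pseudocode: tenantId is resolved the same way as in the Inbound policy;
// requestId is assumed to be the gateway's per-request identifier.
const tenantId = request.user?.data?.tenantId ?? request.user?.metadata?.tenantId;
const response = await next();
// Extract x-ai-usage-tokens (only possible when the backend is the RAG Query Worker).
const tokensHeader = response.headers.get("x-ai-usage-tokens");
const tokensUsed = tokensHeader ? parseInt(tokensHeader, 10) : 0;
// A non-numeric header parses to NaN, and NaN > 0 is false, so nothing is recorded.
if (tokensUsed > 0 && tenantId) {
  // Record asynchronously (waitUntil or the Zuplo equivalent) so the response is not delayed.
  waitUntil(
    fetch(`${CONTROL_PLANE_URL}/internal/ai-usage`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${INTERNAL_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        tenant_id: tenantId,
        tokens_used: tokensUsed,
        idempotency_key: `${tenantId}_${requestId}_${Date.now()}`,
      }),
    })
  );
}
return response;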
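
The pseudocode above appends Date.now() to the idempotency key, so a retry that rebuilds the key gets a different value and is not deduplicated. One alternative, sketched here under the assumption that requestId is unique per request, is to derive the key from tenantId and requestId alone. `usageIdempotencyKey` and `UsageLedger` are hypothetical; the ledger is an in-memory stand-in that only illustrates how a receiver might use the key, not how the Control Plane or Stripe actually handles it.

```typescript
// Deterministic key: the same tenant and request always map to the same key.
function usageIdempotencyKey(tenantId: string, requestId: string): string {
  return `${tenantId}_${requestId}`;
}

// Hypothetical in-memory receiver that ignores repeated idempotency keys.
class UsageLedger {
  private readonly seenKeys = new Set<string>();
  private readonly totals = new Map<string, number>();

  // Returns true if the usage was recorded, false if the key was a duplicate.
  record(tenantId: string, tokensUsed: number, idempotencyKey: string): boolean {
    if (this.seenKeys.has(idempotencyKey)) return false;
    this.seenKeys.add(idempotencyKey);
    this.totals.set(tenantId, this.total(tenantId) + tokensUsed);
    return true;
  }

  total(tenantId: string): number {
    return this.totals.get(tenantId) ?? 0;
  }
}
```

With a deterministic key, sending the same usage record twice leaves the tenant's total unchanged after the first call.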

Environment variables (Zuplo)

| Name | Description |
|---|---|
| CONTROL_PLANE_URL | Control Plane Worker URL (e.g. https://prego-control-plane.workers.dev) |
| INTERNAL_API_KEY | Bearer token for Control Plane /internal/* (same as the Control Plane's INTERNAL_API_KEY) |

The Consumer (API Key) metadata must contain tenantId so that the Inbound and Outbound policies can identify the tenant. Zuplo Sync already registers Consumers with metadata: { tenantId }.
