Run Your First Ingestion
DataForge runs as an ephemeral, per-job container in your own cloud project. Your data never leaves your environment: the engine reads your source and writes to your destination entirely inside your project, then shuts down. The steps below cover Google Cloud.
Subscribe on the Marketplace
Subscribe to DataForge on Google Cloud Marketplace and complete sign-up. This links your account and activates your entitlement; it deploys nothing to your project.
Create a runtime service account
The engine runs as a dedicated service account in your project, granted only the roles a run needs: Cloud SQL client, Storage object viewer, and Secret Manager secret accessor. It gets no marketplace, billing, or procurement permissions.
PROJECT_ID=YOUR_PROJECT_ID; REGION=us-east1
gcloud iam service-accounts create dataforge-runtime \
--display-name='DataForge Runtime' --project=${PROJECT_ID}
SA=dataforge-runtime@${PROJECT_ID}.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SA} --role=roles/cloudsql.client
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SA} --role=roles/storage.objectViewer
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SA} --role=roles/secretmanager.secretAccessor
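To confirm the bindings above took effect, one option is to list the project-level roles held by the runtime service account. This is a minimal sketch: the helper name list_sa_roles is hypothetical, and it relies only on the standard gcloud projects get-iam-policy command with its --flatten, --filter, and --format flags.

```shell
# Print every project-level role bound to a service account, one per line.
# Usage: list_sa_roles PROJECT_ID SA_EMAIL
# (list_sa_roles is an illustrative helper, not part of DataForge.)
list_sa_roles() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$2" \
    --format="value(bindings.role)"
}
```

Calling list_sa_roles "${PROJECT_ID}" "${SA}" after the three grants should print roles/cloudsql.client, roles/storage.objectViewer, and roles/secretmanager.secretAccessor, along with any other roles the account already held.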
Store your telemetry key
Each run authenticates with your telemetry key, which you mint once from your signed-in DataForge dashboard. The key identifies your subscription for billing and authorizes nothing else. With the key in the TELEMETRY_KEY shell variable, save it in Secret Manager:
printf '%s' "${TELEMETRY_KEY}" | gcloud secrets create dataforge-telemetry-key \
--data-file=- --project=${PROJECT_ID}
gcloud secrets add-iam-policy-binding dataforge-telemetry-key \
--member=serviceAccount:${SA} --role=roles/secretmanager.secretAccessor --project=${PROJECT_ID}
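The command above pipes the key through printf '%s' rather than echo, which keeps a trailing newline out of the stored secret. The sketch below shows the difference on a made-up value and adds a hypothetical helper, stored_key_bytes, that reports the byte length of the latest stored version so you can compare it with the length of your key. EXAMPLE_KEY and the helper name are illustrative only.

```shell
# Hypothetical placeholder value, used only to show the byte counts.
EXAMPLE_KEY='abc123'

# printf '%s' emits the key alone; echo appends a newline that would
# become part of the stored secret.
printf '%s' "$EXAMPLE_KEY" | wc -c   # 6 bytes
echo "$EXAMPLE_KEY" | wc -c          # 7 bytes

# Byte length of the latest stored version of the secret.
# Usage: stored_key_bytes PROJECT_ID
# (stored_key_bytes is an illustrative helper, not part of DataForge.)
stored_key_bytes() {
  gcloud secrets versions access latest \
    --secret=dataforge-telemetry-key --project="$1" | wc -c | tr -d ' '
}
```

For an ASCII key, the output of stored_key_bytes "${PROJECT_ID}" should equal ${#TELEMETRY_KEY}. A result one byte larger suggests a newline was stored with the key.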
Define the engine job (once)
Create the Cloud Run Job that runs the engine. The job pulls the public, entitlement-gated image and is configured with the control-plane endpoint and your key. Because it is a job rather than a service, nothing stays running between executions.
gcloud run jobs create dataforge-worker \
--image=us-east1-docker.pkg.dev/dataforge-prod/dataforge-engine/worker:latest \
--region=${REGION} --service-account=${SA} --max-retries=0 --task-timeout=24h \
--set-env-vars=DATAFORGE_TELEMETRY_ENDPOINT=https://app.hyperiondataforge.net/api/v1/telemetry \
--set-secrets=DATAFORGE_TELEMETRY_KEY=dataforge-telemetry-key:latest \
--set-cloudsql-instances=${PROJECT_ID}:${REGION}:YOUR_SQL_INSTANCE \
--project=${PROJECT_ID}
Run your first ingestion
Start an ingestion by executing the worker job and passing the job spec as per-execution environment overrides. The overrides apply to that execution only, so concurrent runs are independent.
gcloud run jobs execute dataforge-worker --region=${REGION} --project=${PROJECT_ID} --wait \
--update-env-vars="^@@^DATAFORGE_SOURCE=gs://YOUR_BUCKET/your-data.csv@@\
DATAFORGE_CONN=postgres://USER:PASS@/DBNAME?host=/cloudsql/${PROJECT_ID}:${REGION}:YOUR_SQL_INSTANCE@@\
DATAFORGE_SCHEMA=public@@DATAFORGE_TABLE=your_table"
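The ^@@^ prefix in the command above is gcloud's alternate-delimiter syntax (see gcloud topic escaping): it tells gcloud to split the NAME=VALUE pairs on @@ instead of a comma. If you script your runs, a small helper can assemble that argument and reject values that would collide with the delimiter. This is a sketch only; join_env_vars is a hypothetical name and the values shown are the placeholders from this guide.

```shell
# Join NAME=VALUE pairs into gcloud's alternate-delimiter form:
#   ^DELIM^pair1DELIMpair2...
# Usage: join_env_vars DELIM NAME=VALUE [NAME=VALUE ...]
# (join_env_vars is an illustrative helper, not part of DataForge or gcloud.)
join_env_vars() {
  local delim="$1" out sep pair
  shift
  out="^${delim}^"
  sep=''
  for pair in "$@"; do
    # A pair that contains the delimiter would be split in the wrong place.
    case $pair in
      *"$delim"*) echo "value contains delimiter: $pair" >&2; return 1 ;;
    esac
    out="${out}${sep}${pair}"
    sep="$delim"
  done
  printf '%s\n' "$out"
}

join_env_vars '@@' \
  'DATAFORGE_SOURCE=gs://YOUR_BUCKET/your-data.csv' \
  'DATAFORGE_SCHEMA=public' \
  'DATAFORGE_TABLE=your_table'
# prints ^@@^DATAFORGE_SOURCE=gs://YOUR_BUCKET/your-data.csv@@DATAFORGE_SCHEMA=public@@DATAFORGE_TABLE=your_table
```

The result can be passed as --update-env-vars="$(join_env_vars '@@' ...)". The delimiter check matters most for DATAFORGE_CONN, where a password ending in @ would otherwise produce an accidental @@.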
The engine validates your subscription, ingests the source into your destination (creating the table if needed), reports the run meter, and exits. A clean run logs "entitlement check passed", "telemetry reported", and "job complete". That completes your first ingestion, and your data never left your project.
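One way to check a run automatically is to scan its log output for the three messages named above. The sketch below assumes the engine writes those phrases as plain text log lines; the exact casing and log format are not stated here, so the match is case-insensitive, and is_clean_run is a hypothetical helper name.

```shell
# Succeeds only if the log text on stdin contains all three success markers.
# (is_clean_run is an illustrative helper, not part of DataForge.)
is_clean_run() {
  local logs marker
  logs=$(cat)
  for marker in 'entitlement check passed' 'telemetry reported' 'job complete'; do
    printf '%s\n' "$logs" | grep -qiF -- "$marker" || {
      echo "missing log line: $marker" >&2
      return 1
    }
  done
}

# Possible usage, assuming plain-text payloads in Cloud Logging:
# gcloud logging read \
#   'resource.type="cloud_run_job" AND resource.labels.job_name="dataforge-worker"' \
#   --project="${PROJECT_ID}" --freshness=1h --format='value(textPayload)' | is_clean_run
```

If the engine emits structured (JSON) logs instead, the --format expression would need to point at the relevant jsonPayload field rather than textPayload.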