
Synthesize IO

Production-Ready Synthetic Data, Generated in Seconds, at Any Scale


Monthly Active Users: 500+

Generation Speed: 1M+ rows/min

Export Formats: CSV, JSON, SQL, Parquet, Excel

About the Project

Synthesize IO is an AI-powered synthetic data platform built as a TypeScript monorepo with two Next.js portals, a user-facing data studio and an admin portal, both built with Shadcn UI components. A FastAPI (Python) backend orchestrates the open-source Synthcity library and Faker to generate statistically realistic datasets. Celery workers run generation jobs asynchronously with Redis as the task broker, and the whole stack runs in Docker containers on a Hostinger VPS behind Nginx. DodoPayments handles pay-as-you-go billing, and Nodemailer sends job-completion and export notifications. The platform serves 500+ monthly active users, generates 1M+ rows per minute, and exports to CSV, JSON, SQL, Parquet, and Excel for workflows subject to GDPR, HIPAA, CCPA, and SOC 2 requirements.
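The async job pattern described above, a broker queue feeding a pool of workers, can be sketched with only the Python standard library. This is not Celery or Redis: as an assumption for illustration, `queue.Queue` stands in for the Redis broker and plain threads stand in for Celery workers, and `GenerationJob`, `worker`, and `run_jobs` are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GenerationJob:
    job_id: str  # hypothetical job identifier
    rows: int    # number of rows to generate


def worker(jobs: queue.Queue, results: Dict[str, int], lock: threading.Lock) -> None:
    """Consume jobs until a None sentinel arrives (stand-in for a Celery worker)."""
    while True:
        job: Optional[GenerationJob] = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            return
        # Placeholder for Synthcity/Faker generation: build trivial rows.
        rows = [{"id": i} for i in range(job.rows)]
        with lock:
            results[job.job_id] = len(rows)
        jobs.task_done()


def run_jobs(job_list: List[GenerationJob], n_workers: int = 4) -> Dict[str, int]:
    jobs: queue.Queue = queue.Queue()  # stand-in for the Redis broker
    results: Dict[str, int] = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)  # "dispatch": whichever worker is free picks the job up
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return results


# Maps each job id to the number of rows its worker produced.
summary = run_jobs([GenerationJob("a", 10), GenerationJob("b", 5)], n_workers=2)
```

The web request only enqueues work and returns, so a slow generation job never blocks the API; the real system gets the same decoupling from Celery and Redis, plus persistence and retries that this sketch does not attempt.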

How It Works

  1. The two-portal Next.js monorepo (user studio + admin portal) shares TypeScript types and a FastAPI client wrapper. Docker Compose orchestrates the Next.js containers, FastAPI backend, Celery worker fleet, Redis broker, and PostgreSQL database on a Hostinger VPS, with Nginx as the reverse proxy.

  2. Users describe their dataset in plain English through the Shadcn UI studio. The FastAPI backend passes this natural-language schema prompt to an LLM, which infers column semantics, data types, distributions, and relational constraints; the resolved schema is then stored in PostgreSQL.

  3. Generation jobs are dispatched to the Celery worker fleet through Redis. Each worker runs Synthcity's synthesis models for structured relational data and Faker for domain-specific values (names, addresses, IBANs, phone numbers), preserving referential integrity across foreign-key relationships.

  4. Completed datasets are written to PostgreSQL-backed storage and made available for streaming export in CSV, JSON, SQL, Parquet, or Excel format. Nodemailer then sends a job-completion email containing a signed download link that is valid for 24 hours.

  5. DodoPayments handles pay-as-you-go billing by row-volume tier, with webhook-driven credit top-ups recorded in PostgreSQL. Google Analytics tracks funnel drop-off from schema definition to first export, and Google Search Console monitors the platform's search visibility for queries such as "synthetic data generation" and "GDPR-compliant test data".
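The referential-integrity guarantee in step 3 comes down to one rule: a child row may only reference a primary key that already exists in its parent table. The sketch below illustrates that rule under stated assumptions: the `customers`/`orders` schema and all function names are hypothetical, and Python's `random` module stands in for Synthcity and Faker.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def generate_customers(n: int, rng: random.Random) -> List[Row]:
    # Parent table: every row gets a unique, sequential primary key.
    return [{"customer_id": i + 1, "tier": rng.choice(["free", "pro"])} for i in range(n)]


def generate_orders(n: int, customers: List[Row], rng: random.Random) -> List[Row]:
    # Child table: foreign keys are sampled only from keys present in the parent.
    parent_keys = [c["customer_id"] for c in customers]
    if n > 0 and not parent_keys:
        raise ValueError("cannot generate child rows without parent rows")
    return [
        {
            "order_id": i + 1,
            "customer_id": rng.choice(parent_keys),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(n)
    ]


def check_fk(children: List[Row], parents: List[Row], fk: str, pk: str) -> bool:
    # True when every child's foreign key matches some parent's primary key.
    keys = {p[pk] for p in parents}
    return all(c[fk] in keys for c in children)


rng = random.Random(42)  # fixed seed so the dataset is reproducible
customers = generate_customers(100, rng)
orders = generate_orders(1_000, customers, rng)
```

Generating parents before children, and sampling foreign keys from the parents' keys, means integrity holds by construction rather than needing a repair pass afterwards.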
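The 24-hour download link in step 4 can be built as an HMAC-signed URL that carries its own expiry timestamp. This is one common approach, not necessarily the one Synthesize IO uses; the query parameter names (`export`, `expires`, `sig`), the helper functions, and the example base URL are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

TTL_SECONDS = 24 * 60 * 60  # 24-hour validity window


def _signature(export_id: str, expires: int, secret: bytes) -> str:
    # Sign the export id together with its expiry so neither can be altered.
    return hmac.new(secret, f"{export_id}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_download_url(base_url: str, export_id: str, secret: bytes,
                      now: Optional[float] = None) -> str:
    expires = int(now if now is not None else time.time()) + TTL_SECONDS
    query = urlencode({"export": export_id, "expires": expires,
                       "sig": _signature(export_id, expires, secret)})
    return f"{base_url}?{query}"


def verify_download_url(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    params = parse_qs(urlparse(url).query)
    try:
        export_id = params["export"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # malformed or missing parameters
    if int(now if now is not None else time.time()) > expires:
        return False  # link has expired
    expected = _signature(export_id, expires, secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig.encode(), expected.encode())


url = sign_download_url("https://example.com/exports/download", "exp123", b"server-secret")
```

Because the expiry is part of the signed payload, editing `expires` in the link invalidates the signature, so the server needs no database lookup to reject a stale or tampered URL.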

Tech Stack

Next.js, TypeScript, FastAPI, Synthcity, Faker, PostgreSQL, Redis, Celery, Docker, DodoPayments, Shadcn UI, Hostinger VPS, Husky, Prettier, Jest, Playwright
