DOM: DATA_INFRASTRUCTURE
Back to Systems Archive

ADR-001: Primary Database Architecture Selection

FieldValue
ADR IDADR-002
TitleSelection of PostgreSQL as Primary Operational Database
StatusAccepted
Date2026-05-08
AuthorsShreyas Agarwal
StakeholdersProduct Engineering, Data Engineering, BI Team, Platform Team
Decision TypeFoundational Architecture Decision
Supersedes
Superseded By

1. Context

The platform is being designed as a multi-tenant retail intelligence and catalog normalization system focused primarily on garment retail use cases.

The system is expected to support:

  • ingestion from heterogeneous client systems (Shopify, Tally ERP, CSV uploads, custom ERPs, etc.)
  • canonical product and variant normalization
  • historical sales and inventory analytics
  • time-series reporting
  • BI integrations
  • downstream forecasting and analytical workloads
  • extensible retail metadata and attributes
  • high-cardinality garment variant structures
  • MCP and agent integrations

The platform is not a traditional eCommerce storefront. It is primarily a:

Retail Data Platform + Analytics Layer

The database selection must therefore support both:

  • operational ingestion and normalization workloads
  • analytical querying and aggregation workloads

while remaining maintainable for a small-to-medium engineering team.


2. Problem Statement

The system requires a primary database platform capable of handling:

Operational Workloads (OLTP-like)

  • catalog ingestion
  • SKU normalization
  • inventory updates
  • client synchronization
  • transactional writes

Analytical Workloads (OLAP-like)

  • historical sales analytics
  • trend analysis
  • inventory aging
  • sell-through calculations
  • BI reporting
  • time-series aggregations

Additionally, the platform must support:

  • semi-structured retail attributes
  • evolving client schemas
  • flexible metadata
  • multi-tenant isolation
  • extensible canonical models

without requiring frequent schema rewrites.


3. Architectural Considerations


3.1 OLTP vs OLAP Analysis

The workload was evaluated across both transactional and analytical dimensions.

CapabilityOLTP CharacteristicsOLAP Characteristics
Catalog ingestionYesNo
Inventory synchronizationYesNo
Historical sales reportingNoYes
Power BI integrationsNoYes
Trend analysisNoYes
Variant normalizationYesPartial
Time-series analyticsNoYes

Conclusion:

The platform represents a hybrid transactional and analytical workload (HTAP-lite).

The system therefore requires strong transactional guarantees while also supporting analytical querying efficiently.


3.2 Relational vs Non-Relational Analysis

Relational Database Strengths

  • strong joins and relationships
  • transactional consistency
  • dimensional modeling support
  • SQL analytical ecosystem
  • BI interoperability
  • indexing and partitioning capabilities

Non-Relational Database Strengths

  • flexible schema evolution
  • semi-structured document ingestion
  • dynamic attribute support
  • rapid adaptation to client variability

Key Finding

The platform requires:

Relational analytical structure
+
Semi-structured attribute flexibility

rather than a purely relational or purely document-oriented model.


4. Options Evaluated


Option A — MongoDB

Advantages

  • JSON-native
  • flexible schemas
  • fast ingestion prototyping
  • suitable for dynamic retail attributes

Disadvantages

  • weaker analytical semantics
  • inefficient complex aggregations at scale
  • weaker BI interoperability
  • poor fit for star-schema analytical modeling
  • more difficult dimensional analytics

Assessment

MongoDB is well-suited for ingestion staging or metadata-heavy workloads but is not ideal as the canonical analytical backbone for the platform.


Option B — PostgreSQL

Advantages

  • strong transactional guarantees
  • mature SQL ecosystem
  • excellent analytical compatibility
  • support for joins and dimensional modeling
  • native JSONB support for semi-structured attributes
  • partitioning support
  • materialized views
  • strong BI interoperability
  • mature operational tooling
  • extensibility for future analytical scaling

Disadvantages

  • requires disciplined schema governance
  • more complex initial modeling than pure document databases

Assessment

PostgreSQL provides the best balance between:

  • transactional integrity
  • analytical capability
  • schema flexibility
  • operational maintainability

for the expected workload profile.


Option C — ClickHouse

Advantages

  • exceptional analytical performance
  • highly optimized columnar OLAP engine
  • excellent time-series aggregation capabilities

Disadvantages

  • weaker transactional guarantees
  • not suitable as primary operational datastore
  • additional operational complexity

Assessment

ClickHouse may become a future analytical acceleration layer but is not suitable as the initial primary platform database.


Option D — Snowflake / BigQuery

Advantages

  • enterprise-scale warehousing
  • highly scalable analytical processing

Disadvantages

  • excessive complexity for current scale
  • operational overhead
  • unsuitable as operational datastore
  • cost considerations

Assessment

Premature for current platform maturity and engineering requirements.


5. Decision

The platform will adopt:

PostgreSQL

as the primary database platform.

The implementation will use:

  • relational canonical models
  • JSONB extensibility for dynamic retail attributes
  • partitioned fact tables for time-series analytics
  • materialized aggregate views where appropriate
  • dimensional modeling principles for analytical workloads

6. Canonical Architectural Direction

The database architecture will follow a layered data model.


6.1 Raw Ingestion Layer

Stores source-native payloads for auditability and replay.

Examples:

shopify_raw_payloads
tally_raw_xml
csv_import_batches

Characteristics:

  • append-only
  • immutable
  • source-specific

6.2 Normalized Layer

Stores cleaned source-system representations.

Examples:

normalized_shopify_products
normalized_tally_inventory

Characteristics:

  • source-cleaned
  • partially standardized
  • ingestion-oriented

6.3 Canonical Business Layer

Stores unified retail business entities.

Examples:

products
variants
inventory_snapshots
sales_transactions
sales_daily
brands
categories
stores

Characteristics:

  • source-agnostic
  • analytical-friendly
  • canonical business semantics

7. Data Modeling Principles

The following modeling principles are adopted:


7.1 Variant-Centric Commerce Model

The system will model:

Product = conceptual retail item
Variant = sellable SKU

All inventory and sales analytics will operate at variant/SKU granularity.


7.2 Separation of Dimensions and Facts

The platform will separate:

Dimensions

Descriptive business entities.

Examples:

dim_products
dim_variants
dim_brands
dim_stores

Facts

Time-series measurable events.

Examples:

fact_sales_transactions
fact_sales_daily
fact_inventory_snapshots

7.3 Flexible Attribute Modeling

Dynamic retail attributes will use:

JSONB

for extensibility.

Examples:

  • fabric
  • fit
  • wash instructions
  • seasonal metadata
  • marketplace-specific attributes

7.4 Surrogate Internal IDs

The system will not use external platform IDs as primary business identifiers.

Instead:

internal UUIDs
+
source mapping tables

will be maintained.


8. Consequences


Positive Consequences

  • strong BI interoperability
  • scalable analytical querying
  • flexible catalog modeling
  • transactional reliability
  • future extensibility
  • lower operational complexity
  • strong ecosystem support
  • compatibility with AWS managed services

Negative Consequences

  • increased schema governance requirements
  • more complex modeling than pure document databases
  • potential future need for dedicated OLAP acceleration layers at scale

9. Future Considerations

The architecture intentionally allows future evolution toward:

  • ClickHouse for analytical acceleration
  • dedicated warehouse layers
  • event-driven streaming pipelines
  • search indexing layers
  • machine learning feature stores

without requiring replacement of the canonical PostgreSQL layer.


10. Final Rationale

PostgreSQL was selected because it provides the strongest alignment with the platform’s combined requirements for:

  • operational consistency
  • analytical capability
  • semi-structured extensibility
  • retail catalog modeling
  • time-series analytics
  • BI compatibility
  • long-term maintainability

within the expected engineering and operational constraints of the project.