ADR-001: Primary Database Architecture Selection

Field	Value
ADR ID	ADR-002
Title	Selection of PostgreSQL as Primary Operational Database
Status	Accepted
Date	2026-05-08
Authors	Shreyas Agarwal
Stakeholders	Product Engineering, Data Engineering, BI Team, Platform Team
Decision Type	Foundational Architecture Decision
Supersedes
Superseded By

1. Context

The platform is being designed as a multi-tenant retail intelligence and catalog normalization system focused primarily on garment retail use cases.

The system is expected to support:

ingestion from heterogeneous client systems (Shopify, Tally ERP, CSV uploads, custom ERPs, etc.)
canonical product and variant normalization
historical sales and inventory analytics
time-series reporting
BI integrations
downstream forecasting and analytical workloads
extensible retail metadata and attributes
high-cardinality garment variant structures
MCP and agent integrations

The platform is not a traditional eCommerce storefront. It is primarily a:

Retail Data Platform + Analytics Layer

The database selection must therefore support both:

operational ingestion and normalization workloads
analytical querying and aggregation workloads

while remaining maintainable for a small-to-medium engineering team.

2. Problem Statement

The system requires a primary database platform capable of handling:

Operational Workloads (OLTP-like)

catalog ingestion
SKU normalization
inventory updates
client synchronization
transactional writes

Analytical Workloads (OLAP-like)

historical sales analytics
trend analysis
inventory aging
sell-through calculations
BI reporting
time-series aggregations

Additionally, the platform must support:

semi-structured retail attributes
evolving client schemas
flexible metadata
multi-tenant isolation
extensible canonical models

without requiring frequent schema rewrites.

3. Architectural Considerations

3.1 OLTP vs OLAP Analysis

The workload was evaluated across both transactional and analytical dimensions.

Capability	OLTP Characteristics	OLAP Characteristics
Catalog ingestion	Yes	No
Inventory synchronization	Yes	No
Historical sales reporting	No	Yes
Power BI integrations	No	Yes
Trend analysis	No	Yes
Variant normalization	Yes	Partial
Time-series analytics	No	Yes

Conclusion:

The platform represents a hybrid transactional and analytical workload (HTAP-lite).

The system therefore requires strong transactional guarantees while also supporting analytical querying efficiently.

3.2 Relational vs Non-Relational Analysis

Relational Database Strengths

strong joins and relationships
transactional consistency
dimensional modeling support
SQL analytical ecosystem
BI interoperability
indexing and partitioning capabilities

Non-Relational Database Strengths

flexible schema evolution
semi-structured document ingestion
dynamic attribute support
rapid adaptation to client variability

Key Finding

The platform requires:

Relational analytical structure
+
Semi-structured attribute flexibility

rather than a purely relational or purely document-oriented model.

4. Options Evaluated

Option A — MongoDB

Advantages

JSON-native
flexible schemas
fast ingestion prototyping
suitable for dynamic retail attributes

Disadvantages

weaker analytical semantics
inefficient complex aggregations at scale
weaker BI interoperability
poor fit for star-schema analytical modeling
more difficult dimensional analytics

Assessment

MongoDB is well-suited for ingestion staging or metadata-heavy workloads but is not ideal as the canonical analytical backbone for the platform.

Option B — PostgreSQL

Advantages

strong transactional guarantees
mature SQL ecosystem
excellent analytical compatibility
support for joins and dimensional modeling
native JSONB support for semi-structured attributes
partitioning support
materialized views
strong BI interoperability
mature operational tooling
extensibility for future analytical scaling

Disadvantages

requires disciplined schema governance
more complex initial modeling than pure document databases

Assessment

PostgreSQL provides the best balance between:

transactional integrity
analytical capability
schema flexibility
operational maintainability

for the expected workload profile.

Option C — ClickHouse

Advantages

exceptional analytical performance
highly optimized columnar OLAP engine
excellent time-series aggregation capabilities

Disadvantages

weaker transactional guarantees
not suitable as primary operational datastore
additional operational complexity

Assessment

ClickHouse may become a future analytical acceleration layer but is not suitable as the initial primary platform database.

Option D — Snowflake / BigQuery

Advantages

enterprise-scale warehousing
highly scalable analytical processing

Disadvantages

excessive complexity for current scale
operational overhead
unsuitable as operational datastore
cost considerations

Assessment

Premature for current platform maturity and engineering requirements.

5. Decision

The platform will adopt:

PostgreSQL

as the primary database platform.

The implementation will use:

relational canonical models
JSONB extensibility for dynamic retail attributes
partitioned fact tables for time-series analytics
materialized aggregate views where appropriate
dimensional modeling principles for analytical workloads

6. Canonical Architectural Direction

The database architecture will follow a layered data model.

6.1 Raw Ingestion Layer

Stores source-native payloads for auditability and replay.

Examples:

shopify_raw_payloads
tally_raw_xml
csv_import_batches

Characteristics:

append-only
immutable
source-specific

6.2 Normalized Layer

Stores cleaned source-system representations.

Examples:

normalized_shopify_products
normalized_tally_inventory

Characteristics:

source-cleaned
partially standardized
ingestion-oriented

6.3 Canonical Business Layer

Stores unified retail business entities.

Examples:

products
variants
inventory_snapshots
sales_transactions
sales_daily
brands
categories
stores

Characteristics:

source-agnostic
analytical-friendly
canonical business semantics

7. Data Modeling Principles

The following modeling principles are adopted:

7.1 Variant-Centric Commerce Model

The system will model:

Product = conceptual retail item
Variant = sellable SKU

All inventory and sales analytics will operate at variant/SKU granularity.

7.2 Separation of Dimensions and Facts

The platform will separate:

Dimensions

Descriptive business entities.

Examples:

dim_products
dim_variants
dim_brands
dim_stores

Facts

Time-series measurable events.

Examples:

fact_sales_transactions
fact_sales_daily
fact_inventory_snapshots

7.3 Flexible Attribute Modeling

Dynamic retail attributes will use:

JSONB

for extensibility.

Examples:

fabric
fit
wash instructions
seasonal metadata
marketplace-specific attributes

7.4 Surrogate Internal IDs

The system will not use external platform IDs as primary business identifiers.

Instead:

internal UUIDs
+
source mapping tables

will be maintained.

8. Consequences

Positive Consequences

strong BI interoperability
scalable analytical querying
flexible catalog modeling
transactional reliability
future extensibility
lower operational complexity
strong ecosystem support
compatibility with AWS managed services

Negative Consequences

increased schema governance requirements
more complex modeling than pure document databases
potential future need for dedicated OLAP acceleration layers at scale

9. Future Considerations

The architecture intentionally allows future evolution toward:

ClickHouse for analytical acceleration
dedicated warehouse layers
event-driven streaming pipelines
search indexing layers
machine learning feature stores

without requiring replacement of the canonical PostgreSQL layer.

10. Final Rationale

PostgreSQL was selected because it provides the strongest alignment with the platform’s combined requirements for:

operational consistency
analytical capability
semi-structured extensibility
retail catalog modeling
time-series analytics
BI compatibility
long-term maintainability

within the expected engineering and operational constraints of the project.