Published on

Transaction Processing or Analytics?

Authors

Introduction to Transactions

In the early days of business data processing, writing to a database typically corresponded to a commercial transaction: selling a product ๐Ÿ›๏ธ, ordering from a supplier ๐Ÿ“ฆ, paying employee salaries ๐Ÿ’ฐ, etc. Over time, the term "transaction" became established even for operations not related to money and came to mean a group of read and write operations forming a logical unit.

  • ๐Ÿ’ฅ Atomicity โ€” the minimum mandatory property of any transaction
  • ๐Ÿ›ก๏ธ Full ACID โ€” part of reliability for critical operations
  • โš–๏ธ In practice โ€” different systems choose a balance between reliability (ACID) and performance (weakened guarantees)

๐Ÿ”„ OLTP vs OLAP: Two Different Worlds

Online Transaction Processing (OLTP) โ€” a type of database system used in transaction-oriented applications (banks, accounting systems (ERP, CRM), booking systems, payment systems, etc.), such as many operational (core) business systems.

Two contexts where the term "transaction" is often used:

Type 1: Computer Transactions ๐Ÿ’ป

  • Atomic data modification in a database
  • Example: BEGIN TRANSACTION โ†’ UPDATE โ†’ COMMIT

Type 2: Business Transactions ๐Ÿ’ฐ

  • Economic exchange between parties
  • Example: product sale, bank transfer

How they're related: OLTP uses technical transactions (type 1) to record business operations (type 2).

For example:

  • ๐Ÿ’ณ Store purchase (business transaction)
  • is recorded via ๐Ÿ–ฅ๏ธ SQL transaction in the database

In short: Technical transactions = tool for recording business transactions ๐Ÿ”งโ†’๐Ÿ“Š

OLTP (Online Transaction Processing)

Even when databases started being used for various types of data (blog comments ๐Ÿ“, game actions ๐ŸŽฎ, contacts in address books ๐Ÿ‘ฅ, etc.), the main access model remained similar to business transaction processing.

OLTP Characteristics:

  • Searching for a small number of records by key ๐Ÿ”‘
  • Inserting and updating data based on user input
  • Interactive mode of operation

C Example:

// Example OLTP operation: finding customer information by ID
#include <stdio.h>
#include <string.h>

struct Customer {
    int id;
    char name[50];
    double balance;
};

struct Customer find_customer_by_id(int customer_id) {
    // In a real system, there would be a database query here
    struct Customer customer;
    customer.id = customer_id;
    strcpy(customer.name, "Ivan Ivanov");
    customer.balance = 15000.75;
    return customer;
}

OLAP (Online Analytic Processing)

Online Analytical Processing (OLAP) โ€” an approach for quickly answering multidimensional analytical queries.

Databases also started being used for data analytics, which implies a completely different access model.

*Data Warehousing *OLAP Cube (from Online Analytical Processing) โ€” a multidimensional data structure designed for fast and efficient analysis of large volumes of business data from various perspectives. In simple terms, an OLAP cube can be compared to a pivot table in Excel, but much more powerful and capable of working with huge amounts of information and multiple parameters (dimensions).

Let's Look at OLAP-CUBE

OLAP data is typically stored in "star" or "snowflake" schemas in a relational data warehouse or specialized data management system. Metrics are derived from records in the fact table, and dimensions are from dimension tables. but! Classic OLAP cubes (like in Microsoft Analysis Services) often store data in proprietary multidimensional formats, not in relational "star" tables. The "star" itself is the raw data storage schema on which the cube is built. Additionally, modern MPP systems (Massively Parallel Processing) like ClickHouse or Amazon Redshift often don't use cubes in the classical sense but provide similar speed for OLAP queries directly to relational tables.

๐ŸงŠ OLAP Cube (Data Cube)

๐ŸŽฒ The term "cube" refers to a multidimensional dataset sometimes called a hypercube if the number of dimensions is more than three.

Basic Concepts:

๐Ÿ“ฆ Cube โ€” a multidimensional generalization of a two-dimensional spreadsheet

# Example: 3D data cube
dimensions = ['Products', 'Time', 'Regions']
measures = ['Sales', 'Profit', 'Budget']

Slice

Slice - the process of selecting a rectangular subset of a cube by choosing a single value for one of its dimensions, creating a new cube with one less dimension. The figure shows the slicing operation: sales metrics in all sales regions and all product categories of the company for 2005 and 2006 are "cut out" from the data cube.


Dice

Dice: The dice operation creates a subcube allowing the analyst to select specific values from multiple dimensions. The figure shows the dice operation: the new cube displays sales data for a limited number of product categories, with time and region dimensions covering the same range as before.


Drill Down/Up

Drill down/up allows the user to navigate between data levels, from the most summarized (up) to the most detailed (down). The figure shows the drill-down operation: the analyst moves from the summary category "Outdoor Protective Equipment" to sales metrics of individual products.


Roll-Up

Roll-up involves summarizing data along a dimension. The summarization rule can be an aggregate function, for example, to calculate totals along a hierarchy or apply a set of formulas such as "profit = sales - expenses". ๐Ÿ“Š๐Ÿ’ฐ

Computational Complexity

Common aggregate functions can be expensive to compute when rolling up: โŒ

  • ๐ŸŒ If they cannot be defined from the cube cells, they must be computed from the underlying data
  • โณ Either compute them online (slowly)
  • ๐Ÿ’พ Or precompute them for possible roll-up (large volume)

โœ… Efficient Aggregate Functions

Aggregate functions that can be defined from cells are known as decomposable aggregate functions and allow efficient computation.

๐Ÿ’ฐ Cost of Aggregate Functions When Rolling Up

โŒ Problematic Functions (expensive to compute):

๐Ÿ“Š Median (MEDIAN)

  • Requires complete data sorting for each aggregation level
  • Lacks the composability property - median of subgroups โ‰  median of entire group
  • On-the-fly computation requires storing all original values

๐ŸŽฏ Percentiles (PERCENTILE)

  • Similar to median, require access to the entire dataset
  • 90th percentile cannot be computed from 90th percentiles of subgroups
  • Must store the complete data distribution

๐Ÿ”ข Mode (MODE)

  • Requires counting frequencies of all values
  • The most frequent value in subgroups may not match the overall mode
  • Requires complete recount for accurate determination

๐Ÿ“ˆ Standard Deviation (STDDEV)

  • Not an additive function (In the context of aggregate functions, "additivity" means the ability to compute the overall result from partial results.)
  • Requires knowing the mean value and number of elements
  • For accurate calculation, need โˆ‘x and โˆ‘xยฒ for all data

Examples:

โœ… Easily supported:

  • COUNT ๐Ÿ”ข
  • MAXIMUM โฌ†๏ธ
  • MINIMUM โฌ‡๏ธ
  • SUM โž•

Because they can be computed for each OLAP cube cell and then rolled up, since the total sum (or count, etc.) is the sum of sub-sums.

โŒ Difficult to support:

  • MEDIAN ๐Ÿ“Š

Because it must be computed for each view separately: the median of a set is not the median of medians of subsets.


๐Ÿ”ข Mathematical Definition

(simplified) Mathematically, an OLAP cube is a projection of an RDBMS relation:

f: (X, Y, Z) โ†’ W

Where:

  • X, Y, Z โ€” cube axes (dimensions) ๐Ÿ“
  • W โ€” data filling each cell ๐Ÿ’พ

Projection example:

g: (Product, Time) โ†’ Sales

๐Ÿ’พ Data Storage

OLAP data is typically stored in:

  • ๐ŸŒŸ Star Schema
  • โ„๏ธ Snowflake Schema
  • ๐Ÿ—„๏ธ Specialized data management systems

Explanation (briefly)

๐ŸŒŸ Star Schema

  • Denormalized schema
  • All dimension tables linked DIRECTLY to the fact table
  • Simpler and faster for queries
  • Easier to understand

โ„๏ธ Snowflake Schema

  • Normalized schema
  • Dimension tables SPLIT into sub-tables
  • More complex, but saves space
  • Slower due to more JOINs

What this means in practice:

  • Star โ†’ for analysis speed ๐Ÿ“Šโšก
  • Snowflake โ†’ for storage economy ๐Ÿ’พ๐Ÿ“‰

Practical Examples

๐Ÿ’ผ Business Intelligence:

# Business cube dimensions:
dimensions = [
    'Products ๐Ÿ“ฆ',
    'Time ๐Ÿ“…',
    'Regions ๐ŸŒ',
    'Sales_Channels ๐Ÿช',
    'Customers ๐Ÿ‘ฅ'
]

# Measures (metrics):
measures = [
    'Sales_Volume ๐Ÿ’ฐ',
    'Profit ๐Ÿ“ˆ',
    'Order_Count ๐Ÿ”ข',
    'Average_Check ๐Ÿ’ต'
]

๐Ÿช Retail Example:

Cube: "Store Sales"
โ”œโ”€โ”€ Dimensions:
โ”‚   โ”œโ”€โ”€ Time: Year โ†’ Quarter โ†’ Month โ†’ Day
โ”‚   โ”œโ”€โ”€ Products: Category โ†’ Brand โ†’ Model
โ”‚   โ”œโ”€โ”€ Stores: Region โ†’ City โ†’ Address
โ”‚   โ””โ”€โ”€ Customers: Segment โ†’ Demographics
โ””โ”€โ”€ Measures:
    โ”œโ”€โ”€ Sales Amount
    โ”œโ”€โ”€ Quantity Sold
    โ”œโ”€โ”€ Profit
    โ””โ”€โ”€ Returns

๐ŸŒ Business Intelligence:

  • ๐Ÿ“Š Data Mining โ€” data mining
  • ๐Ÿช Data Mart โ€” data marts
  • ๐Ÿ“ˆ MDX โ€” Multidimensional Expressions
  • ๐Ÿ” XML for Analysis
  • Microsoft Analysis Services
  • Oracle OLAP
  • Apache Kylin
  • ClickHouse

๐Ÿ’ก Key Advantages of OLAP Cube

AdvantageDescription
โšก Fast ResponseInstant answers to complex analytical queries
๐Ÿ” MultidimensionalityData analysis from different perspectives
๐Ÿ“Š FlexibilityInteractive data exploration
๐ŸŽฏ IntuitivenessVisualization understandable to business users

๐Ÿ“‹ Comparison Table: OLTP vs OLAP

OLTP is typically contrasted with Online Analytical Processing (OLAP):

CharacteristicOLTP ๐Ÿš€OLAP ๐Ÿ“Š
Query TypesAll types (read, insert, update, delete)Mostly read-only
ComplexitySimple queriesComplex analytical queries
VolumeLarge number of simple queriesSmaller number of complex queries
PurposeOperation processingBusiness analytics and reporting

๐Ÿข Data Warehousing

Why do we need a data warehouse?

Large companies can have dozens of different transaction processing systems:

  • Customer websites ๐ŸŒ
  • Point-of-sale systems (cash registers) in stores ๐Ÿช
  • Warehouse inventory tracking systems ๐Ÿ“ฆ
  • Transportation route planning systems ๐Ÿšš
  • Human resources management systems ๐Ÿ‘ฅ

Problem: OLTP systems must be highly available and process transactions with low latency. Analytical queries often require scanning large volumes of data and can interfere with transaction operations.

Solution: Data Warehouse โ€” a separate database where analysts can run any queries without affecting OLTP operations.

๐Ÿ”„ ETL Process (Extractโ€“Transformโ€“Load)

Data enters the warehouse through the ETL process:

  1. Extract - data is extracted from OLTP systems
  2. Transform - data is transformed into a schema convenient for analysis
  3. Load - data is loaded into the warehouse

Real-life example: A large bank ๐Ÿฆ might have separate systems for:

  • Mobile banking ๐Ÿ“ฑ
  • Credit operations ๐Ÿ’ณ
  • Mortgage products ๐Ÿ 
  • Investment services ๐Ÿ“ˆ

All this data is combined in a data warehouse for end-to-end analytics.

โญ Star and โ„๏ธ Snowflake: Schemas for Analytics

๐ŸŒŸ Star Schema

At the center of the schema is a fact table, surrounded by dimension tables.

Example for retail:

  • fact_sales (sales fact table) - at the center
  • dim_product (product dimension) ๐Ÿ“ฆ
  • dim_store (store dimension) ๐Ÿช
  • dim_time (time dimension) โฐ
  • dim_customer (customer dimension) ๐Ÿ‘ฅ

C Example:

// Structure for sales fact table
struct FactSale {
    int sale_id;
    int product_id;     // reference to dim_product
    int store_id;       // reference to dim_store
    int time_id;        // reference to dim_time
    int customer_id;    // reference to dim_customer
    double amount;
    int quantity;
    double profit;
};

// Structure for product dimension table
struct DimProduct {
    int product_id;
    char sku[20];
    char description[100];
    char brand[50];
    char category[50];
    double price;
};

โ„๏ธ Snowflake Schema

A more normalized version of the star schema where dimensions are further broken down into sub-dimensions.

Advantage of star schema: simpler for analysts!

๐Ÿ’พ Data Warehouse Features

  • Wide tables: fact tables often have more than 100 columns ๐Ÿ“‹
  • Rich metadata: dimension tables contain all possible information for analysis
  • Optimized for analytical queries: special storage structures and algorithms

Quick Summary: OLTP vs OLAP

๐ŸŽฏ Key Differences

OLTP (Online Transaction Processing)

  • ๐Ÿ’ฐ Real-time operation processing
  • ๐Ÿ”„ Short transactions (read/write)
  • โšก High availability, instant response
  • ๐Ÿฆ Examples: bank transfers, orders, bookings

OLAP (Online Analytical Processing)

  • ๐Ÿ“Š Analytics and reporting
  • ๐Ÿ” Complex queries on big data
  • ๐Ÿ“ˆ Multidimensional analysis (cubes, slices, drill-down)
  • ๐Ÿง  Examples: business intelligence, forecasting

๐Ÿ—๏ธ Architectural Features

OLTP

  • ACID transactions ๐Ÿ›ก๏ธ
  • Normalized schemas
  • Write optimization

OLAP

  • Denormalized schemas (๐ŸŒŸ Star/โ„๏ธ Snowflake)
  • Data warehouses
  • ETL processes
  • Read optimization

๐Ÿ’ก What This Means in Practice

AspectOLTPOLAP
PurposeBusiness operations ๐Ÿš€Business analysis ๐Ÿ“ˆ
DataCurrent operationsHistorical data
UsersOperatorsAnalysts

Conclusion: OLTP ensures the company's daily operations, while OLAP helps understand trends and make strategic decisions. Both approaches are critically important but solve different problems.