- Published on
Transaction Processing or Analytics?
- Authors

- Name
- dima853
- @_dima853
Introduction to Transactions
In the early days of business data processing, writing to a database typically corresponded to a commercial transaction: selling a product ๐๏ธ, ordering from a supplier ๐ฆ, paying employee salaries ๐ฐ, etc. Over time, the term "transaction" became established even for operations not related to money and came to mean a group of read and write operations forming a logical unit.
- ๐ฅ Atomicity โ the minimum mandatory property of any transaction
- ๐ก๏ธ Full ACID โ part of reliability for critical operations
- โ๏ธ In practice โ different systems choose a balance between reliability (ACID) and performance (weakened guarantees)
๐ OLTP vs OLAP: Two Different Worlds
Online Transaction Processing (OLTP) โ a type of database system used in transaction-oriented applications (banks, accounting systems (ERP, CRM), booking systems, payment systems, etc.), such as many operational (core) business systems.
Two contexts where the term "transaction" is often used:
Type 1: Computer Transactions ๐ป
- Atomic data modification in a database
- Example:
BEGIN TRANSACTION โ UPDATE โ COMMIT
Type 2: Business Transactions ๐ฐ
- Economic exchange between parties
- Example: product sale, bank transfer
How they're related: OLTP uses technical transactions (type 1) to record business operations (type 2).
For example:
- ๐ณ Store purchase (business transaction)
- is recorded via ๐ฅ๏ธ SQL transaction in the database
In short: Technical transactions = tool for recording business transactions ๐งโ๐
OLTP (Online Transaction Processing)
Even when databases started being used for various types of data (blog comments ๐, game actions ๐ฎ, contacts in address books ๐ฅ, etc.), the main access model remained similar to business transaction processing.
OLTP Characteristics:
- Searching for a small number of records by key ๐
- Inserting and updating data based on user input
- Interactive mode of operation
C Example:
// Example OLTP operation: finding customer information by ID
#include <stdio.h>
#include <string.h>
struct Customer {
int id;
char name[50];
double balance;
};
struct Customer find_customer_by_id(int customer_id) {
// In a real system, there would be a database query here
struct Customer customer;
customer.id = customer_id;
strcpy(customer.name, "Ivan Ivanov");
customer.balance = 15000.75;
return customer;
}
OLAP (Online Analytic Processing)
Online Analytical Processing (OLAP) โ an approach for quickly answering multidimensional analytical queries.
Databases also started being used for data analytics, which implies a completely different access model.

*Data Warehousing *OLAP Cube (from Online Analytical Processing) โ a multidimensional data structure designed for fast and efficient analysis of large volumes of business data from various perspectives. In simple terms, an OLAP cube can be compared to a pivot table in Excel, but much more powerful and capable of working with huge amounts of information and multiple parameters (dimensions).
Let's Look at OLAP-CUBE
OLAP data is typically stored in "star" or "snowflake" schemas in a relational data warehouse or specialized data management system. Metrics are derived from records in the fact table, and dimensions are from dimension tables. but! Classic OLAP cubes (like in Microsoft Analysis Services) often store data in proprietary multidimensional formats, not in relational "star" tables. The "star" itself is the raw data storage schema on which the cube is built. Additionally, modern MPP systems (Massively Parallel Processing) like ClickHouse or Amazon Redshift often don't use cubes in the classical sense but provide similar speed for OLAP queries directly to relational tables.
๐ง OLAP Cube (Data Cube)
๐ฒ The term "cube" refers to a multidimensional dataset sometimes called a hypercube if the number of dimensions is more than three.
Basic Concepts:
๐ฆ Cube โ a multidimensional generalization of a two-dimensional spreadsheet
# Example: 3D data cube
dimensions = ['Products', 'Time', 'Regions']
measures = ['Sales', 'Profit', 'Budget']
Slice

Slice - the process of selecting a rectangular subset of a cube by choosing a single value for one of its dimensions, creating a new cube with one less dimension. The figure shows the slicing operation: sales metrics in all sales regions and all product categories of the company for 2005 and 2006 are "cut out" from the data cube.
Dice
Dice: The dice operation creates a subcube allowing the analyst to select specific values from multiple dimensions. The figure shows the dice operation: the new cube displays sales data for a limited number of product categories, with time and region dimensions covering the same range as before.
Drill Down/Up
Drill down/up allows the user to navigate between data levels, from the most summarized (up) to the most detailed (down). The figure shows the drill-down operation: the analyst moves from the summary category "Outdoor Protective Equipment" to sales metrics of individual products.
Roll-Up
Roll-up involves summarizing data along a dimension. The summarization rule can be an aggregate function, for example, to calculate totals along a hierarchy or apply a set of formulas such as "profit = sales - expenses". ๐๐ฐ
Computational Complexity
Common aggregate functions can be expensive to compute when rolling up: โ
- ๐ If they cannot be defined from the cube cells, they must be computed from the underlying data
- โณ Either compute them online (slowly)
- ๐พ Or precompute them for possible roll-up (large volume)
โ Efficient Aggregate Functions
Aggregate functions that can be defined from cells are known as decomposable aggregate functions and allow efficient computation.
๐ฐ Cost of Aggregate Functions When Rolling Up
โ Problematic Functions (expensive to compute):
๐ Median (MEDIAN)
- Requires complete data sorting for each aggregation level
- Lacks the composability property - median of subgroups โ median of entire group
- On-the-fly computation requires storing all original values
๐ฏ Percentiles (PERCENTILE)
- Similar to median, require access to the entire dataset
- 90th percentile cannot be computed from 90th percentiles of subgroups
- Must store the complete data distribution
๐ข Mode (MODE)
- Requires counting frequencies of all values
- The most frequent value in subgroups may not match the overall mode
- Requires complete recount for accurate determination
๐ Standard Deviation (STDDEV)
- Not an additive function (In the context of aggregate functions, "additivity" means the ability to compute the overall result from partial results.)
- Requires knowing the mean value and number of elements
- For accurate calculation, need โx and โxยฒ for all data
Examples:
โ Easily supported:
COUNT๐ขMAXIMUMโฌ๏ธMINIMUMโฌ๏ธSUMโ
Because they can be computed for each OLAP cube cell and then rolled up, since the total sum (or count, etc.) is the sum of sub-sums.
โ Difficult to support:
MEDIAN๐
Because it must be computed for each view separately: the median of a set is not the median of medians of subsets.
๐ข Mathematical Definition
(simplified) Mathematically, an OLAP cube is a projection of an RDBMS relation:
f: (X, Y, Z) โ W
Where:
- X, Y, Z โ cube axes (dimensions) ๐
- W โ data filling each cell ๐พ
Projection example:
g: (Product, Time) โ Sales
๐พ Data Storage
OLAP data is typically stored in:
- ๐ Star Schema
- โ๏ธ Snowflake Schema
- ๐๏ธ Specialized data management systems
Explanation (briefly)
๐ Star Schema
- Denormalized schema
- All dimension tables linked DIRECTLY to the fact table
- Simpler and faster for queries
- Easier to understand
โ๏ธ Snowflake Schema
- Normalized schema
- Dimension tables SPLIT into sub-tables
- More complex, but saves space
- Slower due to more JOINs
What this means in practice:
- Star โ for analysis speed ๐โก
- Snowflake โ for storage economy ๐พ๐
Practical Examples
๐ผ Business Intelligence:
# Business cube dimensions:
dimensions = [
'Products ๐ฆ',
'Time ๐
',
'Regions ๐',
'Sales_Channels ๐ช',
'Customers ๐ฅ'
]
# Measures (metrics):
measures = [
'Sales_Volume ๐ฐ',
'Profit ๐',
'Order_Count ๐ข',
'Average_Check ๐ต'
]
๐ช Retail Example:
Cube: "Store Sales"
โโโ Dimensions:
โ โโโ Time: Year โ Quarter โ Month โ Day
โ โโโ Products: Category โ Brand โ Model
โ โโโ Stores: Region โ City โ Address
โ โโโ Customers: Segment โ Demographics
โโโ Measures:
โโโ Sales Amount
โโโ Quantity Sold
โโโ Profit
โโโ Returns
๐ Related Technologies
๐ Business Intelligence:
- ๐ Data Mining โ data mining
- ๐ช Data Mart โ data marts
- ๐ MDX โ Multidimensional Expressions
- ๐ XML for Analysis
๐ ๏ธ Popular OLAP Systems:
- Microsoft Analysis Services
- Oracle OLAP
- Apache Kylin
- ClickHouse
๐ก Key Advantages of OLAP Cube
| Advantage | Description |
|---|---|
| โก Fast Response | Instant answers to complex analytical queries |
| ๐ Multidimensionality | Data analysis from different perspectives |
| ๐ Flexibility | Interactive data exploration |
| ๐ฏ Intuitiveness | Visualization understandable to business users |
๐ Comparison Table: OLTP vs OLAP
OLTP is typically contrasted with Online Analytical Processing (OLAP):
| Characteristic | OLTP ๐ | OLAP ๐ |
|---|---|---|
| Query Types | All types (read, insert, update, delete) | Mostly read-only |
| Complexity | Simple queries | Complex analytical queries |
| Volume | Large number of simple queries | Smaller number of complex queries |
| Purpose | Operation processing | Business analytics and reporting |
๐ข Data Warehousing
Why do we need a data warehouse?
Large companies can have dozens of different transaction processing systems:
- Customer websites ๐
- Point-of-sale systems (cash registers) in stores ๐ช
- Warehouse inventory tracking systems ๐ฆ
- Transportation route planning systems ๐
- Human resources management systems ๐ฅ
Problem: OLTP systems must be highly available and process transactions with low latency. Analytical queries often require scanning large volumes of data and can interfere with transaction operations.
Solution: Data Warehouse โ a separate database where analysts can run any queries without affecting OLTP operations.
๐ ETL Process (ExtractโTransformโLoad)
Data enters the warehouse through the ETL process:
- Extract - data is extracted from OLTP systems
- Transform - data is transformed into a schema convenient for analysis
- Load - data is loaded into the warehouse
Real-life example: A large bank ๐ฆ might have separate systems for:
- Mobile banking ๐ฑ
- Credit operations ๐ณ
- Mortgage products ๐
- Investment services ๐
All this data is combined in a data warehouse for end-to-end analytics.
โญ Star and โ๏ธ Snowflake: Schemas for Analytics
๐ Star Schema
At the center of the schema is a fact table, surrounded by dimension tables.
Example for retail:
fact_sales(sales fact table) - at the centerdim_product(product dimension) ๐ฆdim_store(store dimension) ๐ชdim_time(time dimension) โฐdim_customer(customer dimension) ๐ฅ
C Example:
// Structure for sales fact table
struct FactSale {
int sale_id;
int product_id; // reference to dim_product
int store_id; // reference to dim_store
int time_id; // reference to dim_time
int customer_id; // reference to dim_customer
double amount;
int quantity;
double profit;
};
// Structure for product dimension table
struct DimProduct {
int product_id;
char sku[20];
char description[100];
char brand[50];
char category[50];
double price;
};
โ๏ธ Snowflake Schema
A more normalized version of the star schema where dimensions are further broken down into sub-dimensions.
Advantage of star schema: simpler for analysts!
๐พ Data Warehouse Features
- Wide tables: fact tables often have more than 100 columns ๐
- Rich metadata: dimension tables contain all possible information for analysis
- Optimized for analytical queries: special storage structures and algorithms
Quick Summary: OLTP vs OLAP
๐ฏ Key Differences
OLTP (Online Transaction Processing)
- ๐ฐ Real-time operation processing
- ๐ Short transactions (read/write)
- โก High availability, instant response
- ๐ฆ Examples: bank transfers, orders, bookings
OLAP (Online Analytical Processing)
- ๐ Analytics and reporting
- ๐ Complex queries on big data
- ๐ Multidimensional analysis (cubes, slices, drill-down)
- ๐ง Examples: business intelligence, forecasting
๐๏ธ Architectural Features
OLTP
- ACID transactions ๐ก๏ธ
- Normalized schemas
- Write optimization
OLAP
- Denormalized schemas (๐ Star/โ๏ธ Snowflake)
- Data warehouses
- ETL processes
- Read optimization
๐ก What This Means in Practice
| Aspect | OLTP | OLAP |
|---|---|---|
| Purpose | Business operations ๐ | Business analysis ๐ |
| Data | Current operations | Historical data |
| Users | Operators | Analysts |
Conclusion: OLTP ensures the company's daily operations, while OLAP helps understand trends and make strategic decisions. Both approaches are critically important but solve different problems.