JVM, Memory Management, and Performance
- Author: dima853 (@_dima853)
- Explain the complete lifecycle of an object in the heap, from creation to garbage collection, including the generations (Young Gen, Old Gen), Eden, S0, and S1.
- What is Garbage Collection (GC)? Explain the main algorithms (Mark-Sweep, Mark-Compact, Copying) and their trade-offs.
- Describe the differences between Serial, Parallel, CMS, G1, and ZGC garbage collectors. In which scenarios is each preferable?
- What are Stop-The-World (STW) pauses? How do different GCs affect their duration and frequency?
- Explain what a "memory leak" is in Java. Provide concrete examples from practice (e.g., in static collections, caches, unclosed resources).
- What is Metaspace (Java 8+) and how does it differ from PermGen? What causes OutOfMemoryError: Metaspace?
- Explain the String Pool (String Table). How does the intern() method work, and when is its use justified?
- What is Escape Analysis, and how does it help with optimization? (Connection to Stack Allocation and Scalar Replacement.)
- Describe the memory structure of a Java thread (Stack Memory). What is stored in a method frame (local variables, operand stack, reference to the runtime constant pool)?
- What is JIT compilation (C1, C2, and Tiered Compilation)? What are code "profiling" and deoptimization?
- Explain how a volatile variable works. What is "happens-before", and how does it ensure visibility of changes between threads?
- What is false sharing, and how can it be avoided (for example, using @Contended)?
Answers to Questions:
*This article contains simplifications.
1. Lifecycle of an object in the heap: from allocation to reincarnation
Creation (Allocation):
- The vast majority of objects are allocated in Eden Space (Young Generation). Each thread allocates inside its own TLAB (Thread-Local Allocation Buffer) by bumping a pointer. The common case is therefore an O(1) pointer increment that needs no synchronization.
- Large objects can bypass Young Gen to avoid expensive copying. In G1, any object of at least half a region is allocated directly in Humongous regions, which are logically part of Old Gen. The Serial collector has a similar knob, -XX:PretenureSizeThreshold, which is disabled by default. The exact threshold therefore depends on the collector and its configuration.
Early life in Young Generation (Short-lived objects):
Eden: When Eden fills up, a Minor GC is initiated. A Minor GC is a fast, partial collection that only processes the Young Generation.
Copying Algorithm: Live objects (reachable from GC Roots) are copied from Eden and one Survivor Space (S0 or S1) into the other Survivor Space.
Survivor Spaces (S0/S1, or From/To): Two identically sized spaces, one of which is always empty. (If both Survivors contained data, there would be nowhere to copy the live objects from Eden.) After each Minor GC, live objects are copied between them and their age is incremented. This filters out short-lived objects with minimal overhead.
Promotion: Upon reaching the age threshold (bounded by MaxTenuringThreshold, 15 by default), an object is considered long-lived and is moved (promoted) to Old Generation. (Simplification: HotSpot adjusts the actual threshold dynamically, and it promotes objects early when the To-survivor space overflows.)
Maturity in Old Generation (Long-lived objects):
- Objects survive for a long time.
- Filling up Old Gen (or crossing an occupancy threshold such as G1's InitiatingHeapOccupancyPercent) triggers old-generation work:
  - In Serial/Parallel, this is a Full GC of the entire heap.
  - In CMS and G1, it is a concurrent marking cycle, which falls back to a Full GC only if the concurrent work cannot keep up.
- Algorithms in Old Gen are more complex: Mark-Sweep-Compact (Serial, Parallel), Concurrent Mark-Sweep (CMS), or region-based evacuation, as in G1/ZGC/Shenandoah.
Death and recycling (Garbage Collection):
- An object becomes garbage when it is no longer reachable from any GC Root through any chain of references.
- GC Roots: local variables in active stack frames, static fields, JNI references, loaded system classes, and live threads.
- Memory is freed by the collector. In Eden/Survivor — by copying live objects (dead ones are ignored). In Old Gen — by "sweeping" and subsequent "compaction" to combat fragmentation.
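The aging and promotion rules above can be illustrated with a toy simulation. This is a sketch, not HotSpot code. GenerationalToy, Obj and the live flag are hypothetical names; live stands in for "reachable from GC Roots". The rule used here is a simplification that ignores survivor overflow and adaptive thresholds: copy while the age is below the threshold, otherwise promote.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Young Gen aging and promotion (not HotSpot code).
public class GenerationalToy {
    static final class Obj {
        final String name;
        int age;                 // number of Minor GCs survived
        boolean live = true;     // stands in for "reachable from GC Roots"
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;
    List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    List<Obj> toSurvivor = new ArrayList<>();
    final List<Obj> oldGen = new ArrayList<>();

    GenerationalToy(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);             // new objects always start in Eden
        return o;
    }

    // Copying Minor GC: live objects from Eden + From are copied to To (age + 1),
    // or promoted to Old Gen once their age has reached the threshold.
    // Dead objects are simply not copied, which is how they are "freed".
    void minorGC() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live) continue;
            if (o.age < tenuringThreshold) {
                o.age++;
                toSurvivor.add(o);
            } else {
                oldGen.add(o);   // promotion
            }
        }
        eden = new ArrayList<>();      // Eden is empty after every Minor GC
        fromSurvivor = toSurvivor;     // swap roles: To becomes the new From
        toSurvivor = new ArrayList<>();
    }

    public static void main(String[] args) {
        GenerationalToy heap = new GenerationalToy(2);
        Obj session = heap.allocate("session");
        heap.allocate("temp").live = false;   // becomes garbage immediately
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGC();
            System.out.println("after GC " + gc + ": survivor=" + heap.fromSurvivor.size()
                    + ", old=" + heap.oldGen.size() + ", session.age=" + session.age);
        }
    }
}
```

With a threshold of 2, "session" spends two Minor GCs in the Survivor spaces and is promoted on the third. "temp" disappears at the first GC without ever being copied.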
2. Garbage Collection: Basic algorithms and trade-offs
- Garbage Collection is an automated dynamic memory management system that frees objects unreachable by the executing program.
Algorithms:
Mark-Sweep:
- Phase 1 (Mark): Traversing the reachability graph from GC Roots. Live objects are marked.
- Phase 2 (Sweep): Linear pass through the entire memory. Unmarked (dead) blocks are marked as free.
- Trade-offs: Causes fragmentation. Low overhead, but leads to a "holey" heap, degrading allocation performance and potentially causing OOM due to lack of contiguous space.
Copying:
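- Divides memory into two semi-spaces (From and To).
- Live objects are copied from From to To. After copying, the entire From space is considered free.
- Trade-offs:
  - A classic semi-space collector needs 2x memory, because one half is always empty. HotSpot's Young Gen reduces this waste by using a large Eden plus two small Survivor spaces.
  - Does not fragment memory.
  - Extremely efficient when most objects die young, because the cost is proportional only to the live objects.
  - In HotSpot's generational collectors it is used mainly in the Young Generation. G1, ZGC and Shenandoah also evacuate old regions by copying.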
Mark-Compact:
- Phase 1 (Mark): Same as Mark-Sweep.
- Phase 2 (Compact): Live objects are moved to the beginning of the region, forming a contiguous block of memory. All references to moved objects are updated.
- Trade-offs: Eliminates fragmentation. The most expensive operation due to the cost of moving and updating references. Used primarily in Old Generation.
Evolutionary conclusion:
- Young Gen uses Copying, because high mortality makes it efficient.
- Old Gen uses hybrids of Mark-Sweep/Compact, because mortality is low and fragmentation must be fought.
- Modern GCs (G1, ZGC) divide the heap into regions, choose per region what to collect, and evacuate live objects by copying.
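Mark-Sweep and Mark-Compact can be compared on a toy heap. In this sketch (class and method names are hypothetical), heap[i] lists the slot indices that the object in slot i references, and null marks a free slot. Sweep leaves survivors where they are. Compact slides them to the front and rewrites references through a forwarding table.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy heap for illustration: heap[i] holds the outgoing references (slot
// indices) of the object in slot i; null marks a free slot.
public class MarkCompactToy {
    final int[][] heap;

    MarkCompactToy(int[][] heap) {
        this.heap = heap;
    }

    // Phase 1 (Mark): iterative DFS over the reachability graph from the roots.
    boolean[] mark(int... roots) {
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (marked[i]) continue;
            marked[i] = true;
            for (int ref : heap[i]) stack.push(ref);
        }
        return marked;
    }

    // Phase 2, Mark-Sweep variant: free unmarked slots. Survivors stay put,
    // which is exactly what leaves holes (fragmentation) behind.
    int sweep(boolean[] marked) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !marked[i]) {
                heap[i] = null;
                freed++;
            }
        }
        return freed;
    }

    // Phase 2, Mark-Compact variant: slide live objects to the start of the heap
    // and rewrite every reference. Returns the relocated root indices.
    // (A scratch array keeps the sketch short; real collectors work in place.)
    int[] compact(boolean[] marked, int... roots) {
        int[] forward = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) forward[i] = next++;   // forwarding address
        }
        int[][] moved = new int[heap.length][];
        for (int i = 0; i < heap.length; i++) {
            if (!marked[i]) continue;
            int[] refs = heap[i].clone();
            for (int k = 0; k < refs.length; k++) refs[k] = forward[refs[k]];
            moved[forward[i]] = refs;
        }
        System.arraycopy(moved, 0, heap, 0, heap.length);
        return Arrays.stream(roots).map(r -> forward[r]).toArray();
    }
}
```

Take the heap {0→2, 1, 2, 3→0} with root 0:
- Marking keeps slots 0 and 2. Slot 3 points to a live object but is itself unreachable.
- Sweep frees slots 1 and 3 in place.
- Compact instead moves slot 2 into slot 1 and rewrites slot 0's reference to 1.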
3. Garbage Collectors: Strategic Choice
- Serial GC (-XX:+UseSerialGC): Single-threaded for Mark, Sweep, and Compact; every collection is STW. Scenario: single-threaded applications, small containers, and environments with minimal resources.
- Parallel GC (Throughput Collector) (-XX:+UseParallelGC): Multi-threaded versions of the Serial algorithms for Young and Old Gen. Maximizes throughput at the cost of more aggressive CPU usage during collections and longer STW pauses. Scenario: batch processing and computations where pauses of hundreds of milliseconds to seconds are acceptable.
- CMS – Concurrent Mark Sweep (-XX:+UseConcMarkSweepGC, deprecated in Java 9 and removed in Java 14): Reduces STW pause duration by doing most of the old-generation work concurrently with the application.
  - Phases: Initial Mark (STW, fast), Concurrent Mark, Concurrent Preclean, Remark (STW), Concurrent Sweep.
  - Trade-offs: Does not compact by default, which leads to fragmentation and possible Concurrent Mode Failure (a forced Full GC). High CPU consumption in the background.
- G1 – Garbage First (-XX:+UseG1GC, default since Java 9): Regional (-XX:G1HeapRegionSize) and predictive.
  - Divides the heap into ~2000 regions and collects the regions with the most garbage first (hence "Garbage First"). Has soft real-time pause goals (-XX:MaxGCPauseMillis).
  - Scenario: A universal balance between throughput and latency. The main choice for most server applications, especially with heaps above roughly 4-6 GB.
- ZGC (-XX:+UseZGC) and Shenandoah (-XX:+UseShenandoahGC): Low-latency collectors with sub-millisecond pause goals.
  - Key feature: Almost all phases, including object relocation, are performed concurrently with the application.
  - ZGC uses colored pointers and load barriers; Shenandoah uses load-reference barriers.
  - Scenario: Latency-critical applications (financial transactions, high-load web services) and very large heaps (up to terabytes).
Selection strategy: The lower the acceptable latency, the more advanced and concurrent a collector is required. Throughput -> Latency gradient: Parallel -> G1 -> ZGC/Shenandoah.
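You can check which collector a running JVM actually uses by listing the collector MXBeans from java.lang.management. This matters because ergonomics may pick a different default, for example on small machines. A minimal sketch: the reported names depend on the collector (for G1, typically "G1 Young Generation" and "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // One line per collector bean registered in this JVM, with cumulative stats.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```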
4. Stop-The-World (STW): Anatomy of a Freeze
- STW — a phase when all application threads are suspended to perform a GC operation safe against a changing object graph.
- Causes: Root Scanning, the Remark phase in CMS/G1 (accounting for changes made during concurrent marking), and evacuation or compaction in non-concurrent phases.
- GC Impact:
  - Serial/Parallel: Dominant, long STW phases. Pauses grow with heap size and the amount of live data.
  - CMS: Significantly reduces STW (Initial Mark, Remark), but leaves the risk of Concurrent Mode Failure (a long STW Full GC).
  - G1: Predictable, manageable pauses (MaxGCPauseMillis). STW is limited to evacuating a selected set of regions.
  - ZGC/Shenandoah: STW is reduced to short pauses (typically well under a millisecond) for a small set of roots. Most of the work is concurrent.
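STW pauses can also be observed from inside the application. This sketch uses the idea behind tools such as jHiccup: a thread repeatedly sleeps for a fixed interval and records how much longer than requested each sleep took. The class and method names are hypothetical. The overshoot also includes OS scheduling delays, so it is an upper bound on GC pauses rather than an exact figure.

```java
// jHiccup-style probe: any extra delay beyond the requested sleep was caused by
// something outside this thread (GC or other safepoint pauses, OS scheduling).
public class PauseDetector {
    static long maxHiccupMillis(long durationMs, long intervalMs) {
        long maxOvershootNanos = 0;
        long deadline = System.nanoTime() + durationMs * 1_000_000L;
        while (System.nanoTime() < deadline) {
            long before = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long overshoot = System.nanoTime() - before - intervalMs * 1_000_000L;
            maxOvershootNanos = Math.max(maxOvershootNanos, overshoot);
        }
        return maxOvershootNanos / 1_000_000L;
    }

    public static void main(String[] args) {
        // Run next to an allocation-heavy workload (e.g. with -Xlog:gc) and compare.
        System.out.println("max hiccup: " + maxHiccupMillis(5_000, 1) + " ms");
    }
}
```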
5. Memory Leak in Java: Systematic Failure
- Memory leak — a situation where objects are no longer used by the application but cannot be collected by GC due to remaining incorrect references stored in live data structures.
- This is not a JVM bug, but a logical error in the code.
Canonical examples:
- Static collections (Classic):
public class LeakyClass {
    private static final List<byte[]> STATIC_CACHE = new ArrayList<>();
    public void processData(byte[] data) {
        STATIC_CACHE.add(data); // The data object is forever reachable via the static field
    }
}
- Uncontrolled caches (Guava Cache, Caffeine without an eviction policy):
Cache<Key, Value> cache = Caffeine.newBuilder().build(); // No expireAfterWrite or maximumSize
// The cache grows indefinitely.
- Unclosed resources (InputStream, Connection, Session): Resources often hold references to internal buffers or objects in native memory. Solution: try-with-resources.
- Event listeners and inner classes: A listener that is stored in a global context and never unsubscribed keeps its outer class instance (captured by the inner or anonymous class) reachable.
- ThreadLocal without cleanup (especially in thread pools): The value in a ThreadLocal lives as long as the thread lives. In web applications, a thread returns to the pool and may live for years.
private static final ThreadLocal<HeavyContext> threadLocal = new ThreadLocal<>();
// After use, it is necessary to call:
threadLocal.remove();
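Diagnosis:
- Monitor Old Gen: its occupancy after each Full GC keeps growing.
- Take a heap dump (jmap -dump) and analyze it in Eclipse MAT or VisualVM.
- In the dominator tree, look for the objects with the largest retained size (often collections backed by large java.lang.Object[] arrays).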
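As a contrast to the unbounded static cache, here is a minimal bounded cache built only on the JDK's LinkedHashMap. Access-order mode plus removeEldestEntry gives LRU eviction. BoundedLruCache is a hypothetical name. It is not thread-safe, so shared use needs external synchronization or a library cache configured with maximumSize.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded LRU map: once size exceeds maxEntries, the least recently
// accessed entry is evicted, so the map cannot grow without limit.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration goes from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // called after every put
    }
}
```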
6. Metaspace vs PermGen: Evolution of Metadata
PermGen (up to Java 7) — a fixed heap segment for class metadata, causing frequent OutOfMemoryError and requiring manual size tuning.
Metaspace (since Java 8) — a dynamically growing area in native memory, managed by the JVM itself. It eliminates the problems of the fixed-size PermGen and lets class metadata be freed when classes are unloaded.
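- PermGen (≤ Java 7): Fixed maximum size (-XX:MaxPermSize). Stored class metadata and, up to Java 6, also interned strings and class static fields (Java 7 moved both to the heap). A frequent cause of OutOfMemoryError: PermGen space.
- Metaspace (Java 8+): Native memory (not part of the Java Heap).
  - Allocated by the JVM from native memory. Unlimited by default (bounded only by available native memory) unless -XX:MaxMetaspaceSize is set.
  - Grows automatically. Metadata is freed when its defining ClassLoader becomes unreachable and is collected by GC (class unloading).
  - Divided into two parts:
    - Class Space: the Compressed Class Space holding Klass structures, when compressed class pointers are enabled.
    - Non-Class Metaspace: everything else (methods, constant pools, annotations).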
OutOfMemoryError: Metaspace occurs when:
- The limit is reached (-XX:MaxMetaspaceSize).
- Metadata leaks (ClassLoader Leak): A common cause is containers (Tomcat, OSGi) where applications are reloaded but the old ClassLoader is still referenced (e.g., via a thread or a static reference). This prevents its classes from being unloaded.
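Metaspace growth, for example from a ClassLoader leak, can be watched from inside the JVM through the memory pool MXBeans. This sketch assumes HotSpot, which reports a non-heap pool named "Metaspace". Other JVMs may name their pools differently, hence the -1 fallback.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceUsage {
    // Returns the bytes currently used by the "Metaspace" pool, or -1 if absent.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sample this periodically: steady growth across app redeploys hints at a ClassLoader leak.
        System.out.println("Metaspace used bytes: " + metaspaceUsedBytes());
    }
}
```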
7. String Pool (String Table): Deduplication Mechanism
- String Pool — a hash table (HotSpot's native StringTable) whose entries point to the canonical (interned) String instances. Since Java 7 those strings live in the regular heap; before that, they lived in PermGen.
- Rules:
  - String literals ("text") are interned when they are resolved. In HotSpot this happens lazily, the first time the code using the literal runs. Every occurrence of the same literal refers to the same object.
  - String.intern(): Adds a string created at runtime to the Pool and returns the canonical representation.
    - If an equal string is already in the Pool, it returns a reference to that string.
    - If not, it adds the current object to the Pool and returns it.
- When to use intern():
  - Almost never in typical application code.
  - Justified when processing huge volumes of data with a high degree of string duplication (parsing CSV, tags, enum-like values), when you need:
    - Significant memory savings (one String instance for many identical values).
    - Faster comparison via == instead of .equals(). This is valid only if both sides are guaranteed to be interned.
  - Danger:
    - Before Java 7, the Pool lived in the fixed-size PermGen, so uncontrolled interning could easily cause OutOfMemoryError: PermGen space.
    - Since Java 7, interned strings live in the heap and are collected like ordinary objects once nothing references them. A huge table still costs memory and lookup time.
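The identity rules above can be checked directly. The string values are arbitrary examples.

```java
import java.util.Arrays;

public class InternDemo {
    // Returns the results of four identity/equality checks.
    static boolean[] check() {
        String literal = "order-42";                                      // literal: the pooled instance
        String copy = new String("order-42");                             // explicitly a new heap object
        String built = new StringBuilder("order-").append(42).toString(); // built at runtime
        return new boolean[] {
            literal == copy,            // false: two different objects
            literal.equals(copy),       // true: same characters
            copy.intern() == literal,   // true: intern() returns the canonical instance
            built.intern() == literal   // true: the same canonical instance again
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // [false, true, true, true]
    }
}
```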
8. Escape Analysis: Compiler Magic for Optimization
- Escape Analysis (EA) — JIT compiler (C2) analysis determining the visibility scope of a created object.
- NoEscape: The object does not leave the method and/or thread bounds.
- ArgEscape: The object is passed to another method but does not "escape" the thread.
- GlobalEscape: The object is published (saved to a static field, passed to another thread).
- Based on EA, JIT applies optimizations:
  - Scalar Replacement: If an object is NoEscape, the JIT does not allocate it on the heap. Instead, its fields become local variables of the method (primitives/references), kept in registers or stack slots. Ideal optimization: zero allocation overhead, zero GC overhead.
// Before optimization
Point p = new Point(x, y);
return p.x + p.y;
// After Scalar Replacement
int p_x = x, p_y = y;
return p_x + p_y; // The Point object is never created.
  - Stack Allocation: Allocating the whole object on the thread stack instead of the heap. HotSpot does not implement true stack allocation; it achieves the same effect through scalar replacement.
  - Lock Elision: If an object's monitor is NoEscape (e.g., a synchronized block on a local object), the lock is removed, since no other thread can ever contend for it.
Activation: Enabled by default (-XX:+DoEscapeAnalysis). Effective for short-lived, local objects (DTOs, iterators, builders).
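A minimal illustration of the NoEscape vs GlobalEscape distinction. The class and method names are hypothetical. Whether C2 actually applies scalar replacement is the compiler's decision and is not guaranteed. One way to look for the effect is to run main with -Xlog:gc and compare against -XX:-DoEscapeAnalysis, but what you observe depends on your JDK and setup.

```java
public class EscapeAnalysisDemo {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point lastPublished; // anything stored here escapes globally

    // NoEscape: 'p' never leaves this method, so C2 may scalar-replace it
    // (no heap allocation, just two ints).
    static int sumNoEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // GlobalEscape: 'p' is published through a static field, so it must live on the heap.
    static int sumGlobalEscape(int x, int y) {
        Point p = new Point(x, y);
        lastPublished = p;
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 50_000_000; i++) {
            total += sumNoEscape(i, 1);   // hot loop: a candidate for C2 compilation
        }
        System.out.println(total);
    }
}
```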
9. Thread Memory (Stack Memory): Frame Architecture
Each JVM thread has a private stack, created when it starts. The stack consists of stack frames, pushed on method call and popped on its completion (normal or exceptional).
Structure of a method frame:
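- Local Variable Array (LVA): Array of method variables, indexed from 0.
  - this (for non-static methods) is stored in LVA[0].
  - Method parameters follow in LVA[1], LVA[2], ... (starting at LVA[0] for static methods).
  - Local variables occupy the subsequent slots.
  - Each slot holds one int, float, or reference value, or a smaller integral type (boolean, byte, char, short). long and double occupy 2 consecutive slots.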
- Operand Stack (OS): Working area for computations (stack-architecture style). Bytecode instructions (iload, iadd, invokevirtual) operate on this stack (push/pop values).
int a = 5;
int b = 3;
int c = a + b;
// Bytecode (instance method without parameters, so LVA[0] = this):
iconst_5  // push 5 -> OS
istore_1  // pop OS -> LVA[1] (a)
iconst_3  // push 3 -> OS
istore_2  // pop OS -> LVA[2] (b)
iload_1   // push LVA[1] (a) -> OS
iload_2   // push LVA[2] (b) -> OS
iadd      // pop 2 values, add, push result -> OS
istore_3  // pop OS -> LVA[3] (c)
- Reference to Runtime Constant Pool (RCP): Pointer to the class's Constant Pool, needed for resolving symbolic references (method names, classes, constants) at runtime.
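Size: Set by the -Xss parameter. The default is platform-dependent, typically 1MB on 64-bit Linux.
- Too deep a call chain → StackOverflowError.
- If the stack cannot be expanded, or a new thread's stack cannot be allocated for lack of memory → OutOfMemoryError.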
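The relationship between stack size and maximum call depth can be probed with the Thread constructor that accepts a stackSize argument. StackDepthProbe is a hypothetical name. Frame sizes differ between interpreted and JIT-compiled code, so the measured depth varies from run to run. The JVM may also treat stackSize as a hint only.

```java
public class StackDepthProbe {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    // Runs the recursion on a new thread created with the requested stack size
    // and returns how many frames fit before StackOverflowError.
    static int maxDepth(long stackSizeBytes) {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth;
            }
        }, "stack-probe", stackSizeBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("256 KB stack: " + maxDepth(256 * 1024) + " frames");
        System.out.println("4 MB stack:   " + maxDepth(4 * 1024 * 1024) + " frames");
    }
}
```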
10. JIT Compilation: C1, C2, and Adaptive Optimization
JIT (Just-In-Time) — compilation of "hot" bytecode into native machine code at runtime.
Compilation levels in HotSpot (Tiered Compilation, -XX:+TieredCompilation, enabled by default since Java 8):
- Interpreter: Executes bytecode. Zero startup overhead, but low speed.
- C1 (Client Compiler): Fast, lightweight compilation. Applies basic optimizations (inlining, simple data flow analysis). Goal: quickly obtain working native code.
- C2 (Server Compiler): Aggressive, heavy optimizing compiler. Compiles the hottest methods using profile-guided optimizations:
  - Escape Analysis and scalar replacement
  - loop unrolling and range-check elimination
  - speculative devirtualization and inlining
  - memory barrier optimizations
Profiling: The JVM collects data about code behavior at runtime:
- Method invocation and loop back-edge counters.
- Branching: Which if branch executes more often.
- Type Profile: Which concrete classes arrive at a polymorphic call site (invokevirtual/invokeinterface). This enables devirtualization (replacing a virtual call with a direct one) and then inlining.
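Deoptimization: The reverse process. Suppose the optimizer's speculative assumptions are violated, e.g., a receiver type arrives that was not in the profile. The JVM then discards the compiled native code and resumes execution of that method in the interpreter. The interpreter re-profiles the method, which may be compiled again later.
- Triggers: stale profiles (newly loaded classes, new polymorphic types), an uncommon branch being taken, breakpoints set by a debugger, and invalidated dependencies (e.g., class hierarchy assumptions broken by class loading). (Simplification.)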
Cycle: Interpreter → profiling → C1 → profiling → C2 → (deoptimization if necessary). This is Adaptive Optimization.
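A sketch of the kind of call site that profiling and deoptimization act on. The class names are hypothetical. The JIT decides for itself what to inline and when to deoptimize, so -XX:+PrintCompilation output (where invalidated code may appear as "made not entrant") is something to inspect, not a guaranteed result.

```java
import java.util.Arrays;

public class DeoptDemo {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    // The s.area() call site is what the type profile records.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] squares = new Shape[1_000];
        Arrays.fill(squares, new Square(2));
        for (int i = 0; i < 20_000; i++) {
            total(squares);   // warm-up: the profile only ever sees Square
        }
        // A second receiver type breaks the monomorphic assumption.
        Shape[] mixed = { new Square(1), new Circle(1) };
        System.out.println(total(mixed));
    }
}
```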
11. volatile: Guarantees of Visibility and Ordering
volatile — a variable modifier providing visibility and ordering guarantees at the memory level, without atomicity for compound operations (i++).
Semantics:
- Visibility: A write to a volatile variable by one thread is guaranteed to be visible to all subsequent reads of that variable by other threads.
- Prevention of Reordering: The JVM and the processor cannot reorder reads/writes of a volatile variable with other memory operations in a way that violates the happens-before rules.
Happens-Before: The partial order defined by the Java Memory Model that determines when changes made by one thread are guaranteed to be visible to another.
- Rule for volatile (JLS 17.4.5): A write to a volatile field happens-before every subsequent read of that same field.
- Consequence (Transitivity): Suppose thread A writes to volatile V, and thread B then reads V and sees that write. Then all memory changes made by thread A before writing to V become visible to thread B after reading V.
// Thread 1
sharedNonVolatileData = ...; // (1)
volatileFlag = true;         // (2) volatile write
// Thread 2
if (volatileFlag) {          // (3) volatile read (sees true)
    // Here, the value of sharedNonVolatileData from (1) is guaranteed to be visible
    use(sharedNonVolatileData);
}
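Implementation: At the processor level, this is usually implemented via memory barriers (Memory Barrier or Fence).
- In the classic JSR-133 Cookbook model, a volatile write is preceded by a StoreStore barrier and followed by a StoreLoad barrier.
- A volatile read is followed by LoadLoad + LoadStore barriers.
- On x86, only the StoreLoad barrier requires an actual instruction (typically a lock-prefixed one).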
Usage: For completion flags, publishing results (safe publication), in patterns like double-checked locking (with volatile).
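A runnable version of the publication pattern above. The names are hypothetical. The key point is that data is a plain field and only ready is volatile. Without volatile on ready, the reader loop would be guaranteed neither to terminate nor to see 42.

```java
public class VolatilePublication {
    private int data;               // ordinary field
    private volatile boolean ready; // volatile flag

    void writer() {
        data = 42;      // (1) ordinary write
        ready = true;   // (2) volatile write publishes (1)
    }

    int reader() {
        while (!ready) {            // (3) volatile read; spin until (2) is visible
            Thread.onSpinWait();
        }
        return data;                // (4) happens-before guarantees 42 here
    }

    static int run() {
        VolatilePublication shared = new VolatilePublication();
        int[] seen = new int[1];
        Thread readerThread = new Thread(() -> seen[0] = shared.reader());
        readerThread.start();
        shared.writer();
        try {
            readerThread.join();    // join() also orders the write to seen[0] before this read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```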
12. False Sharing: The Hidden Performance Enemy
- False Sharing — performance degradation in multi-threaded systems. It occurs when two independent, frequently modified fields (M1 and M2) fall into the same processor cache line (usually 64 bytes), even though they belong to different objects or different array elements.
- Mechanism:
  - Processors keep caches coherent via the MESI protocol (or a variant of it).
  - If a thread on core 1 modifies M1, its copy of the whole cache line becomes Modified. The same line in core 2's cache is invalidated, even though core 2 only uses M2.
  - When core 2 next accesses M2, it must fetch the line again (from core 1's cache or a shared cache level), although M2 itself hasn't changed.
  - The line keeps bouncing between the cores ("cache line ping-pong").
- Consequence: Seemingly independent operations effectively serialize on the cache line, causing a sharp drop in scalability.
Solution — Alignment (Padding, @Contended):
Classic padding (pre-Java 8): Adding "empty" fields to separate critical fields into different cache lines.
class Counter {
    volatile long count1;
    private long p1, p2, p3, p4, p5, p6, p7; // Padding ~56 bytes
    volatile long count2;
}
(HotSpot may reorder fields, so robust implementations spread the padding across a class hierarchy to force the layout.)
@sun.misc.Contended (Java 8) / jdk.internal.vm.annotation.Contended (Java 9+): Annotation instructing the JVM to automatically add padding around a field or the entire class.
import jdk.internal.vm.annotation.Contended;
public class StripedCounter {
    @Contended // The JVM adds padding (128 bytes by default, -XX:ContendedPaddingWidth) around each field
    volatile long cell1;
    @Contended
    volatile long cell2;
}
- Requires -XX:-RestrictContended to take effect outside JDK classes. On Java 9+, compiling also requires --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED.
- Widely used in JDK internals (LongAdder, Thread, ForkJoinPool).
- Alternatives: Design data structures so that threads work with independent memory areas (local variables, ThreadLocal), or use striped structures like LongAdder.
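Diagnosis:
- Profilers can locate contended cache lines: Linux perf c2c and Intel VTune report cache-to-cache transfers (HITM events).
- Hardware events such as RESOURCE_STALLS.L1D_MISS_CYCLES or MEM_LOAD_RETIRED.L2_MISS can help confirm the effect (event names vary by CPU microarchitecture).
- In Java, it often shows up empirically as performance degrading when seemingly independent operations are added.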
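False sharing can be demonstrated without @Contended by spacing two hot counters inside one array. Array elements, unlike fields, are laid out contiguously in index order. This sketch assumes 64-byte cache lines and uses AtomicLongArray so that the JIT cannot keep the counters in registers. FalseSharingDemo is a hypothetical name. Timings vary by hardware, and a harness such as JMH is the right tool for real measurements.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class FalseSharingDemo {
    // Two threads each increment only their own element of one array.
    // distance = 1: elements are 8 bytes apart and usually share a cache line.
    // distance = 16: they are 128 bytes apart, so they never share a 64-byte line.
    static long runMillis(int distance, long iterations) {
        AtomicLongArray cells = new AtomicLongArray(distance + 1);
        Thread a = new Thread(() -> {
            for (long i = 0; i < iterations; i++) cells.incrementAndGet(0);
        });
        Thread b = new Thread(() -> {
            for (long i = 0; i < iterations; i++) cells.incrementAndGet(distance);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (cells.get(0) != iterations || cells.get(distance) != iterations) {
            throw new IllegalStateException("lost update");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        System.out.println("adjacent (distance 1): " + runMillis(1, n) + " ms");
        System.out.println("padded  (distance 16): " + runMillis(16, n) + " ms");
    }
}
```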
Once again, in more detail:
PART 1: JVM MEMORY ARCHITECTURE - MACRO LEVEL
Heap: The Dominant Structure in JVM
Physical organization (64-bit HotSpot JVM):
┌──────────────────────────────────────────────────────────────┐
│                HEAP (-Xms initial / -Xmx max)                │
├────────────────────────────────┬─────────────────────────────┤
│ YOUNG GEN (1/3 of heap)        │ OLD GEN (2/3 of heap)       │
├───────────────┬────────────────┤                             │
│ EDEN          │ SURVIVOR S0    │ Long-lived objects that     │
│ (80% of YG)   │ SURVIVOR S1    │ survived many GCs           │
│               │ (10% YG each)  │                             │
└───────────────┴────────────────┴─────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ METASPACE (native memory, outside the heap)                  │
│ (Class metadata, methods, constants, annotations)            │
└──────────────────────────────────────────────────────────────┘
Quantitative parameters (defaults for the classic generational layout of Serial/Parallel; G1 sizes the Young Gen adaptively):
- -Xms/-Xmx: Initial/Maximum heap size
- -XX:NewRatio=2: OldGen:YoungGen = 2:1
- -XX:SurvivorRatio=8: Eden:Survivor = 8:1 (each Survivor)
- -XX:MaxTenuringThreshold=15: Maximum age for promotion
Object Lifecycle: Detailed Chronology
Phase 1: Allocation in Eden
public class AllocationPatterns {
// TLAB (Thread-Local Allocation Buffer) - key optimization
static void demonstrateTLAB() {
// When creating an object:
// 1. Check: is there enough space in the current TLAB?
// 2. If yes: pointer bump allocation (pointer += size)
// 3. If no: request a new TLAB from Eden
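        // TLAB size is configurable:
        // -XX:TLABSize=512k (initial TLAB size; 0 by default, meaning adaptive)
        // -XX:+ResizeTLAB (automatic resizing, enabled by default)
        for (int i = 0; i < 100_000; i++) {
            // Small objects like this take the TLAB fast path
            // (after JIT compilation, escape analysis may remove this unused allocation entirely)
            Object obj = new Object(); // 12-byte header padded to 16 bytes (64-bit, compressed class pointers)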
}
}
}
Allocation mechanics:
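- Pointer Bump in TLAB: current_ptr += object_size
- Zeroing memory: the JVM zeroes the memory so that fields start with default values (0, false, null)
- Setting the Mark Word: mark = hash/age/lock bits
- Setting the Klass Pointer: a reference to the object's class metadata (Klass)
Cost: only a handful of instructions on the fast path (roughly 10-20 CPU cycles for a small object, hardware-dependent)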
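The fast-path/slow-path split above can be modeled in a few lines. This is a toy model, not JVM code. Addresses are plain offsets into an abstract Eden, and TlabToy, Eden and Tlab are hypothetical names. When the toy retires a TLAB, its unused tail is simply abandoned; HotSpot instead fills that tail with a dummy object so the heap stays parsable.

```java
// Toy model of TLAB allocation (not JVM code).
public class TlabToy {
    static final class Eden {
        private final long end;
        private long top;

        Eden(long capacity) {
            this.end = capacity;
        }

        // Shared slow path: carve a new TLAB out of Eden. Synchronized here because
        // all threads compete for it (HotSpot bumps Eden's top with an atomic CAS instead).
        synchronized long[] newTlab(long size) {
            if (top + size > end) {
                return null;                 // Eden exhausted: a Minor GC would run now
            }
            long start = top;
            top += size;
            return new long[] {start, start + size};
        }
    }

    static final class Tlab {
        private final Eden eden;
        private final long tlabSize;
        private long top;                    // thread-private bounds: no locking needed
        private long end;

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        // Returns the offset of the new object, or -1 if Eden is full.
        long allocate(long objectSize) {
            if (top + objectSize <= end) {   // fast path: pointer bump
                long address = top;
                top += objectSize;
                return address;
            }
            // Slow path: request a new TLAB that is at least large enough for the object.
            long[] chunk = eden.newTlab(Math.max(tlabSize, objectSize));
            if (chunk == null) {
                return -1;
            }
            top = chunk[0];
            end = chunk[1];
            return allocate(objectSize);
        }
    }
}
```

With a 64-unit Eden and 32-unit TLABs, three 16-unit allocations land at offsets 0, 16 and 32; the third one triggers a refill. A second TLAB then finds Eden exhausted.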
Phase 2: First Minor GC
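Trigger: an allocation request cannot be satisfied because Eden is full
Copying Collector Algorithm:
// HotSpot-style pseudo-code (Young GC, simplified)
void youngGC() {
    // 1. Stop-The-World: bring all Java threads to a safepoint
    stop_all_threads();
    // 2. Evacuate young objects referenced directly by GC roots and by
    //    old-to-young references recorded in the card table
    for (Ref ref : roots() + old_to_young_refs_from_dirty_cards()) {
        ref.set(evacuate(ref.get()));
    }
    // 3. Scan the copied objects transitively (Cheney-style scan pointer),
    //    evacuating everything they reference
    while (has_unscanned_copies()) {
        Object obj = next_unscanned_copy();
        for (Ref field : obj.reference_fields()) {
            field.set(evacuate(field.get()));
        }
    }
    // 4. Eden and From-Survivor now hold only garbage: reset them
    //    and swap the roles of the two Survivor spaces
    eden.reset();
    from_survivor.reset();
    swap_survivors();
    // 5. Resume application threads
    resume_all_threads();
}
// Copy one young object once; later references reuse the forwarding pointer
Object evacuate(Object obj) {
    if (obj == null || !in_young_gen(obj)) return obj; // old objects stay in place
    if (obj.is_forwarded()) return obj.forwardee();    // already copied
    Object copy;
    if (obj.age < tenuring_threshold && to_survivor.has_room(obj)) {
        copy = copy_to(obj, to_survivor);
        copy.age = obj.age + 1;                        // survived one more GC
    } else {
        copy = copy_to(obj, old_gen);                  // promotion
    }
    obj.set_forwarding_pointer(copy);                  // to update other references
    return copy;
}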
Critical details:
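- Card Table: a byte array in which each entry covers a small "card" of Old Gen (512 bytes in HotSpot). The write barrier dirties a card when a reference is stored there, so a Minor GC only needs to scan dirty cards for OldGen → YoungGen references.
- Remembered Sets: used by G1 (and by generational ZGC, Java 21+) to track inter-region references.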
Phase 3: Promotion to Old Generation
Promotion conditions:
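- Age threshold: age >= the current tenuring threshold (at most MaxTenuringThreshold, 15 by default)
- Survivor size: if the To-survivor space overflows, the objects that do not fit are promoted prematurely. HotSpot also lowers the tenuring threshold when survivor occupancy exceeds -XX:TargetSurvivorRatio (50% by default).
- Large objects: objects above -XX:PretenureSizeThreshold (Serial collector; 0 = disabled by default), as well as G1 humongous objects, go directly to OldGen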
// Example: creating long-lived objects
static void createLongLivedObjects() {
List<byte[]> longLived = new ArrayList<>();
    // These objects stay reachable through 'longLived', so they survive Minor GCs
    for (int i = 0; i < 100; i++) {
        // 100KB each; what gets them promoted is surviving enough GCs, not their size
byte[] data = new byte[102400];
longLived.add(data);
// Create garbage to provoke GC
for (int j = 0; j < 1000; j++) {
byte[] garbage = new byte[1024]; // Will be collected
}
}
}
Garbage Collector Models: Evolution of Algorithms
1. Serial Collector (Mark-Sweep-Compact)
Algorithm (Old Gen; the Young Gen uses copying):
1. Mark: Traverse the reachability graph from GC Roots
2. Sweep: Free unmarked areas
3. Compact: Slide live objects together to remove fragmentation
Features:
- Single-threaded (STW for the entire time)
- Simple, low overhead
- Ideal for embedded and client applications
2. Parallel / Throughput Collector
Algorithm:
- Multi-threaded versions of Serial for all phases
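- Goal: maximize throughput (the share of total time spent running the application rather than GC)
Configuration:
-XX:+UseParallelGC
-XX:ParallelGCThreads=<n> (default derived from the number of CPU cores)
-XX:MaxGCPauseMillis=<ms> (optional pause-time goal; not set by default)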
-XX:GCTimeRatio=99 (99% time for application)
Usage: batch processing, ETL, scientific computing
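GCTimeRatio expresses the throughput goal as a ratio. The corresponding target fraction of total time spent in GC is:

```latex
\text{GC time goal} = \frac{1}{1 + \text{GCTimeRatio}}, \qquad
\text{GCTimeRatio} = 99 \;\Rightarrow\; \frac{1}{1 + 99} = 1\%
```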
3. CMS - Concurrent Mark Sweep (deprecated in Java 9, removed in Java 14)
// CMS phases:
1. Initial Mark (STW) // Fast, only direct roots
2. Concurrent Mark // Concurrent with application
3. Remark (STW) // Account for changes during concurrent mark
4. Concurrent Sweep // Cleanup
Problems:
- Fragmentation (no compaction)
- Concurrent Mode Failure on rapid filling
- High CPU usage in concurrent phases
4. G1 - Garbage First (default since Java 9)
Architecture:
- Heap divided into ~2000 regions (1-32MB)
- Young generation = set of regions (not fixed)
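- Humongous regions for objects ≥ 50% of a region
Algorithm:
1. Concurrent marking (like CMS)
2. Evacuation: copying live objects out of the regions with the most garbage ("garbage first")
3. Compaction as a side effect of evacuation (live objects are packed into fresh regions)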
Configuration:
-XX:+UseG1GC
-XX:G1HeapRegionSize={1,2,4,8,16,32}M
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=45
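How G1 chooses a region size when -XX:G1HeapRegionSize is not set can be approximated in a few lines. This sketch rests on stated assumptions: it aims for the ~2000 regions mentioned above (2048), rounds down to a power of two, and clamps to 1-32 MB. The exact HotSpot heuristic differs between JDK versions; older releases, for instance, based it on the average of the initial and maximum heap size. G1RegionSize is a hypothetical name.

```java
public class G1RegionSize {
    static final long MB = 1024L * 1024;
    static final long TARGET_REGION_COUNT = 2048;

    // Approximation (not the exact HotSpot code) of G1's ergonomic region size.
    static long regionSizeBytes(long maxHeapBytes) {
        long size = Math.max(maxHeapBytes / TARGET_REGION_COUNT, MB);
        size = Long.highestOneBit(size);   // round down to a power of two (>= 1 MB)
        return Math.min(size, 32 * MB);    // clamp to the 32 MB upper bound
    }

    // G1 treats objects of at least half a region as humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        for (long heapGb : new long[] {1, 4, 16, 64}) {
            long region = regionSizeBytes(heapGb * 1024 * MB);
            System.out.println(heapGb + " GB heap -> " + region / MB + " MB regions, humongous from "
                    + region / 2 / 1024 + " KB");
        }
    }
}
```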
5. ZGC / Shenandoah (Low-Latency)
Innovations:
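- Barriers: ZGC uses load barriers instead of write barriers (generational ZGC, Java 21+, adds store barriers); Shenandoah uses both
- Colored pointers (ZGC: metadata stored in otherwise unused pointer bits)
- Region-based like G1, but almost all phases are concurrent
ZGC pointer structure (64-bit, original non-generational layout for up to 4 TB of heap):
┌──────────────┬───────────────┬──────────────────────┐
│ bits 63-46   │ bits 45-42    │ bits 41-0            │
│ unused (0)   │ metadata      │ object address       │
│ 18 bits      │ 4 bits        │ 42 bits              │
└──────────────┴───────────────┴──────────────────────┘
metadata = Finalizable | Remapped | Marked1 | Marked0
Advantages:
- STW pauses target < 1 ms regardless of heap size
- Support for terabyte heaps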
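The colored-pointer idea can be shown with plain bit manipulation. This toy model follows ZGC's original layout: a 42-bit address with 4 metadata bits above it. The load barrier logic is simplified: a pointer is "good" if it carries no metadata bit other than the one that is good in the current GC phase. ColoredPointerToy and its methods are hypothetical names, not ZGC internals.

```java
// Toy model of ZGC-style colored pointers (not ZGC code).
public final class ColoredPointerToy {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    static long color(long address, long metadata) {
        return (address & ADDRESS_MASK) | metadata;
    }

    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Load barrier: every reference loaded from the heap is checked against the
    // color that is "good" in the current GC phase.
    static long loadBarrier(long pointer, long goodColor) {
        long badMask = METADATA_MASK & ~goodColor;
        if ((pointer & badMask) == 0) {
            return pointer;   // fast path: a single AND plus a branch
        }
        // Slow path: real ZGC would mark, relocate, or remap the object here and
        // store the healed pointer back into the field ("self-healing").
        return color(address(pointer), goodColor);
    }

    private ColoredPointerToy() {}
}
```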
PART 2: STOP-THE-WORLD - ARCHITECTURAL VIEW
Anatomy of a JVM Pause
// HotSpot VM safepoint operation (heavily simplified pseudo-C++, not the actual source)
void SafepointSynchronize::begin() {
// 1. Set safepoint flag
_state = _synchronizing;
// 2. Stop all threads at safe points
for (JavaThread* thread = Threads::first(); thread; thread = thread->next()) {
thread->safepoint_state()->examine_state_of_thread();
// Thread must stop in one of:
// - Between bytecode instructions (in interpreted)
// - At safepoint polling page (in compiled code)
// - Blocked in native code
}
// 3. All threads stopped
_state = _synchronized;
// 4. Perform operation (GC, deopt, etc.)
do_operation();
// 5. Resume
_state = _not_synchronized;
}
Safepoint Polling in Compiled Code
; x86_64 JIT-generated code (illustrative sketch, not actual HotSpot output)
compiled_method:
; Prologue
push rbp
mov rbp, rsp
; Method body
mov rax, [rsi+0x10] ; Load field
add rax, 0x1
mov [rsi+0x10], rax ; Store
; Safepoint poll (emitted at method returns and loop back-edges, not at fixed instruction intervals)
test byte ptr [rip+safepoint_page], 0xff
jnz safepoint_handler ; Jump if safepoint
; Continue
ret
safepoint_page: ; Conceptual flag armed by the VM (real HotSpot polls a thread-local polling page/word)
.byte 0
PART 3: MEMORY LEAK - SYSTEMATIC ANALYSIS
Memory Leak Typology
1. Classic leak via statics
public class ClassicLeak {
// Global cache without limits
private static final Map<Key, Value> CACHE = new HashMap<>();
// Leak: objects never removed
public void processRequest(Request req) {
Key key = extractKey(req);
Value val = computeExpensiveValue(req);
CACHE.put(key, val); // Forever in memory
}
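    // Solution 1: WeakHashMap (entries disappear once a key is no longer strongly referenced elsewhere; values must not reference their own keys)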
private static final Map<Key, Value> WEAK_CACHE =
Collections.synchronizedMap(new WeakHashMap<>());
// Solution 2: Guava Cache with policies
private static final Cache<Key, Value> GUAVA_CACHE =
CacheBuilder.newBuilder()
.maximumSize(10000)
.expireAfterWrite(10, TimeUnit.MINUTES)
            .weakKeys() // NOTE: weak keys are compared by identity (==), not equals()
.build();
}
2. ThreadLocal in thread pool
public class ThreadLocalLeak {
private static final ThreadLocal<ByteBuffer> BUFFER_HOLDER =
new ThreadLocal<ByteBuffer>() {
@Override
protected ByteBuffer initialValue() {
return ByteBuffer.allocateDirect(1024 * 1024); // 1MB direct buffer
}
};
// In web application (Tomcat):
// Thread returns to pool after request
// ThreadLocal is not automatically cleaned!
    // Memory stays allocated: pool_size * buffer_size (1MB of native memory per pooled thread), released only when the thread dies
public void handleRequest(HttpServletRequest req) {
ByteBuffer buffer = BUFFER_HOLDER.get();
// use...
        // BUG: missing BUFFER_HOLDER.remove() (ideally in a finally block)
}
}
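The fix is to clear the ThreadLocal in a finally block. The sketch below makes the effect observable with a hypothetical per-request StringBuilder context. Two requests run on the same pooled thread, and without remove() the second request sees the first one's leftover state.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanup {
    // Hypothetical per-request scratch state held in a ThreadLocal.
    static final ThreadLocal<StringBuilder> CONTEXT = ThreadLocal.withInitial(StringBuilder::new);

    static String handleRequest(String payload, boolean cleanUp) {
        try {
            return CONTEXT.get().append(payload).toString();
        } finally {
            if (cleanUp) {
                CONTEXT.remove(); // the pooled thread stops referencing this request's state
            }
        }
    }

    // Runs two requests on the same pooled thread and returns what the second one saw.
    static String secondRequest(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> handleRequest("a", cleanUp)).get();
            return pool.submit(() -> handleRequest("b", cleanUp)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("with remove():    " + secondRequest(true));  // b
        System.out.println("without remove(): " + secondRequest(false)); // ab
    }
}
```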
3. Incorrect event listeners
public class ListenerLeak {
private final List<EventListener> listeners = new CopyOnWriteArrayList<>();
public void registerListener(EventListener listener) {
listeners.add(listener);
}
// NO unregisterListener method!
// Listener holds reference to outer object
// → leak of entire reference chain
}
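One way to make unregistering impossible to forget is to have register() return a handle that undoes the registration. This is a sketch with hypothetical names (ListenerRegistry, Registration). Its close() declares no checked exception, so it fits naturally into try-with-resources or a component's shutdown method.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ListenerRegistry {
    // close() without a checked exception, so callers can use try-with-resources easily
    public interface Registration extends AutoCloseable {
        @Override
        void close();
    }

    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    // Registering returns the handle that undoes it, so unsubscribing is part of the API.
    public Registration register(Consumer<String> listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    public void publish(String event) {
        for (Consumer<String> l : listeners) {
            l.accept(event);
        }
    }

    public int listenerCount() {
        return listeners.size();
    }
}
```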
4. JNI/Off-Heap leaks
public class NativeMemoryLeak {
static {
System.loadLibrary("native");
}
private native long allocateNativeMemory(int size);
private native void freeNativeMemory(long pointer);
public void leak() {
long ptr = allocateNativeMemory(1024 * 1024); // 1MB native
// Forget to call freeNativeMemory(ptr)
// → leak in native heap (not visible in Java heap dump!)
}
}
Leak Diagnostics:
# 1. Real-time monitoring
jstat -gc <pid> 1s # Check OldGen growth after Full GC
# 2. Take a heap dump (use with caution in production: the JVM pauses while dumping, and the live option forces a Full GC)
jmap -dump:live,format=b,file=heap.hprof <pid>
# 3. Analysis in Eclipse MAT
# Key queries:
# - "Leak Suspects Report"
# - "Top Consumers"
# - "Histogram grouped by class"
# - "Path to GC Roots"
# 4. Command line analysis
jmap -histo:live <pid> | head -20 # Classes using the most memory (live also forces a Full GC)
# 5. JFR (Java Flight Recorder) for dynamic analysis
jcmd <pid> JFR.start duration=60s filename=leak.jfr
PART 4: METASPACE - CLASS METADATA
Evolution from PermGen to Metaspace
PermGen (≤ Java 7):
┌─────────────────────────────────┐
│ PERMGEN │
│ (Fixed size, part of Heap) │
├─────────────────────────────────┤
│ • Class metadata │
│ • Bytecode │
│ • Runtime constant pool │
│ • String intern table (≤ Java 6) │
└─────────────────────────────────┘
Problems: OOM, manual size tuning, inefficient GC
Metaspace (Java 8+):
┌─────────────────────────────────┐
│ NATIVE MEMORY │
│ (Not heap; allocated and managed by the JVM) │
├─────────────────────────────────┤
│ METASPACE │
│ ┌─────────────────────────┐ │
│ │ Non-Class Metaspace │ │
│ │ ┌───────────────────┐ │ │
│ │ │ Chunk (size varies) │ │ │
│ │ │ • Constant Pool │ │ │
│ │ │ • Annotations │ │ │
│ │ │ • Methods │ │ │
│ │ └───────────────────┘ │ │
│ │ ... │ │
│ └─────────────────────────┘ │
│ │
│ ┌─────────────────────────┐ │
│ │ Class Metaspace │ │
│ │ (Compressed Class │ │
│ │ Space, if enabled) │ │
│ │ • Klass structures │ │
│ │ • vtables │ │
│ │ • itables │ │
│ └─────────────────────────┘ │
└─────────────────────────────────┘
Metaspace Structure
// Simplified sketch of Metaspace structures (illustrative, not the actual HotSpot source)
class Metaspace {
// Arena-based allocator
Metachunk* _chunks; // List of chunks
// Statistics
size_t _used_words;
size_t _capacity_words;
size_t _committed_words;
};
// Metadata chunk
class Metachunk {
// Header
size_t _word_size;
Metablock* _blocks;
// Type: Non-Class (methods, constants) or Class (Klass)
MetaspaceType _type;
};
ClassLoader Leak - Main Cause of OOM: Metaspace
public class ClassLoaderLeak {
// Web application, reloaded in Tomcat
public void leak() throws Exception {
while (true) {
// 1. Create isolated ClassLoader
URLClassLoader loader = new URLClassLoader(
new URL[]{new URL("file:///app.jar")},
null // Parent = null (isolation)
);
// 2. Load class
Class<?> clazz = loader.loadClass("com.example.SomeClass");
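            Object instance = clazz.getDeclaredConstructor().newInstance(); // Class.newInstance() is deprecated since Java 9
            // 3. Store reference somewhere global (GlobalCache is a hypothetical app-wide registry)
            GlobalCache.store(instance); // LEAK!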
// 4. ClassLoader cannot be unloaded,
// because its classes are reachable through instance
// → Metaspace grows with each reload
}
}
}
ClassLoader leak diagnostics:
# 1. Check number of ClassLoaders
jcmd <pid> VM.classloader_stats
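# 2. Class loader statistics
jmap -clstats <pid>
# 3. Enable class loading logging (Java 8 flags; on Java 9+ use -Xlog:class+load=info,class+unload=info)
-XX:+TraceClassLoading -XX:+TraceClassUnloading
# 4. Cap Metaspace (MaxMetaspaceSize) and set the size that triggers the first Metaspace-induced GC (MetaspaceSize)
-XX:MaxMetaspaceSize=256m
-XX:MetaspaceSize=64m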
PART 5: STRING POOL AND INTERNING
String Pool: Hash Table in Heap
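// Simplified model of the String Pool (the real HotSpot StringTable is native
// C++ code holding weak references to the String objects in the heap)
class StringTable {
    // Hash table with separate chaining; the length must be a power of two
    private static final Entry[] table = new Entry[1 << 16];
    static class Entry {
        final String str;
        final int hash;
        Entry next;
        Entry(String str, int hash, Entry next) {
            this.str = str;
            this.hash = hash;
            this.next = next;
        }
    }
    // Main intern() method (not thread-safe: illustration only)
    static String intern(String str) {
        int hash = str.hashCode();
        int index = hash & (table.length - 1);
        for (Entry e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && str.equals(e.str)) {
                return e.str; // Existing canonical string
            }
        }
        // Adding new string: it becomes the canonical instance
        table[index] = new Entry(str, hash, table[index]);
        return str;
    }
}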
String Pool Evolution
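Java 6 and earlier: in PermGen, with a fixed number of buckets. Interned strings were reclaimed only when PermGen itself was collected (during a Full GC).
-XX:StringTableSize=1009 # Bucket count: small and fixed
Java 7+: in the heap, collected by regular GC once unreferenced. The bucket count is configurable; the default was raised to 60013 in Java 7u40 and has changed again in later JDKs.
-XX:StringTableSize=60013 # Size can be configured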
When to use intern()?
Anti-pattern:
// NEVER DO THIS
public void processLine(String line) {
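String interned = line.intern(); // Every unique line goes into the StringTable
// A native hash lookup per call, a bloated table and extra GC work; no memory is saved because lines rarely repeat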
}
Possibly correct usage:
public class TokenProcessor {
// Limited set of known tokens (literals are already interned, so map(String::intern) is a no-op here)
private static final Set<String> KNOWN_TOKENS =
Set.of("GET", "POST", "PUT", "DELETE", "HEAD").stream()
.map(String::intern)
.collect(Collectors.toSet());
// Frequently used enum-like values
public void process(HttpMethod method) {
String m = method.name().intern(); // Bounded: one value per enum constant
// Fast comparison via ==
if (m == "GET") { // Works because string literals are always interned
// ...
}
}
}
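When the set of values is closed, as with HTTP methods, an enum often removes the need for interning altogether: parse the token once, then compare constants with `==`. A sketch, assuming a hypothetical nested `HttpMethod` enum with the five tokens listed above (`MethodDispatch` is also an illustrative name):

```java
import java.util.Optional;

final class MethodDispatch {
    // Hypothetical enum standing in for the HttpMethod type used above
    enum HttpMethod { GET, POST, PUT, DELETE, HEAD }

    // Parse the request token once; unknown names yield an empty Optional
    static Optional<HttpMethod> parse(String token) {
        try {
            return Optional.of(HttpMethod.valueOf(token));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    static boolean isSafe(HttpMethod m) {
        // Enum constants are singletons, so == is always correct here
        return m == HttpMethod.GET || m == HttpMethod.HEAD;
    }
}
```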
CSV parser optimization:
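public class CSVParser {
// Canonical instance per short value seen often enough
private final Map<String, String> pool = new HashMap<>();
// Occurrence counts; illustrative policy only (this map also grows, so bound it in real code)
private final Map<String, Integer> counts = new HashMap<>();
public String internIfFrequent(String value) {
// Strategy: intern only frequently repeating values
if (value.length() > 10) return value; // Long strings not interned
String cached = pool.get(value);
if (cached != null) return cached;
// Add only if occurs frequently
if (shouldIntern(value)) {
// The private pool alone already deduplicates; intern() also shares
// the instance with every other intern() caller in the JVM
String interned = value.intern();
pool.put(interned, interned);
return interned;
}
return value;
}
// Illustrative threshold: canonicalize from the third occurrence on
private boolean shouldIntern(String value) {
return counts.merge(value, 1, Integer::sum) >= 3;
}
}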
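A private dedup map like the one above grows without limit. One common alternative, sketched here as an assumption rather than the author's approach, is a bounded LRU pool built on `LinkedHashMap`'s access order and `removeEldestEntry` (`BoundedDeduplicator` is an illustrative name; the class is not thread-safe):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedDeduplicator {
    private final Map<String, String> cache;

    BoundedDeduplicator(int maxEntries) {
        // accessOrder = true: iteration order is least-recently-used first
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the least recently used value
            }
        };
    }

    // Returns one canonical instance for equal strings while they stay cached
    String dedup(String value) {
        String existing = cache.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int size() {
        return cache.size();
    }
}
```

Unlike `intern()`, the memory ceiling is explicit, and rarely seen values simply fall out of the cache.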
PART 6: JIT COMPILATION - C1, C2, ADAPTIVE OPTIMIZATIONS
Three-Tier Compilation (Tiered Compilation)
┌─────────────────────────────────────────────────┐
│ INTERPRETER (Level 0) │
│ • Zero startup overhead │
│ • Slow execution │
│ • Profile collection: counters, types, branches│
└─────────────────┬───────────────────────────────┘
↓ (~200 invocations or a hot loop; Tier3 thresholds)
┌─────────────────────────────────────────────────┐
│ C1 (CLIENT) COMPILER │
│ • Fast compilation (levels 1-3; level 3 profiles) │
│ • Inlining small methods │
│ • Local optimizations │
│ • Continue profile collection │
└─────────────────┬───────────────────────────────┘
↓ (~5000 invocations or a hot loop; Tier4 thresholds)
┌─────────────────────────────────────────────────┐
│ C2 (SERVER) COMPILER │
│ • Aggressive optimizations (level 4) │
│ • Global data flow analysis │
│ • Escape Analysis and Scalar Replacement │
│ • Devirtualization and inlining │
│ • Vectorization (Auto-Vectorization) │
└─────────────────────────────────────────────────┘
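To watch the tiers in action, a tiny hot method is enough. The sketch below (`WarmUp` is an illustrative name) is meant to be compiled and run with the standard `-XX:+PrintCompilation` flag; the output format varies between JDK versions, but it typically shows the tier level per compiled method and marks on-stack-replacement (OSR) compilations of the loop with `%`:

```java
final class WarmUp {
    // Small, hot method: a natural candidate for C1 first, then C2
    static long square(long x) {
        return x * x;
    }

    // Sum of squares 0..iterations-1; the loop itself gets OSR-compiled
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Run with: java -XX:+PrintCompilation WarmUp
        System.out.println(run(1_000_000));
    }
}
```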
Compilation Configuration
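# Compilation levels: 0 = interpreter, 1-3 = C1, 4 = C2
-XX:CompileThreshold=10000 # C2 threshold, used only with -XX:-TieredCompilation
-XX:Tier3InvocationThreshold=200 # Interpreter -> C1 with profiling (default 200)
-XX:Tier4InvocationThreshold=5000 # C1 (level 3) -> C2 (default 5000)
# Code cache sizes
-XX:ReservedCodeCacheSize=240m # Max native code cache (240m is the default with tiered compilation)
-XX:InitialCodeCacheSize=160m
# Compiler control
-XX:+TieredCompilation # Enable multi-tier (default)
-XX:-TieredCompilation # Interpreter + C2 only (slower warm-up)
-XX:CompileCommand=exclude,com.example.MyClass::expensiveMethod # Never JIT-compile this method (MyClass is a placeholder)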
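Whether a JIT is active, and how much time it has spent compiling, can be checked at runtime through `CompilationMXBean`. A minimal sketch (`JitStats` is an illustrative name); `getCompilationMXBean()` returns `null` when the JVM has no compiler, for example under `-Xint`:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitStats {
    // Total JIT compilation time in ms, or -1 if no JIT or not supported
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // Compiler name as reported by the JVM (typically mentions tiered compilers when enabled)
        System.out.println(jit == null ? "no JIT" : jit.getName());
        System.out.println("JIT time (ms): " + totalCompilationMillis());
    }
}
```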
Profiling and Devirtualization
import java.util.List;
public class DevirtualizationExample {
interface Shape {
double area();
}
static final class Circle implements Shape {
private final double radius;
Circle(double radius) { this.radius = radius; }
public double area() { return Math.PI * radius * radius; }
}
static final class Square implements Shape {
private final double side;
Square(double side) { this.side = side; }
public double area() { return side * side; }
}
public double totalArea(List<Shape> shapes) {
double total = 0;
for (Shape shape : shapes) {
total += shape.area(); // Virtual (interface) call: target depends on runtime type
}
return total;
}
}
Optimization process:
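- Interpreter (and C1 code at level 3): collects a type profile for the call site
Shape#area(): 95% Circle, 5% Square
- C2 compiler: uses the profile. With only one or two receiver types seen (a monomorphic or bimorphic site), it inlines area() behind cheap class checks, roughly:
if (shape.getClass() == Circle.class) {
total += Math.PI * radius * radius; // Inlined Circle.area()
} else if (shape.getClass() == Square.class) {
total += side * side; // Inlined Square.area()
} else {
// Uncommon trap: a type never seen in the profile
}
- Deoptimization: if an unseen type (say, a hypothetical Triangle) arrives later, the uncommon trap fires, the method falls back to the interpreter and can be recompiled with the updated profile
- Call sites that see many types (megamorphic) usually keep a regular virtual call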
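The guarded, profile-driven dispatch described above can be written out by hand to make it concrete. This is only an analogue of what C2 generates internally (the JIT works on its own IR, not on Java source), using illustrative `Circle` and `Square` classes:

```java
import java.util.List;

final class GuardedInliningSketch {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Hand-written analogue of a bimorphic call site after C2 inlining
    static double totalAreaGuarded(List<? extends Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) {
            Class<?> k = s.getClass();
            if (k == Circle.class) {
                double r = ((Circle) s).radius;
                total += Math.PI * r * r;      // "inlined" Circle.area()
            } else if (k == Square.class) {
                double a = ((Square) s).side;
                total += a * a;                // "inlined" Square.area()
            } else {
                total += s.area();             // in C2: uncommon trap or virtual call
            }
        }
        return total;
    }
}
```

The class checks are a pointer comparison on the object header, which is why they are so much cheaper than an itable lookup and why inlining can proceed behind them.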
Escape Analysis and Scalar Replacement
public class Point {
private final int x, y;
public Point(int x, int y) { this.x = x; this.y = y; }
public int getX() { return x; }
public int getY() { return y; }
}
public int compute() {
Point p = new Point(10, 20); // NoEscape: doesn't leave method
return p.getX() + p.getY();
}
// After Scalar Replacement:
public int compute_optimized() {
// Point object not created!
int p_x = 10; // Field decomposed into local variable
int p_y = 20; // Second field decomposed
return p_x + p_y;
}
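Application conditions:
- NoEscape: object does not leave the method (after inlining); eligible for Scalar Replacement
- ArgEscape: passed to a callee but not published to other threads; not scalar-replaced, but locks on it can be elided
- GlobalEscape: stored in a static or heap field visible to other threads; not optimized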
Enable/Disable:
-XX:+DoEscapeAnalysis # Enable (default)
-XX:+EliminateAllocations # Scalar Replacement (default)
-XX:+PrintEscapeAnalysis # Logging (available only in debug builds of the JVM)
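To observe Scalar Replacement, one rough approach is to measure per-thread allocation while a NoEscape allocation runs hot. The sketch below relies on the HotSpot-specific `com.sun.management.ThreadMXBean` extension; the figures are indicative only, and the interesting comparison is a normal run against one with `-XX:-DoEscapeAnalysis` (`EscapeAnalysisProbe` is an illustrative name):

```java
import java.lang.management.ManagementFactory;

final class EscapeAnalysisProbe {
    static final class Point {
        private final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int getX() { return x; }
        int getY() { return y; }
    }

    // Point never leaves this method (NoEscape), so C2 may scalar-replace it
    static int compute(int i) {
        Point p = new Point(i, i + 1);
        return p.getX() + p.getY();
    }

    // Bytes allocated so far by the current thread (HotSpot extension API)
    static long allocatedBytes() {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        return mx.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int round = 0; round < 5; round++) {
            long before = allocatedBytes();
            for (int i = 0; i < 10_000_000; i++) {
                sink += compute(i);
            }
            // Typically drops sharply once C2 has compiled the loop with EA enabled
            System.out.println("round " + round + ": " + (allocatedBytes() - before) + " bytes");
        }
        System.out.println(sink); // keep the result alive
    }
}
```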
PART 7: VOLATILE AND JAVA MEMORY MODEL
Java Memory Model (JMM)
Happens-before rules:
- Program order: each action in a thread happens-before every later action in that thread's program order
- Monitor lock: unlocking a monitor happens-before every subsequent lock of the same monitor
- Volatile: a write to a volatile field happens-before every subsequent read of that field
- Thread start: a call to Thread.start() happens-before any action in the started thread
- Thread join: all actions in a thread happen-before another thread successfully returns from join() on it
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C
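The thread start and join rules alone are enough to publish data safely, with no `volatile` involved. A minimal sketch (`StartJoinVisibility` is an illustrative name):

```java
final class StartJoinVisibility {
    private static int data; // deliberately NOT volatile

    static int publishViaStartAndJoin() {
        data = 1;                    // (a) write before start()
        Thread t = new Thread(() -> {
            // start() happens-before this body, so (a) is visible here
            data = data + 41;        // (b) write inside the thread
        });
        t.start();
        try {
            t.join();                // everything in t happens-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return data;                 // (b) is guaranteed visible: 42
    }
}
```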
Volatile Implementation at Processor Level
public class VolatileExample {
private volatile boolean flag = false;
private int data = 0;
public void writer() {
data = 42; // (1) Normal write
flag = true; // (2) Volatile write
}
public void reader() {
if (flag) { // (3) Volatile read
System.out.println(data); // (4) Will see 42
}
}
}
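The same example can be run end-to-end with two threads. In this sketch (class and method names are illustrative), the reader spins on the volatile flag, and the volatile write/read pair guarantees it then sees `data == 42`:

```java
final class VolatilePublication {
    private volatile boolean flag;
    private int data;

    void writer() {
        data = 42;   // (1) plain write
        flag = true; // (2) volatile write publishes (1)
    }

    // Spins until the flag is observed, then returns data
    int reader() {
        while (!flag) {          // (3) volatile read
            Thread.onSpinWait(); // spin-loop hint (Java 9+)
        }
        return data;             // (4) guaranteed to see 42
    }

    static int runOnce() {
        VolatilePublication v = new VolatilePublication();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = v.reader());
        r.start();
        v.writer();
        try {
            r.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```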
Memory barriers for x86:
; writer()
mov dword [data], 42 ; Store data
; StoreStore barrier: no instruction needed (x86 does not reorder stores with other stores)
mov byte [flag], 1 ; Store flag (volatile)
lock add dword [rsp], 0 ; StoreLoad barrier: the only one x86 needs (HotSpot emits a locked instruction or mfence, not sfence)
; reader()
movzx eax, byte [flag] ; Load flag (volatile)
; LoadLoad and LoadStore barriers: no instructions needed on x86 (no lfence)
test eax, eax
jz .done
mov ebx, dword [data] ; Load data
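Java 9+ also exposes weaker orderings explicitly through `VarHandle`. Release/acquire access gives the one-way publication guarantee needed by the flag/data pattern without volatile's StoreLoad fence, so on x86 it compiles to plain `mov` instructions. A sketch that can be checked single-threaded (`ReleaseAcquire` is an illustrative name):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class ReleaseAcquire {
    private boolean flag; // plain field, accessed through FLAG
    private int data;

    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquire.class, "flag", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer() {
        data = 42;
        FLAG.setRelease(this, true); // release: earlier writes cannot move below this store
    }

    // Returns data once the flag is seen, or -1 if nothing was published yet
    int tryRead() {
        if ((boolean) FLAG.getAcquire(this)) { // acquire: later reads cannot move above this load
            return data;
        }
        return -1;
    }
}
```

The design trade-off: release/acquire is enough for one-producer publication, while full volatile semantics are still required when a thread must see its own store ordered before a later load (for example, Dekker-style handshakes).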
False Sharing and @Contended
False sharing problem:
public class FalseSharing {
// Two independent hot fields; the JVM usually places them next to each other,
// so they end up in the same 64-byte cache line
volatile long value1;
volatile long value2;
// Thread 1: constantly writes to value1
// Thread 2: constantly reads (or writes) value2
// RESULT: every write to value1 invalidates the line in Thread 2's core cache
// → performance drops significantly
}
Solution with @Contended:
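// Java 8: sun.misc.Contended; Java 9+: jdk.internal.vm.annotation.Contended
// (compiling against it needs --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED)
import jdk.internal.vm.annotation.Contended;
public class PaddedData {
// With -XX:-RestrictContended the JVM pads each annotated field (128 bytes by default)
@Contended
volatile long value1;
@Contended
volatile long value2;
// Memory layout (simplified; contended fields are placed after regular ones):
// [other fields][padding][value1][padding][value2][padding]
}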
Manual solution (pre-Java 8):
public class ManualPadding {
volatile long value1;
// Explicit padding
long p1, p2, p3, p4, p5, p6, p7; // 56 bytes
volatile long value2;
long p8, p9, p10, p11, p12, p13, p14; // Another 56 bytes
}
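False sharing is easy to reproduce without relying on object field layout by using two slots of one `AtomicLongArray`. In this sketch (`FalseSharingDemo` is an illustrative name), only the distance between the two counters changes; on multi-core machines the far-apart run is usually noticeably faster, but the ratio depends on hardware, and a proper measurement would use a harness such as JMH:

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingDemo {
    // Two threads each increment their own slot; returns {count0, countSecond, elapsedMillis}
    static long[] run(int secondIndex, long iterations) {
        AtomicLongArray counters = new AtomicLongArray(secondIndex + 1);
        Thread a = new Thread(() -> {
            for (long i = 0; i < iterations; i++) counters.incrementAndGet(0);
        });
        Thread b = new Thread(() -> {
            for (long i = 0; i < iterations; i++) counters.incrementAndGet(secondIndex);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] {counters.get(0), counters.get(secondIndex), elapsedMillis};
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        // Slot 1 is the neighbouring long: usually the same 64-byte line
        System.out.println("adjacent slots: " + run(1, n)[2] + " ms");
        // Slot 16 is 16 * 8 = 128 bytes away: separate cache lines
        System.out.println("slots 128 bytes apart: " + run(16, n)[2] + " ms");
    }
}
```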
False sharing diagnostics:
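# Linux: perf for cache miss monitoring
perf stat -e cache-misses,cache-references java -jar app.jar
# perf c2c (cache-to-cache) pinpoints contended cache lines directly
perf c2c record java -jar app.jar && perf c2c report
# JVM flags for @Contended
-XX:-RestrictContended # Honor @Contended in application classes (by default only JDK classes)
-XX:ContendedPaddingWidth=128 # Padding size (default 128)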
PART 8: PROFILING AND OPTIMIZATION IN PRACTICE
Scenario: High-Load Service
Initial state:
- 100k RPS, 95th percentile 200ms, heap 8GB
- Frequent Full GC pauses 2-3 seconds
Step 1: Data collection:
# 1. JFR for pause analysis
jcmd <pid> JFR.start duration=60s filename=gc.jfr
# 2. Detailed GC logs
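-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log # JDK 8; JDK 9+: -Xlog:gc*:file=gc.log:time,uptime
# 3. Heap dump just before Full GC (each dump of an 8GB heap is itself a long stop-the-world pause; enable briefly)
-XX:+HeapDumpBeforeFullGC -XX:HeapDumpPath=/path/to/dumps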
Step 2: Analysis:
// Typical problems:
// 1. Young generation too small for the allocation rate (Old/Young ratio too high)
// 2. Premature promotion because Survivor spaces are too small and overflow
// 3. Memory leak in caches
// 4. Excessive allocation rate
Step 3: Optimization:
# Switch to G1 GC
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
-XX:InitiatingHeapOccupancyPercent=35 # Start concurrent cycle earlier
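# Tune Young Gen (caveat: with G1, fixing the young size via -Xmn/NewRatio
# disables G1's pause-time-driven young sizing; prefer MaxGCPauseMillis)
-XX:NewRatio=1 # Old:Young = 1:1, more Young for short-lived objects
-XX:SurvivorRatio=6 # Eden:Survivor = 6:1, i.e. larger Survivor spaces (default 8)
-XX:MaxTenuringThreshold=5 # Faster promotion for medium-lived (default 15)
# Monitoring (JDK 8 flags; on JDK 9+ use -Xlog, e.g. -Xlog:gc+age=trace for tenuring)
-XX:+PrintAdaptiveSizePolicy # How JVM tunes sizes
-XX:+PrintTenuringDistribution # Age distribution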
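Before and after each tuning step, GC cost can also be sampled from inside the JVM through `GarbageCollectorMXBean`. A minimal sketch (`GcStats` is an illustrative name) that sums collection time across collectors and relates it to JVM uptime; bean names depend on the collector in use:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // Total collection time in ms across all collectors (undefined -1 values skipped)
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    // Fraction of JVM uptime spent in GC, e.g. 0.01 = 1%
    static double gcOverhead() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcMillis() / uptime : 0.0;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC overhead: %.2f%%%n", gcOverhead() * 100);
    }
}
```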
Anti-patterns and Their Fixes
Anti-pattern 1: Manual System.gc()
// BAD
public void processBatch() {
// ...
System.gc(); // Full GC pause at unpredictable moment
// ...
}
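// Solution: rely on the JVM, or use
// -XX:+ExplicitGCInvokesConcurrent for G1 (System.gc() starts a concurrent cycle instead)
// -XX:+DisableExplicitGC in production (caveat: direct ByteBuffer cleanup can also rely on System.gc())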
Anti-pattern 2: Large arrays in Young Gen
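// BAD: 2MB array in Eden, copied between Survivor spaces if it lives long
byte[] buffer = new byte[2 * 1024 * 1024];
// Solution: reuse buffers, or tune pretenuring
-XX:PretenureSizeThreshold=1M # Objects >1MB (like the 2MB buffer) directly to OldGen; Serial/ParNew only, G1 treats objects >= half a region as humongous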
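Buffer reuse, the first option above, can be as simple as one buffer per thread. A sketch with illustrative names (`BufferReuse`, `checksum`); note that a `ThreadLocal` buffer lives as long as its thread, which in pooled or redeployed applications can itself become the kind of leak described earlier:

```java
final class BufferReuse {
    private static final int SIZE = 2 * 1024 * 1024;

    // One 2 MB buffer per thread, allocated lazily on first use and then reused
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[SIZE]);

    static byte[] buffer() {
        return BUFFER.get();
    }

    // Example use: copy input through the reused buffer and sum unsigned bytes
    static int checksum(byte[] input) {
        byte[] buf = buffer();
        int n = Math.min(input.length, buf.length);
        System.arraycopy(input, 0, buf, 0, n);
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += buf[i] & 0xFF;
        }
        return sum;
    }
}
```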
Anti-pattern 3: String concat in loop
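// BAD: O(n²) total copying; every iteration copies the whole accumulated string
String result = "";
for (String item : items) {
result += item; // New String each time (built via StringBuilder before Java 9, StringConcatFactory since)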
}
// Solution:
StringBuilder sb = new StringBuilder(estimatedSize);
for (String item : items) {
sb.append(item);
}
String result = sb.toString();
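The pre-sized `StringBuilder` above assumes an `estimatedSize` is known. A sketch (`JoinDemo` is an illustrative name) that computes the exact size in one pass, next to the standard library equivalents `String.join` and `Collectors.joining`, all linear in the total length:

```java
import java.util.List;
import java.util.stream.Collectors;

final class JoinDemo {
    // Exact pre-sizing: one pass for lengths, one pass to append, no regrowth
    static String concatPresized(List<String> items) {
        int size = 0;
        for (String s : items) size += s.length();
        StringBuilder sb = new StringBuilder(size);
        for (String s : items) sb.append(s);
        return sb.toString();
    }

    // Library equivalent with an (empty) delimiter
    static String concatJoin(List<String> items) {
        return String.join("", items);
    }

    // Stream equivalent
    static String concatStream(List<String> items) {
        return items.stream().collect(Collectors.joining());
    }
}
```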
PART 9: SPECIFIC CONFIGURATIONS FOR DIFFERENT SCENARIOS
Microservice (REST API, 4GB heap)
# G1 with aggressive latency goals
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50
-XX:G1HeapRegionSize=4M
-XX:InitiatingHeapOccupancyPercent=30
-XX:ConcGCThreads=2
-XX:ParallelGCThreads=4
# Metaspace limits
-XX:MaxMetaspaceSize=128M
-XX:MetaspaceSize=64M
# JIT settings
-XX:ReservedCodeCacheSize=128M
-XX:InitialCodeCacheSize=64M
Batch Data Processing (32GB heap)
# Throughput oriented
-XX:+UseParallelGC
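-XX:+UseParallelOldGC # Already enabled by UseParallelGC; deprecated since JDK 14, omit on newer JDKs
-XX:ParallelGCThreads=8
-XX:GCTimeRatio=99 # Goal: at most 1/(1+99) = 1% of time in GC
-XX:MaxGCPauseMillis=500
# Large objects
-XX:PretenureSizeThreshold=10M # Caveat: honored by Serial/ParNew, ignored by Parallel GC
-XX:SurvivorRatio=10
# Monitoring (JDK 8 flags; JDK 9+: -Xlog:gc*,safepoint)
-XX:+PrintGCDetails
-XX:+PrintGCApplicationStoppedTime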
Low-Latency System (Financial Transactions)
# ZGC for sub-millisecond pauses
-XX:+UseZGC
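-XX:MaxGCPauseMillis=1 # Not used by ZGC, which has no pause-time goal; its pauses are short by design
-XX:ConcGCThreads=4
-Xmx16g
-Xms16g # Fixed heap
# Disable biased locking for stability (JDK 8-14; already off by default since JDK 15)
-XX:-UseBiasedLocking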
# Aggressive JIT compilation
-XX:-TieredCompilation # Only C2
-XX:CompileThreshold=1000
PART 10: MONITORING AND DIAGNOSTICS IN REAL TIME
Utilities and Their Purpose
- jcmd - universal command:
# Full list of available commands
jcmd <pid> help
# Heap dump
jcmd <pid> GC.heap_dump filename=heap.hprof
# Class status
jcmd <pid> GC.class_histogram
# JFR management
jcmd <pid> JFR.start duration=60s filename=recording.jfr
- jstat - GC statistics:
# Every second, 10 times
jstat -gc <pid> 1s 10
# Key metrics:
# S0C/S1C: Survivor capacity
# S0U/S1U: Survivor used
# EC/EU: Eden capacity/used
# OC/OU: Old capacity/used
# YGC/YGCT: Young GC count/time
# FGC/FGCT: Full GC count/time
- async-profiler - low-level profiler (commands use the profiler.sh launcher; async-profiler 3.x renamed it to asprof):
# CPU profiling
./profiler.sh -d 30 -f cpu.svg <pid>
# Allocation profiling
./profiler.sh -d 30 -e alloc -f alloc.svg <pid>
# Contended lock profiling
./profiler.sh -d 30 -e lock -f lock.svg <pid>
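The heap pools behind the `jstat -gc` columns are also visible through `MemoryPoolMXBean`. A minimal sketch (`HeapPools` is an illustrative name); pool names depend on the collector, for example G1 reports "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen":

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

final class HeapPools {
    // Used bytes per heap pool, keyed by pool name
    static Map<String, Long> heapPoolUsage() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage u = pool.getUsage(); // null only if the pool is no longer valid
                if (u != null) {
                    result.put(pool.getName(), u.getUsed());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        heapPoolUsage().forEach((name, used) ->
                System.out.printf("%-25s %,d bytes%n", name, used));
    }
}
```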
Configuring GC Logs for Analysis
# Detailed logs with timestamps
-Xlog:gc*,gc+age=trace,gc+heap=debug:file=gc.log:uptime,level,tags
# For G1 specifically
-Xlog:gc*,gc+phases=debug:file=g1.log # Adds per-phase timings of G1 pauses
# Parsing logs with utilities
# 1. GCViewer: visualization
# 2. gceasy.io: online analysis
# 3. jClarity Censum: commercial tool