- Published on
What is a DAG? Arborescence (Part 2)
- Authors

- Name
- dima853
- @_dima853
Part 1: Arborescence (General Definition)
there may be inaccuracies or simplifications in this article
1.1. Intuitive Representation: Imagine an ordinary undirected tree. Now assign a special role to one of its vertices — the root. Then orient all edges so that they point outward from the root, i.e., in the direction from the root to the leaves. The resulting structure is an arborescence.
Simple Example - Computer Folder Structure ⬇️
- Root —
C:\(or/on Linux) - All folders are created via "New Folder"; there are no shortcuts or links.
- Each folder has exactly one "parent" (the folder it resides in).
- You can only go up to the parent. This is a classic tree (arborescence).
DAG with Potential Arborescence (real system with mklink or .lnk):
Let's say there is a structure:
C:\Projects\
├── SecretData\
└── CurrentProject\ <-- contains shortcut "Data" → C:\Projects\SecretData
Now this is a DAG, but not an arborescence, because:
SecretDatahas two "parents":- Direct:
C:\Projects\ - Via the shortcut:
C:\Projects\CurrentProject\Data\
- Direct:
- The condition
indegree = 1for an arborescence is violated. - But there are no cycles — it is a DAG.
"Arborescence(DAG)" in this analogy:
If the administrator strictly prohibits creating shortcuts, links, and mount points, then the file system is both a DAG (no cycles) and an Arborescence (a tree). This is precisely that simple case of a "simple folder tree".
1.2. Formal Definition:
explanation of notations at the end
Let G = (V, E) be a directed graph.
G is called an arborescence (oriented tree) with root r ∈ V if the following conditions are met:
- Existence of a root: There exists exactly one vertex
rinto which no edge enters (i.e., its indegreedeg⁻(r) = 0). This vertex is called the root. - Uniqueness of the path from the root: For any other vertex
v ∈ V \ {r}, there exists exactly one directed path fromrtov. - Absence of cycles: The graph
Gcontains no directed cycles.
1.3. Key Corollaries and Properties:
Important and equivalent characteristics follow from the definition:
- Structure of incoming edges: For each vertex v ≠ r, the indegree is 1 (deg⁻(v) = 1). For the root r, it is 0.
- The root is also a vertex.
In graph theory, the terms "vertex" and "node" mean the same thing. The root is simply a special, designated vertex.
- The indegree of the root is zero.
This means that no edge leads into the root — there is no arrow pointing to it. It is the starting point of all paths in the graph.
- All other vertices have an indegree of one.
Every vertex except the root has exactly one incoming edge — one arrow coming into it from another vertex.
- This is a direct consequence of the definition of an arborescence.
Precisely because the graph is an arborescence (directed tree), these rules hold: one root with no incoming edges, and every other vertex has exactly one "parent".
Thus, the structure of incoming edges in an arborescence is strictly defined:
0 → for the root, 1 → for all other vertices.
- Connectivity (in the directed sense): The graph is weakly connected (if edge directions are ignored, it is connected) and reachable from the root (any vertex v can be reached from r).
- Number of edges: If an arborescence has n vertices, then it has exactly n-1 edges.
- Orientation from ancestor to descendant: For any edge (u, v), vertex u is an ancestor of vertex v, and v is a descendant of u. The root is an ancestor of all vertices.
- Equivalence to a tree: If directed edges are replaced with undirected ones, the result is an ordinary connected acyclic undirected tree.
Part 2: Arborescence in a Directed Acyclic Graph (DAG)
Now consider a situation where an arborescence is a subgraph of a larger Directed Acyclic Graph (DAG).
2.1. Context and Motivation:
A DAG is a directed graph without cycles. Examples: task dependency (makefile), course order in a curriculum, partial ordering of events. In such a graph, a common task is to find a structure linking all or many vertices in the form of a tree.
2.2. Definition of Arborescence in a DAG:
Let G = (V, E) be a DAG. An arborescence in a DAG (often called a directed spanning tree) is a subgraph T = (V, E_T), where E_T ⊆ E, which itself is an arborescence (in the sense of Part 1) with some root r ∈ V. Spanning is the key word, meaning "covering all vertices".
Key point: All edges of the arborescence T are taken from the original DAG G and retain their orientation. The arborescence "fits" into the topological order of the DAG.
Topological order (or topological sorting) is a way to linearly order the vertices of a directed graph such that for any edge from vertex A to vertex B, vertex A always comes before vertex B in the list.
- Main condition: Works only in graphs without cycles (DAG). If it's possible to return from the end to the beginning, the order cannot be constructed.
- Essence: If there is a path from A to B, then in the list A will always be to the left of B. This is a "timeline" or task execution queue.
- Reachability vs. Order: If A→B, then A comes before B (always).
If A comes before B in the list, this does not mean there is a path between them (they might simply be independent parallel tasks).
Remember: Topological order is a to-do list where you will never encounter a situation where to perform the current step you need to do something that comes later in the list. (There are no cycles).
2.3. Existence and Construction:
In an arbitrary DAG, an arborescence covering all vertices (i.e., a spanning one) does not always exist.
Necessary conditions for the existence of a spanning arborescence in a DAG:
- Existence of a unique root-source: The DAG must have exactly one node with zero indegree (a source). This node becomes a candidate for the root r of the arborescence. If there are several such nodes, constructing a single spanning tree is impossible (several disconnected trees will result).
- Reachability from the root: This unique source r must be reachable from all other vertices of the DAG. This condition does not automatically follow from the first — there can be components isolated from r in the DAG, even if r is the only source in the entire graph.
Important note: Even if both conditions are met, not any choice of one incoming edge for each vertex will lead to a connected arborescence. A poor choice may create a forest (several trees) or violate reachability from the root.
Correct construction algorithm (traversal from the root): If the conditions are met, a spanning arborescence can be guaranteed by traversal (BFS/DFS) starting from the root r and selecting parent edges.
- Initialize an empty subgraph T.
- Select the unique source r as the root and add it to T.
- Start a traversal (e.g., BFS) from r along the edges of the original DAG G.
- For each new vertex v reached during traversal via edge (u, v) ∈ E, add this edge (u, v) to T as the sole incoming edge for v in the arborescence.
- After traversal is complete, check that T contains all vertices V. If yes, then T is the desired spanning arborescence. If some vertices were not reached, then despite the conditions being met, the choice of edges during traversal might have led to a dead end; in this case, a more complex algorithm for finding a spanning arborescence is required.
Alternative approach (parent selection): A simpler-to-implement method that guarantees the construction of an arborescence if it exists:
- Perform a topological sort of the DAG G.
- For each vertex v ≠ r (in topological order), choose as its parent any vertex u for which there exists an edge (u, v) ∈ E. Thanks to the topological order, u will be processed before v, which guarantees the absence of cycles and reachability from the root (provided the root r is the first vertex in the order). All selected edges (u, v) form the desired arborescence T.
By definition, every arborescence is a DAG (directed acyclic graph).
Part 3: Comparison and Summary Table
| Property | Arborescence (general) | Arborescence as a subgraph of a DAG | "Arborescence(Dag)" (as a property of a single graph) |
|---|---|---|---|
| Base Graph | It is a tree by itself. | It is a subgraph of a larger DAG. | The single graph considered possesses two properties. |
| Root | One, with deg⁻ = 0. | The root is the unique source in the DAG (if it's a spanning tree). | One, with deg⁻ = 0. |
| Indegree | deg⁻(v) = 1 for all v ≠ r. | deg⁻_T(v) = 1 in subgraph T, but in the original DAG deg⁻_G(v) can be >1. | deg⁻(v) = 1 for all v ≠ r. |
| Cycles | No directed cycles (follows from the definition). | None (in both T and G). | None (follows from both properties). |
| Main Idea | Oriented tree-like hierarchy. | Extracting a tree-like hierarchy from a dependency graph. | Emphasizes that the graph is a strict hierarchical tree without cycles. |
Example for DAG (constructing an arborescence):
Consider a DAG:
A (source, deg⁻=0)
/ \
B C
\ / \
D E
There is one source A. Let's try to construct an arborescence (spanning tree):
- Root:
A. - For
B: Choose edgeA -> B. - For
C: Choose edgeA -> C. - For
D: There are two incoming edges (B->D,C->D). Choose, for example,B->D. - For
E: There is an edgeC->E. Choose it.
We get the arborescence: A -> B, A -> C, B -> D, C -> E. It covers all vertices.
Conclusion
- Arborescence is a fundamental structure, the directed analogue of a tree.
- Arborescence in a DAG is a way to select a "main ancestor" for each vertex in a dependency graph, turning it into a hierarchy.
- The statement that a graph is both usually emphasizes its strict tree-like nature and acyclicity, which simplifies analysis (e.g., allowing root traversals and DP on DAGs simultaneously).
Notations:
V(from Vertices) — the set of vertices of a graph.
For example:V = {A, B, C, D}
E(from Edges) — the set of edges of a graph.
An edge is a pair of vertices.
For an undirected graph:E = {{A,B}, {B,C}}.
For a directed graph (arcs):E = {(A→B), (B→C), (C→A)}— where the order of vertices matters.
1. r ∈ V
r— denotes the root∈— the membership symbol ("belongs to")V— the set of all vertices of the graph- Complete meaning:
ris one of the vertices of the graph
2. deg⁻(r) = 0
deg⁻(v)— the indegree of a vertex (number of edges entering the vertex)- Often denoted as
deg^{-}(v)ord_in(v).
- Often denoted as
(r)— for vertexr= 0— equals zero- Complete meaning: no edge enters the root (there is no arrow pointing to this vertex)
3. v ∈ V \ {r}
v— an arbitrary vertex∈— belongs toV \ {r}— the set of all vertices except the rootr- Complete meaning: for each vertex that is not the root
4. Directed path from r to v
- Directed path — a sequence of vertices
r → ... → v, where each edge is directed from the previous vertex to the next - Exactly one — there exists only one such path; you cannot reach it in different ways
5. Directed cycle
- Directed cycle — a path that starts and ends at the same vertex, following the direction of edges
- Example:
A → B → C → A - Absence of cycles — no such closed routes exist in the graph
In simple terms:
- There is one special root vertex
r - Nothing leads into it (like the source of a river)
- From the root to any other vertex, you can go in only one way
- You cannot go in circles — only from the root downwards
Solution for Problem 207. Course Schedule (Leetcode)
The meaning of the task is extremely simple:
Check if there are cycles in the list of dependencies.
- If there are no cycles: You can create a study plan and complete all courses → true.
- If there is a cycle: (for example, course 0 requires course 1, and course 1 requires course 0), you fall into a "vicious circle" and cannot even start → false.
Visually:
[[1,0]]— this is a straight line (0→1). All good.[[1,0],[0,1]]— this is a ring (0↔1). Deadlock.
class Solution {
public boolean canFinish(int num_courses, int[][] prerequisites) {
// 1. BUILD THE GRAPH STRUCTURE
// Create an adjacency list where each index represents a course,
// and the inner list contains courses that depend on it.
List<List<Integer>> adj = new ArrayList<>();
for (int i = 0; i < num_courses; i++) adj.add(new ArrayList<>());
// 2. FILL THE ADJACENCY LIST AND COUNTER "INDEGREE"
// indegree[i] stores the number of prerequisites required before taking course i.
int[] indegree = new int[num_courses];
for (int[] pair : prerequisites) {
int course = pair[0]; // The dependent course
int pre = pair[1]; // The prerequisite course
adj.get(pre).add(course); // Directed edge: pre -> course
indegree[course]++; // Increment the number of incoming edges for the target
}
// 3. INITIALIZE STARTING POINTS
// The queue 'q' will store courses with NO prerequisites (indegree == 0).
Queue<Integer> q = new LinkedList<>();
for (int i = 0; i < num_courses; i++) {
// If a course has no dependencies, it can be taken immediately.
if (indegree[i] == 0) q.offer(i);
}
// 4. PROCESS THE COURSES (Kahn's Algorithm / BFS)
int visited_count = 0; // Track how many courses we successfully completed
while (!q.isEmpty()) {
int curr = q.poll(); // Take a course from the queue
visited_count++; // Mark it as "taken"
// Look at all courses that depend on the current finished course
for (int neighbor : adj.get(curr)) {
indegree[neighbor]--; // "Remove" the dependency (decrement indegree)
// If all prerequisites are met (indegree is 0),
// the neighbor course becomes available to take.
if (indegree[neighbor] == 0) {
q.offer(neighbor);
}
}
}
// 5. FINAL CYCLE CHECK
// If visited_count equals num_courses, it means we found a valid linear order (no cycles).
// If not, there is at least one cycle (deadlock) in the graph.
return visited_count == num_courses;
}
}