⚖️ Heap: Find Median from Data Stream
📝 Description
LeetCode 295 The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values. Design a data structure that supports adding integers and finding the median.
Real-World Application
Used in Real-time Analytics to calculate p50 latency or other percentiles on streaming data without storing/sorting the entire dataset history.
🛠️ Constraints & Edge Cases
- \(1 \le \text{calls} \le 5 \cdot 10^4\)
- \(-10^5 \le num \le 10^5\)
- Edge Cases to Watch:
- No elements (usually constraints say at least 1 for findMedian).
- Single element.
🧠 Approach & Intuition
The Aha! Moment
We need the middle element. Sorting every time is \(O(N \log N)\). Maintaining a sorted list is \(O(N)\) insertion. We can maintain the middle property by splitting the data into two halves: a Small Half (Max-Heap) and a Large Half (Min-Heap). The top of these heaps will be the candidates for the median.
🐢 Brute Force (Naive)
Append to list, sort on findMedian.
- Time Complexity: \(O(N \log N)\) per call.
🐇 Optimal Approach
- Heaps:
small: Max-Heap (stores lower half of numbers).large: Min-Heap (stores upper half of numbers).
- AddNum(num):
- Push to
small(Max-Heap). - Pop max from
smalland push tolarge(Min-Heap). - Balance: If
len(small) < len(large), move top oflargeback tosmall. - Invariant:
len(small) >= len(large).
- Push to
- FindMedian:
- If
len(small) > len(large): Returnsmall.top. - Else: Return
(small.top + large.top) / 2.0.
- If
🧩 Visual Tracing
graph TD
A["Input: 1, 2, 3"]
A --> B["Add 1 → Small: {1}, Large: {}"]
B --> C["Add 2 → Small: {1}, Large: {2}"]
C --> D["Add 3 → Rebalance → Small: {1,2}, Large: {3}"]
D --> E["Median = 2"]
💻 Solution Implementation
⏱️ Complexity Analysis
- Time Complexity: \(O(\log N)\) for
addNum, \(O(1)\) forfindMedian. - Space Complexity: \(\mathcal{O}(N)\) — Store all numbers.
🎤 Interview Toolkit
- Follow-up: If all integers are in range [0, 100], how to optimize? (Use a frequency array/Bucket Sort logic).
- Follow-up: If 99% of calls are
addNum, is this efficient? (Yes, log N is good).