The median is used to describe the central tendency of a dataset. In finding the median from the data stream problem we have to implement a class, MedianFinder, to dynamically compute the median of a data stream, handling even and odd list sizes accurately. In this article, we will discuss an approach to solve this problem efficiently to find median and also discuss the time and space complexity of this problem.
Find Median from the Data Stream Problem
In this problem to find median from the data stream, we have to write a Median finder class that manages an ordered integer list, The median, the middle value of the list, is calculated differently for lists with even or odd sizes
In this problem, integers are added dynamically to a list of integers, and different functions are used to insert numbers into the list a function to find the median from the list of the current list of integers.
There are two cases:
Even: if the size of the list is even then there is no middle value, and the median is the mean of the two middle classes.
Odd: if the size of the list is odd then the middle value is the median for the current list of integers.
We have to implement the following things to solve this problem:
- MedianFinder(): initializes the MedianFinder object.
- addNum(int num): adds the integer num from the data stream to the data structure.
- double findMedian(): returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.
Find Median using Heap
In this method, we divide our numbers into two groups: low numbers and high numbers. We want to quickly find the middle value (the median) without having to sort all the numbers.
- Firstly, we divide the numbers into two halves: one holds low numbers, and the other holds high numbers. To quickly access the median in constant time, we use maxHeap and minHeap to get the maximum of the low half and the minimum of the high half, respectively.
- We will use maxHeap to store half of the low numbers, the top of the maxHeap is the highest number among low numbers, minHeap to store half of the high numbers, top of the minHeap is the lowest number among high numbers.
- We have to balance the size between maxHeap and minHeap while processing. Hence after adding k elements,
- If k = 2 * i then maxHeap.size = minHeap.size = i
- If k = 2 * i + 1, let maxHeap store 1 element more than minHeap, then maxHeap.size = minHeap.size + 1.
- When we add a new number to our MedianFinder:
- First, we add the number to the maxHeap. This might result in the maxHeap containing a larger number than should be in the minHeap. To correct this, we remove the highest element from the maxHeap and add it to the minHeap.
- Next, if the minHeap has more elements than the maxHeap, we balance the sizes by removing the lowest element from the minHeap and adding it back to the maxHeap.
- When doing findMedian():
- If maxHeap.size > minHeap.size returns to the top of the maxHeap, which is the highest number among low numbers.
- Else if maxHeap.size == minHeap return the (maxHeap.top() + minHeap.top()) / 2.
C++ Code
Here is the C++ implementation of this solution:
#include <iostream> #include <queue> #include <vector> using namespace std; class MedianFinder { public: priority_queue<int> maxHeap; // Max heap to store smaller half of numbers priority_queue<int, vector<int>, greater<int>> minHeap; // Min heap to store larger half of numbers MedianFinder() { } void addNum(int num) { maxHeap.push(num); // Add the number to maxHeap // Move the top element of maxHeap to minHeap minHeap.push(maxHeap.top()); maxHeap.pop(); // If the minHeap has more elements, balance the heaps if (minHeap.size() > maxHeap.size()) { maxHeap.push(minHeap.top()); minHeap.pop(); } } double findMedian() { if (maxHeap.size() > minHeap.size()) return maxHeap.top(); // If maxHeap has more elements, median is at the top of maxHeap return (maxHeap.top() + minHeap.top()) / 2.0; // Otherwise, average the tops of both heaps for the median } }; int main() { MedianFinder obj; obj.addNum(1); obj.addNum(2); cout << obj.findMedian() << endl; // Output: 1.5 obj.addNum(3); cout << obj.findMedian() << endl; // Output: 2.0 return 0; }
Python Code
We can do in Python programming also to find median:
import heapq class MedianFinder: def __init__(self): self.maxHeap = [] # Max heap to store smaller half of numbers self.minHeap = [] # Min heap to store larger half of numbers def addNum(self, num): heapq.heappush(self.maxHeap, -num) # Add the number to maxHeap # Move the top element of maxHeap to minHeap heapq.heappush(self.minHeap, -heapq.heappop(self.maxHeap)) # If the minHeap has more elements, balance the heaps if len(self.minHeap) > len(self.maxHeap): heapq.heappush(self.maxHeap, -heapq.heappop(self.minHeap)) def findMedian(self): if len(self.maxHeap) > len(self.minHeap): return -self.maxHeap[0] # If maxHeap has more elements, median is at the top of maxHeap return (-self.maxHeap[0] + self.minHeap[0]) / 2.0 # Otherwise, average the tops of both heaps for the median # Example Usage obj = MedianFinder() obj.addNum(1) obj.addNum(2) print(obj.findMedian()) # Output: 1.5 obj.addNum(3) print(obj.findMedian()) # Output: 2.0
Java Code
Well, let’s also solve it with Java:
import java.util.PriorityQueue; class MedianFinder { PriorityQueue<Integer> maxHeap; // Max heap to store smaller half of numbers PriorityQueue<Integer> minHeap; // Min heap to store larger half of numbers MedianFinder() { maxHeap = new PriorityQueue<>((a, b) -> b - a); minHeap = new PriorityQueue<>(); } public void addNum(int num) { maxHeap.offer(num); // Add the number to maxHeap // Move the top element of maxHeap to minHeap minHeap.offer(maxHeap.poll()); // If the minHeap has more elements, balance the heaps if (minHeap.size() > maxHeap.size()) { maxHeap.offer(minHeap.poll()); } } public double findMedian() { if (maxHeap.size() > minHeap.size()) return maxHeap.peek(); // If maxHeap has more elements, median is at the top of maxHeap return (maxHeap.peek() + minHeap.peek()) / 2.0; // Otherwise, average the tops of both heaps for the median } public static void main(String[] args) { MedianFinder obj = new MedianFinder(); obj.addNum(1); obj.addNum(2); System.out.println(obj.findMedian()); // Output: 1.5 obj.addNum(3); System.out.println(obj.findMedian()); // Output: 2.0 } }
Output:
1.5
2
Complexity Analysis
The Time complexity is O(log n) for adding a number and O(1) time complexity for finding the median in a data stream, where n is the number of elements added. Overall time complexity is dominated by adding numbers. The space complexity of maintaining these priority queues is O(n), where n is the number of elements added to the MedianFinder.
Conclusion
In conclusion, we have discussed the find median in the data stream problem from leetcode and we solved it using heaps and discussed how functions can be implemented using heaps and also discussed implementation in C++, Java, and Python. With time and space complexity