{"id":1721,"date":"2024-02-29T13:00:00","date_gmt":"2024-02-29T13:00:00","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=1721"},"modified":"2024-03-01T12:21:18","modified_gmt":"2024-03-01T12:21:18","slug":"find-median-from-data-stream","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/find-median-from-data-stream\/","title":{"rendered":"Find Median from Data Stream (C++, Java, Python)"},"content":{"rendered":"\n<p>The median is used to describe the central tendency of a dataset. In finding the median from the data stream problem we have to implement a class, MedianFinder, to dynamically compute the median of a data stream, handling even and odd list sizes accurately. In this article, we will discuss an approach to solve this problem efficiently to find median and also discuss the time and space complexity of this problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Find Median from the Data Stream Problem<\/strong><\/h2>\n\n\n\n<p>In this problem to find median from the data stream, we have to write a Median finder class that manages an ordered integer list, The median, the middle value of the list, is calculated differently for lists with even or odd sizes<\/p>\n\n\n\n<p><strong>In this problem, integers are added dynamically to a list of integers, and different functions are used to insert numbers into the list a function to find the median from the list of the current list of integers.<\/strong><\/p>\n\n\n\n<p><strong>There are two cases:<\/strong><\/p>\n\n\n\n<p><strong>Even: <\/strong>if the size of the list is even then there is no middle value, and the median is the mean of the two middle classes.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/VfeBoJevk8NlojXxLj3BhreyjxCSvsLOrD71lW9uBlXUq5FnbRdRhBMmwYO1ILF0hC1MKyHlmdZs4sfmTKXhcEajxdbKgeGtJhFRo7VFKHFfrAvO8cN5GsGTM28dZCC1sT1jC74KS0jhwps3CFIIoFM\" alt=\"For even size\"\/><\/figure>\n<\/div>\n\n\n<p><strong>Odd: <\/strong>if the size of the list is odd then the middle value is the median for the current list of integers.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/nD0mugMHh2a0UkhyhcNxDfDxeUfXxZ-ETcESGGcH3yz-xY85-cC5yiJkZ3DINXGuV3Z_NeUa19AU0eIFZFEXyOJ9hsf5UHmSfQh0HqR5bR3i2mxjJAytNg8PUVjRKwc-i6tMthiQZhvGrzoHrCeRsN8\" alt=\"For odd size\"\/><\/figure>\n<\/div>\n\n\n<p><strong>We have to implement the following things to solve this problem:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MedianFinder(): <\/strong>&nbsp;initializes the MedianFinder object.<\/li>\n\n\n\n<li><strong>addNum(int num): <\/strong>adds the integer num from the data stream to the data structure.<\/li>\n\n\n\n<li><strong>double findMedian(): <\/strong>returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Find Median using Heap<\/strong><\/h2>\n\n\n\n<p>In this method, we divide our numbers into two groups: low numbers and high numbers. We want to quickly find the middle value (the median) without having to sort all the numbers.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Firstly, we divide the numbers into two halves: one holds low numbers, and the other holds high numbers. To quickly access the median in constant time, we use <a href=\"https:\/\/favtutor.com\/blogs\/heap-in-python\">maxHeap and minHeap<\/a> to get the maximum of the low half and the minimum of the high half, respectively.<\/li>\n\n\n\n<li>We will use maxHeap to store half of the low numbers, the top of the maxHeap is the highest number among low numbers, minHeap to store half of the high numbers, top of the minHeap is the lowest number among high numbers.<\/li>\n\n\n\n<li>We have to balance the size between maxHeap and minHeap while processing. Hence after adding k elements,\n<ul class=\"wp-block-list\">\n<li>If k = 2 * i then maxHeap.size = minHeap.size = i<\/li>\n\n\n\n<li>If k = 2 * i + 1, let maxHeap store 1 element more than minHeap, then maxHeap.size = minHeap.size + 1.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>When we add a new number to our MedianFinder:\n<ul class=\"wp-block-list\">\n<li>First, we add the number to the maxHeap. This might result in the maxHeap containing a larger number than should be in the minHeap. To correct this, we remove the highest element from the maxHeap and add it to the minHeap.<\/li>\n\n\n\n<li>Next, if the minHeap has more elements than the maxHeap, we balance the sizes by removing the lowest element from the minHeap and adding it back to the maxHeap.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>When doing findMedian():\n<ul class=\"wp-block-list\">\n<li>If maxHeap.size &gt; minHeap.size returns to the top of the maxHeap, which is the highest number among low numbers.<\/li>\n\n\n\n<li>Else if maxHeap.size == minHeap return the (maxHeap.top() + minHeap.top()) \/ 2.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>C++ Code<\/strong><\/h3>\n\n\n\n<p>Here is the C++ implementation of this solution:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-c++src&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;language&quot;:&quot;C++&quot;,&quot;modeName&quot;:&quot;cpp&quot;}\">#include &lt;iostream&gt;\n#include &lt;queue&gt;\n#include &lt;vector&gt;\nusing namespace std;\n\nclass MedianFinder {\npublic:\n    priority_queue&lt;int&gt; maxHeap; \/\/ Max heap to store smaller half of numbers\n    priority_queue&lt;int, vector&lt;int&gt;, greater&lt;int&gt;&gt; minHeap; \/\/ Min heap to store larger half of numbers\n    \n    MedianFinder() {\n    }\n    \n    void addNum(int num) {\n        maxHeap.push(num); \/\/ Add the number to maxHeap\n        \n        \/\/ Move the top element of maxHeap to minHeap\n        minHeap.push(maxHeap.top());\n        maxHeap.pop();\n        \n        \/\/ If the minHeap has more elements, balance the heaps\n        if (minHeap.size() &gt; maxHeap.size()) {\n            maxHeap.push(minHeap.top());\n            minHeap.pop();\n        }\n    }\n    \n    double findMedian() {\n        if (maxHeap.size() &gt; minHeap.size()) \n            return maxHeap.top(); \/\/ If maxHeap has more elements, median is at the top of maxHeap\n        return (maxHeap.top() + minHeap.top()) \/ 2.0; \/\/ Otherwise, average the tops of both heaps for the median\n    }\n};\n\nint main() {\n    MedianFinder obj;\n    obj.addNum(1);\n    obj.addNum(2);\n    cout &lt;&lt; obj.findMedian() &lt;&lt; endl; \/\/ Output: 1.5\n    obj.addNum(3);\n    cout &lt;&lt; obj.findMedian() &lt;&lt; endl; \/\/ Output: 2.0\n    return 0;\n}<\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Python Code<\/strong><\/h3>\n\n\n\n<p>We can do in Python programming also to find median:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;language&quot;:&quot;Python&quot;,&quot;modeName&quot;:&quot;python&quot;}\">import heapq\n\nclass MedianFinder:\n    def __init__(self):\n        self.maxHeap = [] # Max heap to store smaller half of numbers\n        self.minHeap = [] # Min heap to store larger half of numbers\n    \n    def addNum(self, num):\n        heapq.heappush(self.maxHeap, -num) # Add the number to maxHeap\n        \n        # Move the top element of maxHeap to minHeap\n        heapq.heappush(self.minHeap, -heapq.heappop(self.maxHeap))\n        \n        # If the minHeap has more elements, balance the heaps\n        if len(self.minHeap) &gt; len(self.maxHeap):\n            heapq.heappush(self.maxHeap, -heapq.heappop(self.minHeap))\n    \n    def findMedian(self):\n        if len(self.maxHeap) &gt; len(self.minHeap): \n            return -self.maxHeap[0] # If maxHeap has more elements, median is at the top of maxHeap\n        return (-self.maxHeap[0] + self.minHeap[0]) \/ 2.0 # Otherwise, average the tops of both heaps for the median\n\n# Example Usage\nobj = MedianFinder()\nobj.addNum(1)\nobj.addNum(2)\nprint(obj.findMedian()) # Output: 1.5\nobj.addNum(3)\nprint(obj.findMedian()) # Output: 2.0<\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Java Code<\/strong><\/h3>\n\n\n\n<p>Well, let&#8217;s also solve it with Java:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;mode&quot;:&quot;clike&quot;,&quot;mime&quot;:&quot;text\/x-java&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:true,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;language&quot;:&quot;Java&quot;,&quot;modeName&quot;:&quot;java&quot;}\">import java.util.PriorityQueue;\n\nclass MedianFinder {\n    PriorityQueue&lt;Integer&gt; maxHeap; \/\/ Max heap to store smaller half of numbers\n    PriorityQueue&lt;Integer&gt; minHeap; \/\/ Min heap to store larger half of numbers\n    \n    MedianFinder() {\n        maxHeap = new PriorityQueue&lt;&gt;((a, b) -&gt; b - a);\n        minHeap = new PriorityQueue&lt;&gt;();\n    }\n    \n    public void addNum(int num) {\n        maxHeap.offer(num); \/\/ Add the number to maxHeap\n        \n        \/\/ Move the top element of maxHeap to minHeap\n        minHeap.offer(maxHeap.poll());\n        \n        \/\/ If the minHeap has more elements, balance the heaps\n        if (minHeap.size() &gt; maxHeap.size()) {\n            maxHeap.offer(minHeap.poll());\n        }\n    }\n    \n    public double findMedian() {\n        if (maxHeap.size() &gt; minHeap.size()) \n            return maxHeap.peek(); \/\/ If maxHeap has more elements, median is at the top of maxHeap\n        return (maxHeap.peek() + minHeap.peek()) \/ 2.0; \/\/ Otherwise, average the tops of both heaps for the median\n    }\n\n    public static void main(String[] args) {\n        MedianFinder obj = new MedianFinder();\n        obj.addNum(1);\n        obj.addNum(2);\n        System.out.println(obj.findMedian()); \/\/ Output: 1.5\n        obj.addNum(3);\n        System.out.println(obj.findMedian()); \/\/ Output: 2.0\n    }\n}<\/pre><\/div>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code has-black-color has-text-color has-background has-link-color wp-elements-00877f6d45d7befa1fab590b577be7b5\" style=\"background-color:#fedcba\"><code>1.5\n2<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Complexity Analysis<\/strong><\/h3>\n\n\n\n<p><strong>The Time complexity is O(log n) for adding a number and O(1) time complexity for finding the median in a data stream, where n is the number of elements added.<\/strong> Overall time complexity is dominated by adding numbers. The space complexity of maintaining these priority queues is <strong>O(n)<\/strong>, where n is the number of elements added to the MedianFinder.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>In conclusion, we have discussed the find median in the data stream problem from <a href=\"https:\/\/leetcode.com\/problems\/find-median-from-data-stream\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">leetcode<\/a> and we solved it using heaps and discussed how functions can be implemented using heaps and also discussed implementation in C++, Java, and Python. With time and space complexity<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Understand how to find the median from the data stream using the Heap approach along with implementation in C++, Java, and Python.<\/p>\n","protected":false},"author":12,"featured_media":1723,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[7],"tags":[14,15],"class_list":["post-1721","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-structures-algorithms","tag-dsa","tag-leetcode"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/1721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=1721"}],"version-history":[{"count":3,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/1721\/revisions"}],"predecessor-version":[{"id":2101,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/1721\/revisions\/2101"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/1723"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=1721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=1721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=1721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}