TUTORIAL PLATFORM: Bucket Sort

Bucket sort can be exceptionally fast because of the way elements are assigned to buckets, typically using an array where the index is the value. This means that more auxiliary memory is required for the buckets at the cost of running time than more comparison sorts. It runs in

O(n+k)

time in the average case where

n

is the number of elements to be sorted and

k

is the number of buckets.

While bucket sort is a distribution sort, it typically uses a comparison sort to sort the buckets after they have been allocated.

Complexity

Time			Space
Worst case	Best case	Average case	Worst case
$O(n^2)$	$O(n + k)$	$O(n + k)$	$O(n + k)$ auxiliary

When it’s fast

Bucket sort’s best case occurs when the data being sorted can be distributed between the buckets perfectly. If the values are sparsely allocated over the possible value range, a larger bucket size is better since the buckets will likely be more evenly distributed. An example of this is [2303, 33, 1044], if buckets can only contain 5 different values then for this example 461 buckets would need to be initialised. A bucket size of 200-1000 would be much more reasonable.

The inverse of this is also true; a densely allocated array like [103, 99, 119, 112, 111] performs best when buckets are as small as possible.

Bucket sort is an ideal algorithm choice when:

The additional $O(n + k)$ memory usage is not an issue
Elements are expected to be fairly evenly distributed

When it’s slow

Bucket sort performs at its worst,

O(n^2)

, when all elements at allocated to the same bucket. Since individual buckets are sorted using another algorithm, if only a single bucket needs to be sorted, bucket sort will take on the complexity of the inner sorting algorithm.

This depends on the individual implementation though and can be mitigated. For example a bucket sort algorithm could be made to work with large bucket sizes by using insertion sort on small buckets (due to its low overhead), and merge sort or quicksort on larger buckets.

Bucket sort vs counting sort vs radix sort

There are two main variants of bucket sort; one where there is a bucket for each value, and where buckets hold several values. Bucket sort is often seen as a generalisation of counting sort because bucket sort with a bucket size of 1 is essentially counting sort, just with a more complex implementation. This can be spun the other way as well by saying that bucket sort is counting sort that uses more sophisticated buckets.

Another similar sort, radix sort, also distributes items into buckets. However, they are distributed based on the individual digits within the values themselves.

The idea is to use bucket sort. Following is bucket algorithm.

bucketSort(arr[], n)
1) Create n empty buckets (Or lists).
2) Do following for every array element arr[i].
.......a) Insert arr[i] into bucket[n*array[i]]
3) Sort individual buckets using insertion sort.
4) Concatenate all sorted buckets.

In summary:

Counting sort: buckets hold only a single value
Bucket sort: buckets hold a range of values
Radix sort: buckets hold values based on digits within their values

BUCKET SORT CODE

import java.util.*;
 
public class BucketSort{
 
   public static void sort(int[] a, int maxVal) {
      int [] bucket=new int[maxVal+1];
 
      for (int i=0; i<bucket.length; i++) {
         bucket[i]=0;
      }
 
      for (int i=0; i<a.length; i++) {
         bucket[a[i]]++;
      }
 
      int outPos=0;
      for (int i=0; i<bucket.length; i++) {
         for (int j=0; j<bucket[i]; j++) {
            a[outPos++]=i;
         }
      }
   }
 
 
   public static void main(String[] args) {
      int maxVal=5;
      int [] data= {5,3,0,2,4,1,0,5,2,3,1,4}; 
 
      System.out.println("Before: " + Arrays.toString(data));
      sort(data,maxVal);
      System.out.println("After:  " + Arrays.toString(data));
   }
}

Output:

# java BucketSort 
Before: [5, 3, 0, 2, 4, 1, 0, 5, 2, 3, 1, 4]
After:  [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

ANOTHER CODE FOR BUCKET SORT WITH OUTPUT

//This is a java program to sort numbers using bucket sort
import java.util.Random;
 
public class Bucket_Sort 
{
    static int[] sort(int[] sequence, int maxValue) 
    {
        // Bucket Sort
        int[] Bucket = new int[maxValue + 1];
        int[] sorted_sequence = new int[sequence.length];
 
        for (int i = 0; i < sequence.length; i++)
            Bucket[sequence[i]]++;
 
        int outPos = 0;
        for (int i = 0; i < Bucket.length; i++)
            for (int j = 0; j < Bucket[i]; j++)
                sorted_sequence[outPos++] = i;
 
        return sorted_sequence;
    }
 
    static void printSequence(int[] sorted_sequence) 
    {
        for (int i = 0; i < sorted_sequence.length; i++)
            System.out.print(sorted_sequence[i] + " ");
    }
 
    static int maxValue(int[] sequence) 
    {
        int maxValue = 0;
        for (int i = 0; i < sequence.length; i++)
            if (sequence[i] > maxValue)
                maxValue = sequence[i];
        return maxValue;
    }
 
    public static void main(String args[]) 
    {
        System.out
                .println("Sorting of randomly generated numbers using BUCKET SORT");
        Random random = new Random();
        int N = 20;
        int[] sequence = new int[N];
 
        for (int i = 0; i < N; i++)
            sequence[i] = Math.abs(random.nextInt(100));
 
        int maxValue = maxValue(sequence);
 
        System.out.println("\nOriginal Sequence: ");
        printSequence(sequence);
 
        System.out.println("\nSorted Sequence: ");
        printSequence(sort(sequence, maxValue));
    }
}

Output:

$ javac Bucket_Sort.java
$ java Bucket_Sort
 
Sorting of randomly generated numbers using BUCKET SORT
 
Original Sequence: 
95 9 95 87 8 81 18 54 57 53 92 15 38 24 8 56 29 69 64 66 
Sorted Sequence: 
8 8 9 15 18 24 29 38 53 54 56 57 64 66 69 81 87 92 95 95

ANOTHER CODE FOR BUCKET SORT

public class BucketSortJava {

 public static void main(String args[]) {

  int i, a[] = {3, 6, 9, 0, 5, 1, 8, 4, 3, 1}, n = 10;

     System.out.println("Before sorting:\n");

     for(i = 0; i < n; ++i)

         System.out.print(a[i] + " ");

     bucketSort(a, n);

     System.out.println("\n\nAfter sorting:\n");

     for(i = 0; i < n; ++i)

         System.out.print(a[i] + " "); 

 }

 static void bucketSort(int a[], int n) {

     int i, j, k, SIZE = 10;

     int buckets[] = new int[SIZE];

     for(i = 0; i < SIZE; ++i)

         buckets[i] = 0;

     for(i = 0; i < n; ++i)

         ++buckets[a[i]];

     for(i = 0, j = 0; j < SIZE; ++j)

         for(k = buckets[j]; k > 0; --k)

             a[i++] = j;

 }

}

Output

Before sorting:

3 6 9 0 5 1 8 4 3 1

After sorting:

0 1 1 3 3 4 5 6 8 9

CODE FOR IMPLEMENTATION OF BUCKET CODE

 /

 * Title: Bucketsort

 * Project: sjavaspot

 * Package: algorithms

 *

 * Statement:

 *   Given a disordered list of integers, rearrange them in natural order.

 *

 * Sample Input: {8,5,3,1,9,6,0,7,4,2,5}

 * Sample Output: {0,1,2,3,4,5,6,7,8,9,5}

 *

 * Time Complexity of Solution:

 *     Best Case O(n); Average Case O(n); Worst Case O(n).

 *

 * Approach:

 *   If it sounds too good to be true, then most likely it's not true.

 *   Bucketsort is not an exception to this adage. For bucketsort to work at

 *   its blazing efficiency, there are multiple prerequisites. First the

 *   hash function that is used to partition the elements need to be very

 *   good and must produce ordered hash: if i < k then hash(i) < hash(k).

 *   Second, the elements to be sorted must be uniformly distributed.

 *

 *   The aforementioned aside, bucket sort is actually very good considering

 *   that counting sort is reasonably speaking its upper bound. And counting

 *   sort is very fast. The particular distinction for bucket sort is that

 *   it uses a hash function to partition the keys of the input array, so

 *   that multiple keys may hash to the same bucket. Hence each bucket must

 *   effectively be a growable list; similar to radix sort.

 *

 *   Numerous Internet sites, including university pages, have erroneously

 *   written counting sort code and call them bucket sort. Bucket sort uses

 *   a hash function to distribute keys; counting sort creates a bucket for

 *   each key. Indeed there are perhaps greater similarities between radix

 *   sort and bucket sort, than there are between counting sort and bucket sort.

 *

 *   In the presented program Java's Collections.sort(C) is used to sort each

 *   bucket. This is to inculcate that the bucket sort algorithm does not

 *   specify which sorting technique to use on the buckets. A programmer may

 *   choose to continuously use bucket sort on each bucket until the

 *   collection is sorted (in the manner of the radix sort program below).

 *   Whichever sorting method is used on the buckets, bucket sort still

 *   tends toward O(n).

 *

 ****************************************************************************/

 public class BucketSortCode{

    public void bucketsort(int[] input) {

    //get hash codes

       final int[] code = hash(input);

    //create and initialize buckets to ArrayList: O(n)

       List<Integer>[] buckets = new List[code[1]];

       for (int i = 0; i < code[1]; i++) {

           buckets[i] = new ArrayList<Integer>();

       }

    //distribute data into buckets: O(n)

       for (int i : input) {

           buckets[hash(i, code)].add(i);

       }

  /**

   * Sort each bucket: O(n).

   * I mentioned above that the worst case for bucket sort is counting

   * sort. That's because in the worst case, bucket sort may end up

   * with one bucket per key. In such case, sorting each bucket would

   * take 1^2 = O(1). Even after allowing for some probabilistic

   * variance, to sort each bucket would still take 2-1/n, which is

   * still a constant. Hence, sorting all the buckets takes O(n).

   ***/

      for (List bucket : buckets) {

          Collections.sort(bucket);

      }

      int ndx = 0;

    //merge the buckets: O(n)

      for (int b = 0; b < buckets.length; b++) {

          for (int v : buckets[b]) {

              input[ndx++] = v;

          }

      }

  }

  private int[] hash(int[] input) {

     int m = input[0];

     for (int i = 1; i < input.length; i++) {

         if (m < input[i]) {

            m = input[i];

         }

     }

     return new int[]{m, (int) Math.sqrt(input.length)};

  }

  private int hash(int i, int[] code) {

    return (int) ((double) i / code[0] * (code[1] - 1));

  }

}

Bitcoin

Bucket Sort

Complexity

When it’s fast

When it’s slow

Bucket sort vs counting sort vs radix sort

Output:

No comments:

Post a Comment

Facebook

Popular Posts