Bucket Sort

Bucket Sort is a sorting algorithm that works by dividing the input list into buckets. Each bucket is then individually sorted using another sorting algorithm, and finally the buckets are merged back together in bucket order. The operations are as follows:

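Create $k$ buckets, each of which is a list.
For each element in the input list, hash it to determine which bucket it belongs to. The hash must preserve order, so that no element in one bucket is larger than any element in the next.
Sort each of the buckets using the chosen sorting algorithm.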
Merge all the sorted buckets back together.
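
The second step above can be sketched as follows. This is a minimal illustration, assuming numeric inputs whose minimum and maximum are already known; `bucketIndex` and `BucketIndexSketch` are hypothetical names introduced here.

```java
final class BucketIndexSketch {
  // Hypothetical helper: maps `value` in [min, max] to a bucket index
  // in [0, k - 1] while preserving the order of the values.
  static int bucketIndex(double value, double min, double max, int k) {
    if (max == min) {
      return 0; // all values are equal, so a single bucket suffices
    }
    double norm = (value - min) / (max - min); // position in [0, 1]
    return (int) (norm * (k - 1));             // scaled to [0, k - 1]
  }

  public static void main(String[] args) {
    // Values between 0 and 100 spread over k = 5 buckets
    for (double v : new double[] {0, 30, 50, 75, 100}) {
      System.out.println(v + " -> bucket " + bucketIndex(v, 0, 100, 5));
    }
  }
}
```

With these inputs, the values 0, 30, 50, 75, and 100 map to buckets 0, 1, 2, 3, and 4 respectively. Because the mapping never decreases as the value grows, concatenating the sorted buckets in index order yields a sorted list.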

The most important property of Bucket Sort is that it assumes a uniform distribution of the input list. If the input list is uniformly distributed, then each bucket will be of roughly equal size. If the elements are not uniformly distributed, then the size of the buckets will vary, and the performance of the sort will degrade. In the worst case, every element lands in the same bucket, and the sort is no faster than the algorithm used to sort that bucket.
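
To make the effect of the distribution concrete, the sketch below counts how many values land in each bucket for a uniform input and for a skewed one. It is an illustration only: `bucketSizes` is a hypothetical helper, and its linear integer mapping is chosen for this example rather than taken from the implementation.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class BucketSizesSketch {
  // Hypothetical helper: counts how many values fall into each of k buckets
  // when [min, max] is split into k equally wide intervals.
  static int[] bucketSizes(int[] values, int k) {
    int min = Arrays.stream(values).min().orElse(0);
    int max = Arrays.stream(values).max().orElse(0);
    int[] sizes = new int[k];
    for (int v : values) {
      // (v - min) is at most (max - min), so the index is always below k
      int index = (int) (((long) v - min) * k / ((long) max - min + 1));
      sizes[index]++;
    }
    return sizes;
  }

  public static void main(String[] args) {
    // Uniform input: 0, 1, ..., 99
    int[] uniform = IntStream.range(0, 100).toArray();
    // Skewed input: 1, 2, ..., 99 plus a single outlier
    int[] skewed = IntStream.rangeClosed(1, 100)
        .map(i -> i == 100 ? 1_000_000 : i)
        .toArray();

    System.out.println(Arrays.toString(bucketSizes(uniform, 4))); // [25, 25, 25, 25]
    System.out.println(Arrays.toString(bucketSizes(skewed, 4)));  // [99, 0, 0, 1]
  }
}
```

In the skewed case a single outlier stretches the range so far that 99 of the 100 elements share one bucket, and almost all of the work falls on the sub-sort of that bucket.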

Picking $k$, the number of buckets, is a trade-off between the computation time and memory space required to sort the input list. The more buckets we allocate, the more memory space we require, but we might be able to sort the input using fewer comparisons. In fact, in the best scenario, we can sort the input list without any comparisons at all ($k = n$, where $n$ is the number of elements).

Time complexity analysis

Depending on the number of buckets we allocate for the input list, the behavior of the algorithm will be different. We can picture the following scenarios:

There are the same number of buckets as elements in the input list ($k = n$)
There are fewer buckets than elements in the input list ($k < n$)

If the number of buckets is equal to the number of elements, and the hashing function places each element in its own bucket, then zero comparisons are required. This is because the hashing function will position each element into the correct bucket based on its value. This is the best scenario, and in this case, the expected time complexity is expressed as $O(n)$.

If we have fewer buckets than we do elements, then at least one bucket will hold more than one element. This means that the sub-sorting procedure will require at least one comparison against the other elements within the same bucket. This time, the time complexity depends on the sorting algorithm used to sort the buckets. If we express the time complexity of the sub-sorting algorithm on $m$ elements as $O(f(m))$, then, with roughly $n/k$ elements per bucket under a uniform distribution, the total time complexity of Bucket Sort is $O(n + k \cdot f(n/k))$.

This aligns with our first scenario. When $k = n$, each bucket holds a single element, and the time required to sort a bucket of size $1$ is $O(1)$, because the bucket is already sorted. The total is then $O(n)$.
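
As a worked example, assume the sub-sort is a quadratic algorithm such as insertion sort, so that sorting $m$ elements costs $O(m^2)$, and assume each of the $k$ buckets holds about $n/k$ elements. Both assumptions are made here for illustration. Under them, the cost of distributing the elements plus the cost of sorting the buckets works out to:

```latex
\begin{aligned}
T(n, k) &= O\!\left(n + k \cdot \left(\frac{n}{k}\right)^{2}\right)
         = O\!\left(n + \frac{n^{2}}{k}\right) \\
k = n   &\;\Rightarrow\; T(n, n) = O(n + n) = O(n) \\
k = 1   &\;\Rightarrow\; T(n, 1) = O(n + n^{2}) = O(n^{2})
\end{aligned}
```

The two extremes match the scenarios above: one bucket per element gives linear time, while a single bucket degenerates to the cost of the sub-sort itself.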

A third scenario is when the number of buckets is greater than the number of elements ($k > n$), but this is rarely the case in practice. In the case where there are more buckets than elements, some buckets are necessarily empty, and we’re simply sorting empty lists during the merge phase. This adds no comparisons, although each of the $k$ buckets still has to be created and visited once. As we will see, this will, however, have a negative impact on the space complexity.

Space complexity analysis

As we saw in the time complexity analysis, the space complexity of Bucket Sort depends on the number of buckets we allocate. Good implementations will use a data structure that doesn’t perform unnecessary allocations, such as a linked list. This is because we don’t know how many elements will be in each bucket in advance. Using a linked list means we only ever allocate $n$ nodes across the $k$ buckets. This means that the space complexity of Bucket Sort is $O(n + k)$.

If we were to pre-allocate the buckets, then we would have to allocate $n$ elements in each bucket, because in the worst case every element lands in the same bucket, unless we choose a data structure that is capable of resizing. In this scenario, the space complexity of Bucket Sort is $O(n \cdot k)$.
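
The difference between the two strategies can be put into numbers with a small sketch. The helper names are hypothetical, and a "slot" here loosely means one list node, one bucket header, or one pre-allocated array cell; the count ignores per-object overhead.

```java
final class BucketSpaceSketch {
  // Linked-list buckets: one node per element plus one list per bucket.
  static long linkedListSlots(long n, long k) {
    return n + k;
  }

  // Pre-allocated buckets: every bucket reserves room for all n elements,
  // since any bucket might receive the whole input.
  static long preallocatedSlots(long n, long k) {
    return n * k;
  }

  public static void main(String[] args) {
    long n = 1_000;
    long k = 10;
    System.out.println("linked lists:  " + linkedListSlots(n, k));   // 1010
    System.out.println("pre-allocated: " + preallocatedSlots(n, k)); // 10000
  }
}
```

For 1,000 elements and 10 buckets, pre-allocation reserves roughly ten times more space than the linked-list approach, and the gap widens as $k$ grows.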

Implementation

The following is a Java implementation of the Bucket Sort algorithm. In order to simplify the implementation, we use the standard Collections.sort method to sort the individual buckets. Since Java 7, Collections.sort on a list of objects uses TimSort, while earlier versions used a modified MergeSort. Dual-Pivot Quicksort is used only by Arrays.sort on primitive arrays, so it does not apply here.

Java Implementation

package jun.codes.interviews.sort;

import java.util.Collections;
import java.util.LinkedList;

public class BucketSort {
  // Sorts `xs` in place using `k` buckets. Elements are restricted to numbers
  // because a bucket index needs a numeric distance from the minimum, which
  // `compareTo` alone does not provide (it only guarantees a sign).
  @SuppressWarnings("unchecked")
  public static <T extends Number & Comparable<T>> void sort(T[] xs, int k) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be at least 1");
    }
    if (xs.length == 0) {
      return;
    }

    // While BucketSort assumes a uniform distribution, we cannot guarantee that
    // the collection is uniform. Therefore, the worst case scenario is that
    // all elements of `xs` go into the same bucket.
    LinkedList<T>[] buckets = (LinkedList<T>[]) new LinkedList[k];
    for (int i = 0; i < k; i++) {
      buckets[i] = new LinkedList<>();
    }

    // To get the ideally uniform position in the array, we need to find the
    // minimum and maximum values
    T min = xs[0];
    T max = xs[0];
    for (T x : xs) {
      if (x.compareTo(min) < 0) {
        min = x;
      }
      if (x.compareTo(max) > 0) {
        max = x;
      }
    }

    // Insert each element into its corresponding bucket
    double low = min.doubleValue();
    double range = max.doubleValue() - low;
    for (T x : xs) {
      // If all elements are equal, the range is zero and everything goes
      // into bucket 0
      double norm = range == 0 ? 0 : (x.doubleValue() - low) / range;
      int bucket = (int) (norm * (k - 1));
      buckets[bucket].add(x);
    }

    // Sort each bucket and merge all the buckets into the original array
    int i = 0;
    for (LinkedList<T> bucket : buckets) {
      Collections.sort(bucket);
      for (T x : bucket) {
        xs[i] = x;
        i++;
      }
    }
  }
}
