Bucket Sort Algorithm Presentation at Advanced Algorithms and Data Structures at Tallinn University of Technology

This day was for another presentation at university. This time the course is Advanced Algorithms and Data Structures. What is done at the course is explore and discuss different algorithms. I particularly took the Bucket sort algorithm for the presentation.

Consequently, the subject was Sequential and Parallel Bucket Sort. The purpose was to inform of the algorithm. First of all, I generalized a picture for a sorting, talked about places to store data, a method of distribution and sorting algorithms. Then I turned to actual Bucket sort. The final talk was about Sample sort as an alternative to the prior.

The sorting is the most common operation performed by a computer. When some data is been had, the sorting will be needed for learning its values and properties. For an example, to learn its maximum, minimum and sequential flow. The meaning of the sorting is the arranging unordered data into increasing (or decreasing) order. Sorted data is easier to manipulate. It is human-readable. Moreover, it is required by algorithms such as Search and Merge.

Both the input sequences and the sorted sequences are for the data to be stored. For storing reason, there are a couple of variants. The place for sequential sorting is locally in the process’s memory. As for parallel sorting, the places are on one of the processes or distributed among the processes.

If it is assumed that the data for the sorting is distributed among the processes, then a method of distribution will be applied. Firstly, the processes are enumerated. Secondly, an enumeration is used to specify a global ordering and the data is sorted according to this enumeration. Finally, the data is sequentially gathered and concatenated on the output. Particularly, a distribution sort is referred to distributing the data among the processes.

Sorting algorithms can be categorized as comparison-based or non-comparison-based. The comparison-based sorting is a sorting by repeatedly applying comparison operation (less than, less than or equal). It includes Quick sort, Insertion sort, Bubble sort. The non-comparison-based sorting is a sorting by using its properties (such as distribution). It includes Bucket sort.

Bucket sort

Bucket sort is a divide and conquer algorithm for sorting. Divide and conquer is an algorithm design paradigm.

Sequential Bucket sort

It is supposed that n elements, m buckets, S interval [a, b] are given. The algorithm takes the flow of steps as follows:

  1. S is divided into m equal-sized sub-intervals (buckets). Elements are placed in the appropriate bucket. 1 bucket has roughly n/m elements
  2. Buckets are sorted by an optimal sequential sorting algorithm, yielding a sorted sequence

Parallel Bucket sort

It is supposed that n elements, m buckets, p processes , S interval [a, b] are given. The algorithm takes the flow of steps as follows:

  1. S is divided into m equal-sized sub-intervals (buckets). Elements are placed in the appropriate bucket. 1 bucket has roughly n/m elements. m=p (1 bucket for 1 process)
  2. Buckets are sorted internally by an optimal sequential sorting algorithm
  3. Buckets are redistributed, one gets smaller elements and the other gets larger elements
  4. Elements are sequentially gathered and concatenated, yielding a sorted sequence

Sample sort

An analysis of Bucket sort demonstrates disadvantages. It is assumed that input elements are uniformly distributed over an interval [a, b] and its distribution is known. Consequently, if buckets have a significantly different number of elements, performance will be degraded. Alternatively, Sample sort can be used.

Sequential Sample sort

It is supposed that n elements, m buckets are given. The algorithm takes the flow of steps as follows:

  1. n is divided into m equal-sized buckets. 1 bucket has roughly n/m elements
  2. Buckets are sorted by an optimal sequential sorting algorithm
  3. m−1 evenly spaced elements are chosen from each bucket. m(m−1) elements selected from all buckets represent the sample
  4. The sample is sorted by an optimal sequential sorting algorithm and m−1 evenly spaced elements (splitters) are chosen
  5. Splitters are used to determine the range of buckets. All buckets are cooperated to sort the sequence according to splitters. One gets smaller elements and the other gets larger elements

Parallel Sample sort

It is supposed that n elements, m buckets, p processes are given. The algorithm takes the flow of steps as follows:

  1. n is divided into m equal-sized buckets. 1 bucket has roughly n/m elements. m=p (1 bucket for 1 process)
  2. Buckets are sorted internally by an optimal sequential sorting algorithm
  3. m−1 evenly spaced elements are chosen from each bucket. m(m−1) elements selected from all buckets represent the sample
  4. The sample is received by only 1 process. It is sorted by an optimal sequential sorting algorithm and m−1 evenly spaced elements (splitters) are chosen
  5. This process broadcasts m-1 splitters to all other processes
  6. Splitters are used to determine the range of buckets. All processes are cooperated to sort the sequence according to splitters. One gets smaller elements and the other gets larger elements

Feedback

As for questions about the subject and the things explained, I was asked what is actually better either Bucket sort or Sample sort. Bucket sort will be better, when elements are uniformly distributed over an interval [a, b] and its distribution is known. Sample sort is better otherwise.

Although it happened to be no auxiliary illustration, this overview possibly could make certain positive impression.

References

Tags:

Comments are closed.