Most cost-efficient data sorting operation in a cloud competition

Who
Nanjing University, Alibaba, Databricks
What
1.44 USD/TB (US dollars per Terabyte)
Where
Not Applicable
When
2016

In 2016, a consortium from the USA and China, made up of Nanjing University, Alibaba Group and Databricks, broke the record for the most cost-efficient data sorting operation in the cloud in the Indy CloudSort competition. The consortium sorted 100 Terabytes (TB) using only 144.22 USD worth of cloud resources, which corresponds to a cost of 1.44 USD/TB. The record was achieved using the Apache Spark system running in the cloud.
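
The cost-per-Terabyte figure follows directly from the two numbers quoted above. As a minimal sketch of that arithmetic (the function name `cost_per_tb` is purely illustrative and is not part of the competition's tooling):

```python
def cost_per_tb(total_cost_usd: float, data_sorted_tb: float) -> float:
    """Return the cost in USD of sorting one Terabyte of data."""
    if data_sorted_tb <= 0:
        raise ValueError("data_sorted_tb must be positive")
    return total_cost_usd / data_sorted_tb


# Figures from the record: 144.22 USD of cloud resources to sort 100 TB.
record = cost_per_tb(144.22, 100)
print(f"{record:.2f} USD/TB")  # prints "1.44 USD/TB"
```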

As data plays a crucial role in our lives, fast processing of large amounts of data is very important. Sorting is a key step in data processing, and it can be expensive and time-consuming.

A Terabyte corresponds to 1,000,000,000,000 bytes; this is the equivalent of about 210 DVDs or 1,423 CDs full of data.

The previous record in this category was held by the University of California, San Diego, with a cost of 4.51 USD/TB.