Most cost-efficient data sorting operation in a cloud competition

- Who
- Nanjing University, Alibaba, Databricks
- What
- 1.44 USD/TB (US dollars per terabyte)
- Where
- Not Applicable
- When
- 2016
In 2016, a consortium from the USA and China, made up of Nanjing University, Alibaba Group and Databricks, broke the record for the most cost-efficient data sorting operation in the cloud in the Indy CloudSort competition. The consortium sorted 100 terabytes (TB) using only 144.22 USD worth of cloud resources, which corresponds to a cost of 1.44 USD/TB. The record was achieved using the Apache Spark system running in the cloud.
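The cost-per-terabyte figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, with variable names chosen here purely for illustration:

```python
# Figures quoted in the record text.
total_cost_usd = 144.22   # cloud resources spent on the sort
data_sorted_tb = 100      # terabytes sorted

# Cost efficiency is total spend divided by the amount of data sorted.
cost_per_tb = total_cost_usd / data_sorted_tb

print(f"{cost_per_tb:.2f} USD/TB")
```

Rounded to two decimal places, this gives the 1.44 USD/TB stated in the record.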
As data plays a crucial role in our lives, fast processing of large amounts of data is very important. Sorting is a key step in data processing, and it can be expensive and time-consuming.
A terabyte corresponds to 1,000,000,000,000 bytes; this is the equivalent of about 210 DVDs or 1,423 CDs full of data.
The previous record in this category was held by the University of California, San Diego, with a cost of 4.51 USD/TB.
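To put the two records side by side, the improvement can be expressed as a ratio and as a percentage saving. This is a rough sketch using only the two per-terabyte costs quoted on this page; the variable names are illustrative:

```python
# Per-terabyte costs quoted in the record text.
previous_usd_per_tb = 4.51   # earlier record (UC San Diego)
record_usd_per_tb = 1.44     # 2016 record

# How many times cheaper the new record is than the previous one.
improvement_ratio = previous_usd_per_tb / record_usd_per_tb

# Fraction of the previous cost that was saved.
saving_fraction = 1 - record_usd_per_tb / previous_usd_per_tb

print(f"{improvement_ratio:.2f}x cheaper, {saving_fraction:.0%} saved per TB")
```

By this calculation the 2016 result cost roughly a third of the previous record per terabyte sorted.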