Grid-based data mining for market basket analysis in the retail sector
MALERBA, Donato
2007-01-01
Abstract
Recent advances in computing, communications, and digital storage technologies, together with the development of high-throughput data-acquisition technologies, have made it possible to gather and store very large volumes of data. The data warehouses of international retailers (such as Wal-Mart) are typically multi-terabyte databases that contain information about retail transactions by customers all over the world. The emergence of these large data sets creates a growing need to analyze them across geographical boundaries using distributed and parallel systems such as Grid infrastructure, thereby unlocking the intelligence hidden in these geographically distributed databases. Market basket analysis is a method for discovering consumer purchasing patterns by extracting associations or co-occurrences from a store's transaction database. This is a typical association rule mining task, for which the Apriori algorithm is widely used to find the large itemsets. Because the traditional sequential Apriori algorithm cannot cope with such large amounts of data, this paper outlines a strategy for a parallel and distributed association rule mining algorithm.
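As an illustration of the kind of computation the abstract refers to, the following is a minimal sketch of Apriori in which support counts are computed separately for each site's transactions and then summed. It is not the algorithm proposed in the paper, whose details are not given here. The function names (`apriori`, `count_candidates`, `generate_candidates`), the `sites` layout, and the absolute `min_support` threshold are all assumptions made for this example, and the per-site loop runs sequentially where a Grid deployment would presumably run it on separate nodes.

```python
from collections import Counter
from itertools import combinations


def count_candidates(transactions, candidates):
    """Count, for one site, how many baskets contain each candidate itemset."""
    counts = Counter()
    for basket in transactions:
        for cand in candidates:
            if cand <= basket:  # candidate is a subset of the basket
                counts[cand] += 1
    return counts


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    that has an infrequent (k-1)-subset (the Apriori property)."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }


def apriori(sites, min_support):
    """Return {itemset: global support count} for all large itemsets.

    sites: hypothetical layout, one list of baskets per store or site.
    min_support: absolute count threshold (an assumption of this sketch).
    """
    sites = [[frozenset(basket) for basket in site] for site in sites]
    candidates = {
        frozenset([item]) for site in sites for basket in site for item in basket
    }
    frequent = {}
    k = 1
    while candidates:
        total = Counter()
        # Each site's counts depend only on local data, so this loop body
        # is the part that could run in parallel; only counts are merged.
        for site in sites:
            total.update(count_candidates(site, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent
```

Calling `apriori(sites, min_support)` with one list of baskets per store returns a dictionary that maps each large itemset to its global count. The design point this sketch is meant to show is that support counts are additive across sites, so only candidate counts, not the transactions themselves, need to be exchanged between sites.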