Efficient Distributed Decision Trees for Robust Regression

The availability of massive volumes of data and recent advances in data collection and processing platforms have motivated the development of distributed machine learning algorithms. In many real-world applications, large datasets are inevitably noisy and contain outliers, which can dramatically degrade the performance of standard machine learning approaches such as regression trees. To address this, we present a novel distributed regression tree approach that uses robust regression statistics, which are less sensitive to outliers, to handle large and noisy data. We propose integrating error criteria based on robust statistics into the regression tree. We also develop a data summarization method and use it to improve the efficiency of learning regression trees in the distributed setting. We implemented the proposed approach and the baselines on Apache Spark, a popular distributed data processing platform. Extensive experiments on both synthetic and real datasets verify the effectiveness and efficiency of our approach.
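
To illustrate why the error criterion matters, here is a minimal single-machine sketch of a one-feature split search under two impurity measures. The abstract does not say which robust statistic is used, so this sketch assumes least absolute deviation around the median as the robust criterion. The function names (`sse`, `lad`, `best_split`) and the toy data are hypothetical, and the sketch covers neither the distributed setting nor the data summarization method.

```python
from statistics import mean, median


def sse(ys):
    """Sum of squared errors around the mean (the standard criterion)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def lad(ys):
    """Sum of absolute deviations around the median (assumed robust criterion)."""
    m = median(ys)
    return sum(abs(y - m) for y in ys)


def best_split(xs, ys, impurity):
    """Exhaustively search thresholds on one feature.

    Returns (threshold, cost), where cost is the summed impurity of the
    left and right partitions at the best threshold.
    """
    pairs = sorted(zip(xs, ys))
    best_thr, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = impurity(left) + impurity(right)
        if cost < best_cost:
            best_cost = cost
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr, best_cost


# Toy data (hypothetical): a step from about 1 to about 10 between
# x = 5 and x = 6, with a single outlier (y = 100) at x = 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 2, 100, 2, 1, 10, 11, 10, 11, 10]

sse_thr, _ = best_split(xs, ys, sse)
lad_thr, lad_cost = best_split(xs, ys, lad)
print(sse_thr)  # 3.5: the squared-error split is pulled toward the outlier
print(lad_thr)  # 5.5: the absolute-deviation split recovers the step
```

On this toy input the squared-error criterion places the split next to the outlier, while the absolute-deviation criterion places it at the underlying step. The exhaustive search recomputes the median for every candidate threshold, which becomes costly on large data.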
