Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state-of-the-art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google's computing infrastructure is based on commodity hardware. In this paper, Herbach describes PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations and implements each one using the MapReduce model of distributed computation. Herbach shows how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. Herbach also discusses the benefits and challenges of using a MapReduce compute cluster for tree learning, and demonstrates the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
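To make the idea of expressing tree learning as MapReduce computations more concrete, the following is a minimal single-process sketch of one such computation: choosing the best split for one regression tree node. It is not PLANET's actual implementation. The function names (`map_shard`, `reduce_stats`, `best_split`), the use of count, sum, and sum of squares as per-split statistics, and the fixed list of candidate thresholds are all assumptions made for illustration.

```python
from collections import defaultdict

def map_shard(shard, candidates):
    """Map phase (hypothetical): one shard emits partial label statistics.

    Each value is [count, sum, sum_of_squares] of the labels. The key
    "total" covers every record; a key (feature, threshold) covers the
    records that would fall on the left side of that candidate split.
    """
    out = defaultdict(lambda: [0, 0.0, 0.0])
    for features, y in shard:
        keys = ["total"]
        keys += [(f, t) for f, t in candidates if features[f] < t]
        for key in keys:
            stats = out[key]
            stats[0] += 1
            stats[1] += y
            stats[2] += y * y
    return out

def reduce_stats(partials):
    """Reduce phase (hypothetical): sum the partial statistics per key."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for partial in partials:
        for key, (n, s, ss) in partial.items():
            m = merged[key]
            m[0] += n
            m[1] += s
            m[2] += ss
    return merged

def sse(n, s, ss):
    """Sum of squared errors around the mean, from the three statistics."""
    return ss - s * s / n if n else 0.0

def best_split(merged, candidates):
    """Pick the candidate with the largest reduction in squared error."""
    tn, ts, tss = merged["total"]
    best = None
    for cand in candidates:
        ln, ls, lss = merged.get(cand, (0, 0.0, 0.0))
        rn, rs, rss = tn - ln, ts - ls, tss - lss
        if ln == 0 or rn == 0:
            continue  # a split that leaves one side empty is useless
        gain = sse(tn, ts, tss) - sse(ln, ls, lss) - sse(rn, rs, rss)
        if best is None or gain > best[1]:
            best = (cand, gain)
    return best

# Two shards stand in for data spread across machines.
shards = [
    [((1.0,), 1.0), ((2.0,), 1.0)],
    [((3.0,), 5.0), ((4.0,), 5.0)],
]
candidates = [(0, 1.5), (0, 2.5), (0, 3.5)]
merged = reduce_stats(map_shard(s, candidates) for s in shards)
print(best_split(merged, candidates))
```

The point of the sketch is that each shard only emits a small, fixed-size summary per candidate split, so the full training set never has to sit in one machine's memory. In a real cluster the map calls would run on separate workers and the summaries would be shuffled to reducers by key.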