The recent and ongoing expansion of the digital world now gives anyone access to a tremendous amount of information. However, collecting data is not an end in itself, and techniques must therefore be designed to extract in-depth knowledge from these large databases.
This has led to a growing interest in statistics as a tool for finding patterns in complex data structures, and particularly in turnkey algorithms that do not require specific skills from the user.
Such algorithms are quite often designed based on a hunch, without any theoretical guarantee. Indeed, the overlay of several simple steps (as in random forests or neural networks) makes their analysis more arduous. Nonetheless, theory is vital to provide assurance about how algorithms operate, thus preventing their outputs from being misinterpreted.
Among the most basic statistical properties is consistency, which states that predictions become asymptotically accurate as the number of observations increases. In this talk, I will present a first result on the consistency of Breiman's forests and show how it sheds some light on their good performance in a sparse regression setting. I will also present new results on minimax rates for Mondrian forests, which highlight the benefits of forests over individual regression trees.
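For concreteness, one standard way to formalize the consistency property mentioned above is the L² notion below (the talk may use a different but equivalent variant; the symbols m_n and m are illustrative, not taken from the abstract):

```latex
% An estimator $m_n$, built from $n$ observations of $(X, Y)$, is
% ($L^2$-)consistent for the regression function $m(x) = \mathbb{E}[Y \mid X = x]$ if
\[
  \mathbb{E}\!\left[ \left( m_n(X) - m(X) \right)^2 \right]
  \xrightarrow[n \to \infty]{} 0 .
\]
```

In words: the expected squared error of the forest's prediction at a fresh point X vanishes as the sample size grows, which is the minimal guarantee one can ask of a regression method.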