In a world where complexity is increasing by the day, a group of disciplines collectively known as “data science” is becoming more and more important and useful in addressing compelling problems in an increasing number of scientific branches and business environments. There are already many books on predictive analytics and simulation analytics, but this one shows how they can be used together.
The author presents many innovative concepts, starting from the seemingly bizarre equation “data = information + noise,” which simply means that data certainly contains information, but information is hidden by noise; the methods presented in this book help to extract information from data, rejecting noise. Another important concept is the difference between poor and rich information: poor information is merely descriptive and is found, for example, in many definitions of descriptive statistics; rich information, on the other hand, helps create knowledge from raw data through software applications or other means. This book is all about rich information. More operational statistical concepts, such as predictions, forecasts, and simulations, are also described and applied to several real-life systems. Other high-level concepts presented in the book are linearity, when outputs are proportional to inputs; nonlinearity, when they are not; feedback loops, when system outputs are channeled back to the system as inputs in a repeating cycle; and emergent systems, where the properties of the single parts of the system give rise to different properties for the system as a whole. The book shows how all these concepts are used in predictive and simulation analytics.
The book itself is divided into three sections: predictive analytics, simulation analytics, and their interactions. The predictive analytics section presents basic and advanced time series methods, as well as non-time series methods, along with their lifespan, that is, how long they remain valid before needing to be updated. The simulation analytics section presents stochastic (Monte Carlo) methods and how they are used to design and analyze simulations. The interactions section describes how to use both types of analytics together at different scales: operational, or applied to everyday problems; tactical, focused on broader issues and longer terms; and strategic, or encompassing the whole organization at once. All sections present theoretical concepts first, in plain, comfortable prose, and then give detailed descriptions of specific algorithms and methods, with corresponding mathematical notation, equations, and code snippets. This, however, makes for a rather dense reading experience.
Throughout the book there are extensive references to websites, in the form of footnotes, as well as to books by other authors, often in the middle of the text. This latter style of reference makes for fluid, uninterrupted reading on the one hand, but on the other leaves the reader expecting further details. In any case, all these references (and many more) are also repeated at the end of the book. The examples, very detailed and very well explained, are often centered on business scenarios. The author recommends the book for use as a textbook in an academic setting. The audience, however, is certainly broader: for accomplished data scientists and statisticians, this book makes for an excellent reference manual, provided they already understand the concepts presented here; and for decision makers in business environments, it offers an overall description of how business forecasts are made, helping them understand what they pay their data scientists for.