Machine learning safety delves into the growing field of verification, robustness, training against adversarial attacks, privacy enhancement, and other safety concerns in the context of deep learning (DL) and deep reinforcement learning (DRL). The authors have previously published pioneering research in this domain.
Starting from chapter 10, the material becomes particularly useful for advanced researchers specializing in the safety aspects of DL and DRL, provided they have a background in Minkowski sums, Kripke structures, Lipschitz continuity, and related concepts. Practitioners in this rapidly expanding field can draw on a range of examples, including the reachability analysis using DeepGo (open-source software developed by the authors) in chapter 11. However, it is worth noting that the book does not cover any safety issues specific to large language models (LLMs), even though LLMs use DL and DRL with human feedback in addition to a transformer-based attention mechanism.
For those new to the field of machine learning (ML) safety, the initial nine chapters offer valuable insights.
The life cycle of a DL- or DRL-based system, such as the LLM-based ChatGPT or a robot on an assembly line, involves software development, training, verification, testing, and deployment (inference). The book covers pre-deployment research topics to ensure these systems perform as specified, even in the face of adversarial attacks.
Discussions within the book encompass generalization error, uncertainty, robustness, adversarial attacks, poisoning, backdoor attacks, model stealing, membership inference, and model inversion. Up to chapter 9, the book also includes possible implementations of threats against classical ML models such as decision trees, k-nearest neighbors, linear regression, and naive Bayes.
While these may seem like academic exercises to some, they set the stage for the book’s standout section, which starts at section 10.6 and delves into uncertainty estimation, introducing safety issues such as robustness, adversarial attacks, poisoning attacks, model stealing, and others.
More examples with code samples would make this chapter more engaging, and the transitions between sections could be more intuitive and less abrupt. For example, the three ways of estimating the posterior distribution (Monte Carlo sampling, variational inference, and Laplace approximation) could be tied together by a comparison. Additionally, the book does not say what to do in a practical situation once the uncertainties have been measured.
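As a rough illustration of the kind of comparison meant here, the sketch below places a Monte Carlo estimate and a Laplace approximation next to the exact posterior for a toy model. It is not taken from the book: the Beta-Bernoulli model, the function names, and the data (7 successes in 10 trials under a uniform prior) are assumptions chosen only because the exact answer is known in closed form. Variational inference is left out to keep the sketch short.

```python
import random
import statistics


def exact_posterior(a, b, successes, trials):
    """Beta(a, b) prior + Bernoulli data gives a Beta(a + k, b + n - k) posterior."""
    return a + successes, b + trials - successes


def beta_mean_var(alpha, beta):
    """Closed-form mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1))


def monte_carlo_mean_var(alpha, beta, num_samples=20000, seed=0):
    """Estimate the posterior mean and variance from random posterior draws."""
    rng = random.Random(seed)
    draws = [rng.betavariate(alpha, beta) for _ in range(num_samples)]
    return statistics.fmean(draws), statistics.variance(draws)


def laplace_mean_var(alpha, beta):
    """Gaussian approximation centred at the posterior mode.

    Assumes alpha > 1 and beta > 1 so that the mode lies strictly inside (0, 1).
    The variance is the inverse of the negative second derivative of the
    log-density at the mode.
    """
    mode = (alpha - 1) / (alpha + beta - 2)
    curvature = (alpha - 1) / mode ** 2 + (beta - 1) / (1 - mode) ** 2
    return mode, 1.0 / curvature


if __name__ == "__main__":
    alpha, beta = exact_posterior(1, 1, successes=7, trials=10)
    print("exact      :", beta_mean_var(alpha, beta))
    print("monte carlo:", monte_carlo_mean_var(alpha, beta))
    print("laplace    :", laplace_mean_var(alpha, beta))
```

In this toy setting the Monte Carlo estimate tracks the exact posterior closely, while the Laplace approximation is centred on the mode rather than the mean and therefore reports a slightly different location and spread. A side-by-side view of this kind is the sort of link between the methods that the chapter could offer.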
Part 3 of the book provides safety solutions related to robustness, reachability (chapter 11), adversarial training, and differential privacy (chapter 12) for DL.
Chapter 13 describes and evaluates DRL models. Sections 13.5 to 13.9 present an insightful progression of discrete-time Markov chain (DTMC)-based safety properties related to policy robustness verification. On page 228, the second-to-last paragraph refers to a “paper,” which seems to be either a typographical error or a sign that the section comes from one of the authors’ papers. The symbol “⋄” (diamond) used in this section requires clarification, as it may carry multiple interpretations.
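To make the ambiguity concrete: in probabilistic temporal logics a diamond is commonly read as "eventually," whereas in modal logic it usually means "possibly." The sketch below assumes the "eventually" reading and computes, for a small DTMC, the probability of eventually reaching an unsafe state. The four-state chain, the state names, and the function name are hypothetical and are not taken from the book.

```python
def eventually_probability(transitions, targets, tol=1e-12, max_iters=100_000):
    """Probability of eventually reaching a state in `targets` from each state.

    `transitions` maps every state to a dict {successor: probability}.
    Uses value iteration started from 0, which converges to the least
    fixed point, i.e. the reachability probability.
    """
    prob = {s: (1.0 if s in targets else 0.0) for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for state, successors in transitions.items():
            if state in targets:
                continue  # target states keep probability 1
            new_value = sum(p * prob[nxt] for nxt, p in successors.items())
            delta = max(delta, abs(new_value - prob[state]))
            prob[state] = new_value
        if delta < tol:
            break
    return prob


# Hypothetical chain: "safe" and "unsafe" are absorbing states.
example_dtmc = {
    "start": {"risky": 0.5, "safe": 0.5},
    "risky": {"start": 0.5, "unsafe": 0.5},
    "safe": {"safe": 1.0},
    "unsafe": {"unsafe": 1.0},
}

if __name__ == "__main__":
    result = eventually_probability(example_dtmc, targets={"unsafe"})
    for state, value in result.items():
        print(f"P(eventually unsafe | {state}) = {value:.6f}")
```

Under this reading, a safety property would bound the probability of eventually reaching the unsafe state from the start state. Under the "possibly" reading, the question would only be whether an unsafe state is reachable at all, which is why the book should state which meaning it intends.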
While chapter 14 addresses testing techniques, it lacks actionable information for practitioners.
The provided code examples do not cover all the threats. Part 5 includes rudimentary math concepts but misses the opportunity to elaborate, in the book’s context, on topics like Kullback–Leibler (KL) divergence, Minkowski sum, Kripke structure, Lipschitz continuity and approximation, mixed-integer linear programming (MILP), and DTMC, among others. A reverse index at the back and a list of notations with their meanings would make the book more useful and easier to read.
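For orientation, the standard textbook definitions of three of these concepts are given below. The notation is generic and is an assumption here; it is not necessarily the notation the book uses.

```latex
% KL divergence between discrete distributions P and Q on the same support
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

% Minkowski sum of two sets A and B in a vector space
A \oplus B = \{\, a + b \mid a \in A,\ b \in B \,\}

% Lipschitz continuity of f with constant K \ge 0
\| f(x) - f(y) \| \le K \, \| x - y \| \quad \text{for all } x, y
```

The last definition shows why such material matters for safety: if a network f is Lipschitz with constant K, any input perturbation of norm at most \epsilon changes the output by at most K\epsilon, which is the kind of bound that robustness and reachability arguments build on.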
The authors are top researchers in the field. For the sake of brevity, their insights are not always presented in their published papers. In the book, the discussions favor precision, relying more on the language of mathematics and less on the plain expressions of insight that a reader may expect from a book. Books like Deep learning [1] and Reinforcement learning [2] manage to provide insight about a concept before making it precise with an equation. Machine learning safety addresses complicated concepts that are sometimes not even fully developed, and it often expects a much better prepared reader. Overall, this book leaves a reader longing for more profound insights.