
Machine learning (ML) technologies are becoming mainstream, particularly for the analysis of bulk data. The success of ML systems, however, hinges on the input of high-quality data and the ability to train balanced, fair, and high-integrity ML models. Federated learning (FL) was originally described by Google in 2016 as an approach that allows distributed data to be used to train ML models without the need to copy that potentially private data and concentrate it on a centralized server for processing. While this approach avoids issues with data ownership and privacy, and also improves efficiency, its decentralized nature introduces challenges in verifying the trustworthiness of the data being accessed and increases the vulnerability of the resulting models. Nguyen et al. consider the many ways in which cyber criminals could insert malicious functionality into targeted ML models, either by poisoning the training data or by corrupting the models directly.
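To make the FL setting concrete, the following is a minimal sketch of the basic federated training loop the review refers to: clients update a shared model on their own data and a server aggregates the updates weighted by dataset size. It is an illustrative example only, not the authors' method; the linear model, function names, and parameters are assumptions for the sketch.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=1):
    """Hypothetical client step: a few epochs of gradient descent on a
    simple linear model, using only the client's local data."""
    w = weights.copy()
    for _ in range(epochs):
        preds = data @ w
        grad = data.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: aggregate the clients' model updates, weighted by
    dataset size, without ever seeing the raw client data."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```

The key property is that only model parameters cross the network; the server never observes the clients' data, which is exactly what makes verifying that data's trustworthiness difficult.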
The authors begin by describing the benefits of using FL to train ML models on data spread across different entities, with an orchestration server collecting and aggregating model updates. The increased attack surface that results from this approach is then examined. Backdoor attacks that poison the data or alter the model directly, along with proposed FL defenses against them, are discussed. Nguyen et al. then categorize the main attack types, summarizing earlier FL backdoor attack surveys and walking through the main steps of FL, with the potential attack vectors and attack techniques at each step. Backdoor defense methodologies are then described in significant detail. Final remarks look at open challenges and directions for future research. The authors close with a succinct conclusion and thorough references.
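As a rough illustration of the attack classes the review mentions, the sketch below shows how a malicious client might poison its local data with a fixed trigger (a data-poisoning backdoor) or scale its update to dominate aggregation (a model-replacement style corruption). This is a simplified assumption-laden example for exposition, not the specific attacks analyzed by Nguyen et al.; all names and parameters are hypothetical.

```python
import numpy as np

def poison_local_data(data, labels, trigger_idx, target_label, fraction=0.5):
    """Illustrative data poisoning: a malicious client stamps a fixed
    trigger pattern onto part of its samples and relabels them with the
    attacker's chosen target class before training locally."""
    data, labels = data.copy(), labels.copy()
    n_poison = int(fraction * len(labels))
    data[:n_poison, trigger_idx] = 1.0   # embed the backdoor trigger feature(s)
    labels[:n_poison] = target_label     # flip labels to the attacker's target
    return data, labels

def scaled_malicious_update(global_w, poisoned_w, boost=5.0):
    """Illustrative model corruption: the attacker scales its poisoned
    update so that it dominates the server's weighted average."""
    return global_w + boost * (poisoned_w - global_w)
```

Defenses surveyed in the paper aim to detect or blunt exactly such behavior, for example by inspecting or clipping anomalously large client updates before aggregation.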
The paper is well supported by relevant illustrations and tables. It is an excellent discussion of a topical subject, highlighting that new advances in technology may also bring increased risks of cyberattack.