Finterai, exit report: Machine learning without data sharing

About the project

A key question regarding the use of federated learning – including in this sandbox project – is whether the machine learning models that are exchanged between participants contain personal data from local data. The answer is important from a regulatory point of view, since the data protection regulations only apply to the processing of personal data.

When the federated learning method is applied to personal data, it is in principle only the learning – the “model parameters” – that is shared between participants. Nevertheless, there is a hypothetical possibility that personal data may be deduced if the model has vulnerabilities. Although personal data are neither sent nor stored externally, the model parameters – the weights that represent the model’s learning – are shared. If the model has learned personal data, the weights could hypothetically reveal this information to an ill-intentioned participant who actively attacks the model.

However, if the local data does not leave the banks’ local datasets, what is then shared? The answer is the model parameters and hyperparameters.

Model parameters and hyperparameters

Hyperparameters establish the framework for how machine learning is to take place. In other words, they define what the learning is to be based on and determine how data points are correlated. The model parameters, on the other hand, contain the specific weights – the contents that hold what the model has learned.
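To make the distinction concrete, here is a minimal illustration in Python. All names and values are hypothetical and do not describe Finterai's configuration:

```python
# Hyperparameters are fixed before training and describe HOW learning happens;
# they frame the process but are not themselves learned.
hyperparameters = {
    "learning_rate": 0.01,  # step size for each weight update
    "epochs": 3,            # number of passes over the local dataset
    "hidden_units": 4,      # size/shape of the model architecture
}

# Model parameters are the weights that training adjusts; they hold WHAT has
# been learned, and they are what federated learning exchanges.
model_parameters = {
    "weights": [0.0] * hyperparameters["hidden_units"],  # untrained weights
    "bias": 0.0,
}
```

The key point is that the first dictionary merely frames the training, while only the second is changed by – and can therefore retain traces of – the data the model trains on.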

A backpropagation algorithm is used to train the model parameters. This algorithm identifies how the weights should be changed to make the machine learning solution’s predictions more precise. It is these predictions that ultimately identify money laundering risk. The process that makes this possible consists of several steps, and the use of federated learning is crucial.
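The idea behind such an update can be sketched with a single linear weight and a squared-error loss. The data and model here are hypothetical and far simpler than any real transaction-monitoring model; the sketch only shows how each weight is moved against the gradient of the prediction error:

```python
def train(samples, lr=0.1, epochs=50):
    """Fit one weight w so that w * x approximates the targets."""
    w = 0.0  # the model parameter being learned
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x                    # forward pass: make a prediction
            grad = 2 * (pred - target) * x  # backward pass: d(error**2)/dw
            w -= lr * grad                  # update: step against the gradient
    return w

# Learn the mapping y = 2x from three points; w converges towards 2.
w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

In a multi-layer network, backpropagation applies the same gradient computation layer by layer via the chain rule, but the principle – nudge each weight to reduce the prediction error – is the same.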

The extent to which a machine learning process based on federated learning permits the re-identification of data used to teach it depends on the design of the specific model and the training process. Challenges must therefore be assessed on the basis of the specific choice of solutions architecture and machine learning model. This is addressed in the chapter on security challenges.

How does Finterai intend to use federated learning?

To realise its ambition, Finterai will use federated learning in a slightly different way to Google’s version of the technology. The biggest difference comes right at the beginning, with a step before the model is distributed to the participants. In this step, the participants themselves decide what kind of model is to be trained, with each participant defining the model’s hyperparameters. In other words, it is Finterai’s customers and not Finterai itself who define which machine learning models are to be trained federally.

This leads to a new difference in the system. Finterai’s federated learning process is serial rather than parallel, in that a machine learning model is trained first by one participant before being sent to the next one. This contrasts with Google’s approach, in which machine learning models are sent out simultaneously to all participants, who then update the central model continuously. There is also another important difference: Google gets only a model update (gradients) back from its participants, while Finterai receives the entire machine learning model.
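The serial pattern can be sketched as follows. The function and variable names are hypothetical, and the dummy update stands in for real local training; the point is that the whole model visits one participant at a time, rather than gradient updates flowing back to a central model in parallel:

```python
def local_training(model, local_bank_data):
    # Stand-in for a bank's local training step: the bank's data would be
    # read here, and only the updated model ever leaves the bank.
    return {name: weight + 0.1 for name, weight in model.items()}

def serial_federated_round(model, participants):
    # Serial, not parallel: the ENTIRE model is passed from one participant
    # to the next, instead of all participants training simultaneously.
    for bank_data in participants:
        model = local_training(model, bank_data)
    return model

# Three participants train the model in sequence.
model = serial_federated_round({"w": 0.0}, [None, None, None])
```

In Google's parallel variant, by contrast, the loop body would dispatch copies of the model to all participants at once and aggregate the returned gradients centrally.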

Model updates are smaller than the entire machine learning models, which thereby reduces network traffic. However, Finterai chooses to transfer entire machine learning models for technical, security and commercial reasons. Finterai also implements “secure aggregation” differently to Google. The difference is partly a function of the fact that the enterprises address different “use cases”.

Finterai aims to perform explicit tests of security threats, distortion issues and data leakage threats that may arise during the federated learning process. This provides a more robust degree of personal data protection and system protection than Google’s original model afforded. It is worth noting that such problems will arise in any situation in which machine learning models are shared or made available. In other words, these are not threats unique to federated learning.
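One simple signal such leakage tests can build on – offered here as an assumption about what a check might look like, not as Finterai's actual method – is the gap between a model's error on the data it was trained on and its error on data it has never seen. A large gap suggests the model has memorised training records, and memorised records are what an attacker could try to recover:

```python
def generalisation_gap(loss_on_training_data, loss_on_unseen_data):
    # A model that fits its training records far better than unseen records
    # may have memorised them, which is a data-leakage risk signal.
    return loss_on_unseen_data - loss_on_training_data

# Hypothetical loss values: a small gap is reassuring, a large one is not.
gap = generalisation_gap(loss_on_training_data=0.10, loss_on_unseen_data=0.12)
```

Such a check would be run on each returned model before it is passed on to the next participant.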

Simplified, step-by-step presentation of Finterai’s federated learning process:

  1. A participating bank asks Finterai to build a machine learning model. The participant supplies Finterai with its own hyperparameters and other training instructions.
  2. Finterai builds a model based on the instructions received.
  3. Finterai sends this model, with the defined hyperparameters, to the first participating bank for training on its local dataset.
  4. The first participating bank receives the model, with the hyperparameters that describe the training. This training takes place locally, using standardised transaction data and other data (KYC and third-party data).
  5. Once the training has been completed locally at the participating bank, the model and hyperparameters are returned to Finterai. The model is then stored in Finterai’s database.
  6. Finterai checks the quality of the model, looking for data leakages and distortions.
  7. Finterai sends the updated model and relevant hyperparameters to the next participating bank.
  8. Participant 2 receives the model and the hyperparameters. The model is trained locally by Participant 2 on the same type of data as in step 4.
  9. Steps 5 to 8 are repeated until the model is fully trained – i.e. it has converged.
  10. Finterai stores the fully trained model on a server. All the participants in the federated learning process have access to the models, which can be downloaded from the server and used immediately on the banks’ local datasets to identify suspicious transactions.
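The numbered steps above can be condensed into a sketch of the coordinating loop. All class and function names are hypothetical, and the dummy local update stands in for real training; the structural point is that local data never leaves the `Bank` objects:

```python
class Bank:
    """A participating bank; its local data never leaves this object."""

    def __init__(self, local_data):
        self.local_data = local_data

    def train_locally(self, model, hyperparameters):
        # Steps 4 and 8: training on the bank's own local dataset.
        lr = hyperparameters["learning_rate"]
        return {"w": model["w"] + lr * (self.local_data - model["w"])}

def run_federated_training(hyperparameters, banks, rounds=20):
    model = {"w": 0.0}            # steps 1-2: built from the instructions
    for _ in range(rounds):       # step 9: repeat until convergence
        for bank in banks:        # steps 3 and 7: sent serially to each bank
            model = bank.train_locally(model, hyperparameters)
            # Steps 5-6 would go here: store the returned model and check
            # it for data leakages and distortions before passing it on.
    return model                  # step 10: stored, available to all banks

model = run_federated_training({"learning_rate": 0.5},
                               [Bank(1.0), Bank(2.0), Bank(3.0)])
```

A real implementation would of course train many weights, use a proper convergence test rather than a fixed round count, and run the quality checks between every hand-over, as described in steps 5 and 6.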

In this model, all data storage related to these processes (including transaction monitoring) takes place at the banks. Finterai will not need access to the banks’ local data containing transaction details to develop or operate the service.

Discussions between the Norwegian Data Protection Authority and Finterai

The Norwegian Data Protection Authority and Finterai have held five workshops to discuss the technology Finterai is planning to use and challenges relating to the data protection regulations. At the first workshop, Finterai was still at the concept development stage with regard to its solution. Much of the discussion therefore concerned how Finterai could design the solution in a way that best protected data privacy. Although the Norwegian Data Protection Authority has not attempted to influence Finterai’s method, the discussions helped to highlight the consequences of the choices it makes when designing the solution.

One tangible learning outcome from the workshops is that developers can design the federated learning method in many different ways. The different designs address important data protection considerations to varying degrees and can introduce different vulnerabilities. Choices that would result in the collection and centralisation of the banks’ transaction data on a central server could create a large attack surface and trigger extensive requirements for technical and organisational measures.

At the time of writing this final report, Finterai has elected to pursue a more decentralised solution, which minimises the system’s attack surface, since different data storage systems can rarely be attacked through the same vulnerability. This choice also affects the security threats discussed in the chapter on security challenges. Information about the requirements for, and consequences of, the different system architectures has been extremely important for Finterai, helping the company to make good choices at an early stage otherwise characterised by considerable uncertainty.

The Financial Supervisory Authority of Norway's involvement in the project

This project touches on the relationship between the anti-money laundering regulations and the data protection regulations, which both safeguard important social considerations. The Financial Supervisory Authority of Norway (FSAN) is responsible for verifying that financial institutions comply with the anti-money laundering regulations. However, the FSAN has played no formal role in the Norwegian Data Protection Authority’s Finterai sandbox project.

These two regulatory frameworks are intended to safeguard principles that are to some extent contradictory, with certain unclear boundaries between customer-related measures and the data minimisation principle. This project has shown the Norwegian Data Protection Authority that it can be challenging for us as a supervisory authority to provide clear recommendations and guidance on good data protection in the effort to combat money laundering without involving the FSAN. It has therefore been natural to consult the FSAN about relevant matters relating to the interpretation and practising of the anti-money laundering regulations as the project has progressed.

The FSAN also participated in one of the sandbox project’s workshops as an observer. However, it is important to make clear that this report expresses the Norwegian Data Protection Authority’s assessments and views. The FSAN has reviewed whether the report’s representations of the anti-money laundering regulations are incorrect, but has taken no position on the factual descriptions or on Finterai’s assessments regarding the regulations. The FSAN has not otherwise been involved in the preparation of this report.

Finterai and the FSAN have, in parallel with the sandbox project, also engaged in a dialogue on a number of issues relating to the interpretation of specific provisions in the Anti-Money Laundering Act. This dialogue has primarily aimed to clarify whether the Anti-Money Laundering Act places restrictions on the types of information that may be shared between reporting entities. The FSAN has replied to these questions in a letter sent directly to Finterai.