Finterai, exit report: Machine learning without data sharing

Security challenges

No specific security assessments of Finterai's solution were made in the sandbox. However, we have identified what we consider to be the most important overarching threats and opportunities for the solution.

Use of federated learning has both strengths and weaknesses when it comes to information security and the protection of personal data. One of the most important strengths of this technology is that federated learning does not require the sharing or aggregation of data, including personal data, across multiple entities.

At the same time, it requires that the outcome of the training, i.e. the actual machine learning model that includes the parameter sets, be shared between the entities to create a joint model. Both the sub-models and the joint model could hypothetically be subjected to “model inversion attacks”, in which the original data – including personal data – could potentially be reconstructed.
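To illustrate the principle in simplified form, the sketch below shows a federated-averaging-style training round: each hypothetical bank updates the shared parameters using only its own local data, and a central coordinator averages the resulting sub-models into a joint model. This is a conceptual sketch only; the names, sizes and the stand-in "training" step are invented for illustration and do not describe Finterai's actual implementation.

```python
# Minimal sketch of a federated-averaging-style round, assuming each bank trains
# a sub-model locally and shares only its parameters -- never the raw data.
# Names, sizes and the "training" step are illustrative, not Finterai's design.
import numpy as np

rng = np.random.default_rng(42)

def train_local_submodel(global_weights: np.ndarray, local_data: np.ndarray) -> np.ndarray:
    """Each bank starts from the shared weights and updates them using only its
    own data; the data itself never leaves the bank's systems."""
    simulated_gradient = rng.normal(size=global_weights.shape)  # stand-in for a real gradient
    return global_weights - 0.01 * simulated_gradient

def aggregate(sub_models: list[np.ndarray]) -> np.ndarray:
    """The central coordinator averages the sub-models' parameters into a joint
    model; it only ever sees parameters, never the banks' data."""
    return np.mean(sub_models, axis=0)

# One training round with three hypothetical participating banks
joint_model = np.zeros(8)
local_datasets = [np.empty((0, 8)) for _ in range(3)]                # stays on each bank's side
sub_models = [train_local_submodel(joint_model, d) for d in local_datasets]
joint_model = aggregate(sub_models)                                   # only parameters are shared
```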

Solution architecture

Irrespective of the properties of federated learning, specific choices of solution architecture and design will naturally affect its attack surface. No specific assessments of Finterai's architectural choices have been made; we merely present an overarching description of the issues discussed in the project.

Machine learning often presupposes large volumes of data, generally combined with specialised software and hardware, and is frequently achieved through the use of cloud services. The use of cloud services in general has not been assessed in the project. Using a relatively new method and technology such as federated learning creates both challenges and opportunities.

It creates challenges precisely because the method is new and all the potential vulnerabilities of the algorithms, procedures, tools and services may not yet have been adequately identified. It creates opportunities because federated learning is well suited to mitigating the classic security challenges, especially by reducing the need to transfer, share and aggregate large volumes of data. Where cloud services used for AI often rely on uploading and aggregating data for central processing, federated learning enables decentralised, localised data processing.

Bad actors

Finterai is a start-up with limited resources to deal with external cybersecurity threats from bad actors, even though the solution and its tools are built on a modern cloud platform. Cloud platforms offer many potential security functions, but using them requires competence and resources for day-to-day operation, in addition to operating the core solution. One important measure for reducing general cybersecurity threats that we discussed in the sandbox was to minimise the volume of assets that need to be protected.

In Finterai’s case, this is achieved by not uploading and aggregating data. Each participating bank processes its own data in its own systems and protects them through its own requirements, resources and capabilities. Only the fully trained sub-model packages are uploaded to the central part of the solution for centralised processing, coordination and verification.

The sub-models are protected by encryption (confidentiality) and signatures (integrity) during transfer and storage. The practical execution and efficacy of the encryption depend on the measures and architecture that Finterai chooses to apply. During the actual training process, the sub-models will be decrypted; at this point, participants can apply their own requirements, resources and capabilities to achieve a desired and suitable level of security. Specific methods and tools for this have not been assessed.
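As a hedged illustration of what such protection could look like in practice, the snippet below encrypts a serialised sub-model package for confidentiality and signs it for integrity using the Python cryptography library. The choice of algorithms is illustrative, key management is deliberately omitted, and nothing here describes Finterai's actual scheme.

```python
# Illustrative protection of a sub-model package during transfer and storage:
# symmetric encryption (confidentiality) plus a digital signature (integrity).
# Key management is omitted and the scheme is not Finterai's actual design.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

model_package = b"...serialised sub-model parameters..."

# Confidentiality: encrypt the package with a symmetric key
encryption_key = Fernet.generate_key()
ciphertext = Fernet(encryption_key).encrypt(model_package)

# Integrity/authenticity: the sending bank signs the ciphertext with its private key
signing_key = Ed25519PrivateKey.generate()
signature = signing_key.sign(ciphertext)

# The receiver verifies the signature before accepting the package, and only
# decrypts the sub-model when it is actually needed (e.g. for aggregation).
signing_key.public_key().verify(signature, ciphertext)  # raises InvalidSignature if tampered with
restored_package = Fernet(encryption_key).decrypt(ciphertext)
assert restored_package == model_package
```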

Internal bad actors generally take the form of disloyal employees, who may have privileged access to internal processing systems. Here, Finterai's solution faces the same challenges as other information systems, with requirements for access control and authorisation of users. Because of federated learning's decentralised nature, disloyal employees of one participant can at most access their own organisation's data, not other participants' datasets. Disloyal employees of Finterai itself do not, in principle, have access to customers' datasets.

The Norwegian Data Protection Authority's guide "Software development with built-in data protection" contains general guidelines for risk assessment relating to information security as part of the solution's design.

Availability

The solution's availability is safeguarded partly by its inherently decentralised nature. Local model training can, in principle, be carried out independently of the central service. The same applies to the production phase, in which the banks use the solution for its primary purpose: to identify transactions that may involve money laundering. All these activities use Finterai's tools but run on the individual client's own platform and infrastructure.

Further development, follow-up learning, and the sharing of learning outcomes presuppose access to the central services. These central services are of only limited importance for the system's day-to-day operation and operational availability, because they are not part of the primary service production.

Attacks on machine learning models

All machine learning models trained on a dataset, including those that do not use federated learning, may be subjected to attack. One objective of such an attack may be to reconstruct the data used to train the model, including personal data. According to the academic literature on federated learning and security, model inversion is considered a particularly relevant risk because machine learning models are systematically shared between multiple entities. Federated learning may be particularly vulnerable to attacks that threaten the model's robustness or the privacy of those whose personal data is held by the banks.

Read the research article "Privacy considerations in machine learning"

Attacks on the model may occur during two phases: the training phase or the operational phase.

  • Training phase: Attacks in this phase may aim to train, influence or corrupt the model itself. The attacker may also attempt to compromise the integrity of the data used to train the model. During this phase, an attacker can more easily reconstruct local data held by the various participants.
  • Operational phase: Attacks in this phase do not aim to alter the model itself, but to alter its predictions/analyses or to gather information about the model's weighting factors. If an attacker learns the model's weighting factors, it is hypothetically possible to use them to reconstruct, completely or in part, the local data (which may contain personal data) on which the model is based (see the sketch below).
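
To make the operational-phase risk concrete, the toy sketch below shows how an attacker with unrestricted query access to a very simple (linear) model can recover its weighting factors from its outputs alone. Real anti-money-laundering models are far more complex, and nothing here reflects Finterai's actual models; the point is only that query access can leak model internals.

```python
# Toy illustration of an operational-phase risk: with unrestricted query access,
# a very simple (linear) model leaks its weighting factors through its outputs.
# Didactic example only; it says nothing about Finterai's actual models.
import numpy as np

rng = np.random.default_rng(0)
true_weights = rng.normal(size=5)          # the model internals the attacker wants

def query_model(features: np.ndarray) -> float:
    """The only interface the attacker has: submit an input, observe the output."""
    return float(true_weights @ features)

# Probing with unit vectors recovers each weight exactly for a linear model
recovered_weights = np.array([query_model(np.eye(5)[i]) for i in range(5)])
assert np.allclose(recovered_weights, true_weights)
```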

The research literature describes several measures to prevent such attacks. The most important is to use models and algorithms that are presumed to be robust against attack. Other potential measures include methods such as differential privacy, homomorphic encryption or secure multiparty computation. No specific assessment of individual measures was undertaken as part of this sandbox project. Some of these measures have drawbacks: they introduce noise into the model, which can reduce its precision; they have not been extensively tested in practice; and they can burden the system with high computation costs.
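To give one of these measures concrete shape, the sketch below applies a differential-privacy-style step: a participant's model update is clipped and perturbed with Gaussian noise before it is shared, which also illustrates why such measures can reduce precision. The clipping norm and noise scale are arbitrary illustrative values, not a calibrated privacy budget, and the snippet is not a statement about Finterai's actual measures.

```python
# Sketch of a differential-privacy-style safeguard: clip a participant's model
# update and add Gaussian noise before sharing it. The added noise is exactly
# what can reduce the model's precision. Parameter values are arbitrary, not a
# calibrated privacy budget.
import numpy as np

def privatise_update(update: np.ndarray, clip_norm: float = 1.0, noise_std: float = 0.1) -> np.ndarray:
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))   # bound any single participant's influence
    noise = np.random.default_rng().normal(0.0, noise_std, size=update.shape)
    return clipped + noise                                       # only the noisy update leaves the bank

shared_update = privatise_update(np.array([0.4, -1.2, 0.7]))
```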

Read the research article “A Survey on Differentially Private Machine Learning”

A model inversion attack depends on a number of conditions being in place. Firstly, the attacker (especially an external actor) needs access to trained model packages, which are, in principle, protected through encryption, access control and similar measures. Secondly, actually executing a model inversion attack requires extensive, specialised skills. Finally, the reconstructed data may comprise only part of the original dataset and may be of variable quality.

In addition, this type of attack depends to a large extent on the targeted algorithms being vulnerable to it. The algorithms that Finterai has so far envisaged using in its federated learning system are not considered particularly vulnerable to model inversion, since the basic mathematical principles underpinning the model are presumed not to allow such attacks. The algorithms used will not be openly accessible to external actors. Internal actors (typically participating banks) who systematically gain access to each other's unencrypted training models will also, in principle, be bound by reciprocal contractual obligations.

All in all, therefore, this type of attack involves significant obstacles and costs for external bad actors, including in the context of Finterai's solution. For its part, Finterai asserts that its federated learning system is no more vulnerable to attacks on personal data than machine learning models in general. Federated learning is nevertheless a young technology, and there could be further vulnerabilities that have not yet come to light. This lack of knowledge makes an accurate risk assessment challenging.