Finterai, exit report: Machine learning without data sharing

Data minimisation

The discussions in this chapter relate to how Finterai can facilitate data minimisation in its service.

The development of artificial intelligence (AI) often depends on vast quantities of personal data. However, the data minimisation principle requires that the data used are adequate, relevant and limited to what is necessary to achieve the purpose for which they are processed. This means that a data controller cannot use more personal data than are actually necessary to achieve the purpose, and that the data must be deleted once they are no longer needed. It also means that the data selected must be relevant to that purpose.

In the Norwegian legal commentary on the GDPR (Skullerud et al.), it is pointed out that the requirements for adequacy and relevance mean that the personal data processed must “have a close and natural link to the purpose for processing and be suitable for that purpose”. The data minimisation assessment is therefore inextricably tied to the purpose for processing.

Read the legal commentary on the GDPR (juridika.no)

The data controller is responsible for upholding the data minimisation principle. A software supplier that, after a specific assessment, is not deemed to be a data controller will in principle have no direct responsibility for upholding the data minimisation principle. Nevertheless, it is important that the software delivered makes it possible for the data controller to comply with the regulations in practice. Otherwise, the supplier's customers will not be legally able to use the software for the processing of personal data. It is therefore important that Finterai takes a conscious approach to data minimisation in the development of its service, regardless of whether or not the company is deemed to be the data controller.

In the sandbox, we discussed how Finterai's service could affect the volume of personal data that the banks use in their efforts to uncover suspicious transactions, and what Finterai could potentially do to facilitate data minimisation. The discussions focused primarily on the collection of third-party data, i.e. data that do not come directly from the transactions but are collected from parties other than the customers themselves. Such data could, for example, have been published in the media. Third-party data do not always contain personal data. The assessments in this report relate solely to the processing of third-party data considered to constitute personal data.

Relationship with the anti-money laundering regulations – challenges for standardisation

We have not assessed legal basis in this sandbox project. The banks' obligations to assist in the fight against money laundering are, however, set out in the Anti-Money Laundering Act and associated statutory regulations. It would therefore be understandable to assume that the banks, if they want to process third-party data for the purpose of uncovering money laundering, must find a legal basis for such processing in the anti-money laundering regulations.

The anti-money laundering regulations are risk based. This means that each bank's obligation to collect information pursuant to this regulatory framework depends on the risk each individual customer represents for the bank in question. The same customer may represent a different level of risk at different banks. In addition, each bank's customer base will comprise customers associated with different levels of risk.

This risk-based approach in the anti-money laundering regulations could create problems for Finterai's desire to standardise the data categories that the banks must use in the federated learning process. The data minimisation principle means that the banks cannot process more personal data than are necessary to fulfil the purpose. One question that emerged in the sandbox was whether it is possible, within the limits of the prevailing anti-money laundering regulations, to find a minimum level of data that may always be collected, irrespective of risk, and that may therefore be included in a standardisation process.

It is the Financial Supervisory Authority of Norway (FSAN) which monitors the reporting entities’ compliance with the anti-money laundering regulations, and an interpretation of this regulatory framework falls outside the remit of the Norwegian Data Protection Authority's sandbox process. We are therefore unable to answer this question. However, all processing of personal data requires a legal basis, and the discussion below therefore presumes that it is possible to establish a minimum level of data that may always be used in connection with anti-money laundering activities, regardless of risk.

Data minimisation and federated learning – the need for predefined data categories

Some banks already collect third-party data in connection with their anti-money laundering endeavours. However, the banks differ in the kinds of data they collect. For federated learning to work as intended, the data categories the banks process must be coordinated, so that a model developed in Bank A can be trained in Bank B and Bank C. These subsequent banks must have access to the same data categories that Bank A used when the model was developed.

The banks which participate in federated learning must therefore have access to the same categories of personal data. However, the need for each category of personal data only arises when a bank builds a model that uses the personal data concerned. Nevertheless, it may be assumed that some types of data, for example the data contained in SWIFT messages, will always be relevant. Other categories of personal data are used more rarely or potentially not at all. At this point, the issue of data minimisation arises. Is it in line with the data minimisation principle to collect personal data without knowing, at the time they are collected, whether they will be needed? This issue will probably also be relevant to a greater or lesser degree for other entities that use federated learning on the basis of personal data.
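The coordination requirement described above can be illustrated with a minimal sketch. The function and category names below (including the specific third-party categories) are hypothetical illustrations, not taken from Finterai's actual design:

```python
# Minimal sketch of the coordination problem: a model trained across banks
# declares which data categories it needs, and a participating bank can only
# train it if it holds all of those categories. All names are illustrative.

def missing_categories(model_categories: set[str], bank_categories: set[str]) -> set[str]:
    """Categories the model needs that the bank does not hold."""
    return model_categories - bank_categories

# A model developed in Bank A, using SWIFT message fields plus two
# hypothetical third-party categories:
model_needs = {"swift_payment_fields", "sanctions_list_hits", "adverse_media_mentions"}

bank_b = {"swift_payment_fields", "sanctions_list_hits", "adverse_media_mentions"}
bank_c = {"swift_payment_fields", "sanctions_list_hits"}

print(missing_categories(model_needs, bank_b))  # set() -> Bank B can train the model
print(missing_categories(model_needs, bank_c))  # Bank C lacks 'adverse_media_mentions'
```

The sketch shows why standardised categories matter: a single missing category at one bank blocks the shared training round for that model.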

In the sandbox, we discussed different ways of adapting Finterai’s service to enable the banks to avoid collecting the various personal data categories until there is an actual need for them.

The most realistic alternative discussed was for the banks to collect necessary third-party data only once they have decided to develop a model that includes those data, or when they receive a model for training that requires the particular data concerned. Finterai has pointed out that such a solution could be technically challenging, and that it will lead to delays in the training process.

If it turns out that the banks have collected personal data that are never needed, the data concerned cannot be said to be necessary for the specific purpose. The Norwegian Data Protection Authority therefore proposes that the system be designed in such a way that the banks can hold off obtaining personal data until they know for certain that they will have a use for them. In this context, it is important to underline that the Norwegian Data Protection Authority's contributions must be considered guidance and do not constitute an assessment of the legality of Finterai's planned service.
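The deferred-collection approach discussed in the sandbox could be sketched roughly as follows; the class, the fetcher and the category names are all hypothetical stand-ins, not part of Finterai's actual service:

```python
# Sketch of "collect on demand": the bank holds off fetching a third-party
# data category until an incoming model actually requires it, so no personal
# data are collected speculatively. All names here are illustrative.

class LazyThirdPartyStore:
    def __init__(self, fetcher):
        self._fetcher = fetcher                   # callable that collects one category
        self._collected: dict[str, object] = {}   # categories fetched so far

    def prepare_for_model(self, required_categories: set[str]) -> None:
        """Fetch only the categories this model needs and we do not yet hold."""
        for category in required_categories - self._collected.keys():
            self._collected[category] = self._fetcher(category)

    def held_categories(self) -> set[str]:
        return set(self._collected)

fetch_log = []
def fetch(category):
    fetch_log.append(category)                    # stands in for actual collection
    return f"data:{category}"

store = LazyThirdPartyStore(fetch)
store.prepare_for_model({"swift_payment_fields"})
store.prepare_for_model({"swift_payment_fields", "adverse_media_mentions"})
print(sorted(fetch_log))  # each category fetched once, only when first needed
```

As the report notes, the trade-off of such a design is latency: training cannot start until the missing categories have been fetched.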

Data minimisation in artificial intelligence (AI)

According to Finterai, the models the banks use today to uncover suspicious transactions are too weak. The company's theory is that the banks need more data points in their models to do a satisfactory job of uncovering suspicious transactions, which it wants to facilitate through its service.

Use of artificial intelligence (AI) enables systems to be built that can learn, find connections, conduct probability analyses and draw conclusions far beyond the capacity of both humans and systems that do not use AI. This means that AI-based systems could increase the quality of the banks’ anti-money laundering efforts. These systems will probably find connections in data that have not traditionally been used in anti-money laundering endeavours and that are not initially considered to have a close and natural link to the fight against money laundering.

However, the banks do not always know the extent to which various third-party data will help to uncover attempts to launder money (the purpose of the processing) until they have tested the data over time. This could prove a challenge. If, after testing, it should prove that one or more categories of personal data are of little or no significance for the achievement of the purpose, those data will not meet the requirement for relevance. In that case, continued processing of those items of personal data would quickly contravene the data minimisation principle.

But what about the processing of those personal data that took place before the bank (or Finterai) discovered that they were not sufficiently relevant to achieve the purpose? Would that also contravene the data minimisation principle? These are questions we discussed in the sandbox without finding clear answers. As with much else, the answer will depend on a specific assessment.

However, there are certainly no grounds to say that the processing of personal data which subsequently prove insufficiently relevant to achieve the purpose is always in breach of the data minimisation principle. In any assessment of this, it is relevant to look at the reason why such items of personal data were selected in the first place. For example, were the items of personal data selected at random, or was the selection based on relevant and legitimate assumptions?

Furthermore, it is important to be aware of the risk that an assumption may be wrong and to have effective measures in place to verify the relevance of the personal data being used. The longer it takes before picking up on and halting the processing of personal data that prove to be insufficiently relevant, the greater the risk that the processing contravenes the data minimisation principle. These issues are not unique to Finterai. They are something that everyone using AI-based tools to process personal data should pay particular attention to.
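One conceivable way to operationalise such verification, sketched here on toy data with hypothetical category names, is to periodically measure each data category's contribution to the model (permutation importance) and flag categories that contribute close to nothing:

```python
import numpy as np

# Sketch of a periodic relevance check: shuffle one feature at a time and
# measure how much model accuracy drops (permutation importance). Categories
# whose shuffling barely hurts the model are candidates for deletion under
# the data minimisation principle. Model and data are toy stand-ins.

rng = np.random.default_rng(0)
n = 2000
relevant = rng.normal(size=n)            # genuinely predictive category
irrelevant = rng.normal(size=n)          # third-party category with no signal
y = (relevant > 0).astype(int)           # "suspicious" label driven by one feature
X = np.column_stack([relevant, irrelevant])

def accuracy(X_in):
    preds = (X_in[:, 0] > 0).astype(int) # toy "model": threshold on feature 0
    return float((preds == y).mean())

def permutation_importance(X_in, feature, accuracy_fn):
    Xp = X_in.copy()
    Xp[:, feature] = rng.permutation(Xp[:, feature])
    return accuracy_fn(X_in) - accuracy_fn(Xp)

categories = ["swift_payment_fields", "adverse_media_mentions"]
drops = {name: permutation_importance(X, i, accuracy)
         for i, name in enumerate(categories)}
flagged = [name for name, d in drops.items() if d < 0.01]
print(flagged)  # the category carrying no signal is flagged for review
```

Run regularly, a check of this kind shortens the window between a category losing its relevance and the processing being halted, which is exactly the risk the paragraph above describes.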