How to succeed with transparency

The use of artificial intelligence (AI) raises a wide range of issues that fall under the broad umbrella of transparency. As soon as AI is used in connection with personal data, the data protection regulations require transparency. Yet under this umbrella we also find ethical questions and technological issues relating to communication and design. This experience-based report describes how to communicate when using AI.

Introduction

While the regulations set out clear requirements with respect to transparency, they do not draw razor-sharp boundary lines or prescribe exactly how to be transparent. Specific assessments must be made in each individual case. That is why we devote so much space in this report to examples: the assessments and associated initiatives in these real-life cases can provide valuable lessons for others facing similar issues. This is not a complete guide to all aspects of transparency when using AI. However, we highlight some key sandbox discussions that we believe can be of value to others.

Trust is a recurring topic in the examples. If people are to be willing to use solutions and share personal information, they must feel confident that the solution works as intended and adequately protects their privacy.

The Norwegian Data Protection Authority's 2019/2020 survey showed clear indications of a “chilling effect” on public engagement. In other words, if people are unsure about how their personal data are going to be used, they change their behaviour. Over half of those questioned had avoided using a service due to uncertainty about how personal data are collected and used. As many as two out of three respondents felt they had little control and were powerless when it came to the flow of personal data online. It is easy to assume that this chilling effect applies not only to the internet, but that the scepticism spills over into other forms of personal data sharing, for example when people are confronted with AI-driven tools.

This experience-based report starts with a review of the most important statutory provisions relating to transparency when using AI. We then present three projects from the Norwegian Data Protection Authority's regulatory sandbox, where transparency has been an important topic. Finally, we have drawn up a checklist for transparency in relation to AI.

Whether you are a programmer sitting at the heart of the development process or an entrepreneur with a burgeoning idea, whether you work for a major enterprise or a small startup: we think this report could be useful for many of those who are developing, using or considering the procurement of AI-based solutions. We hope you will find it illuminating and inspiring, and that it can help you succeed with your AI endeavours.

Statutory transparency requirements

Transparency is a fundamental principle of the EU's General Data Protection Regulation (GDPR), which requires that the person the data relate to (the data subject) be notified of which personal data have been recorded and how the data are being processed. Transparency relating to the processing of personal data is a precondition for the ability of individuals to uphold their rights. Transparency can also help to uncover errors and unfair discrimination, and to engender trust.

Irrespective of whether or not personal data are processed using AI, the GDPR requires transparency and disclosure. In brief, the GDPR requires that:

  • Personal data must be processed in an open and transparent manner (see Article 5(1)(a)). This means, for example, that the data controller must ensure the data subject has sufficient information to uphold their own rights.
  • The data subject must be informed about how their personal data is being used, whether they have been obtained from the data subject themselves or from one or more third parties (see Article 13 and Article 14).
  • If the data subject has supplied the data, such information must be provided in writing before or at the same time as the data are collected (see Article 13). If the data have been obtained from other sources, the data controller must notify the data subject within a reasonable period of time that the data in question have been collected (see Article 14).
  • This information must be written in an understandable fashion, using clear and simple language (see Article 12). It must also be easily accessible to the data subject.
  • The data subject is entitled to be told that their personal data is being processed and to inspect the data concerned (see Article 15).

A comprehensive guide may be found in the Article 29 Working Party (WP29) guidelines on transparency, which have been endorsed by the European Data Protection Board (EDPB).

Transparency requirements relating to the development and use of artificial intelligence (AI)

The use of artificial intelligence (AI) is normally divided into three main phases:

  1. Development of the algorithm
  2. Application of the algorithm
  3. Continuous machine learning and improvement of the algorithm

The GDPR's transparency requirements are general and are essentially the same for all the phases. However, there are some requirements that are relevant only in certain phases. For example, the requirement to provide information on the algorithm’s underlying logic is, as a rule, only relevant in the application phase.

In the development phase, data is processed for the purpose of developing one or more algorithms or AI models. The personal data used are generally historic data that have been collected for a purpose other than the development of an AI model. In the application phase, the AI models are used to carry out a specific task in practice. The purpose of the data processing is normally linked to the task to be performed. In the final phase, the continuous machine learning phase, the AI model is further developed and improved. In this phase, the algorithm is continuously refined on the basis of new data collected during the application phase.

In the following, we presume that in all three phases data is processed lawfully in accordance with Article 6 of the GDPR. Read more about the legal basis for data processing. We will examine in more detail the duty to provide information in the various phases.

Transparency requirements in the development phase

Articles 13 and 14 of the GDPR require undertakings to give notice when personal data are used in connection with the development of algorithms. Article 13 applies to data obtained directly from the data subject, for example by means of questionnaires or electronic tracking. Article 14 regulates situations where data are obtained from other sources or have already been collected, e.g. from one or more third parties or publicly available data.

When the data have been obtained directly from the data subject and are to be processed in connection with the development of AI systems, Article 13 requires the data controller to disclose the following:

  • The types of personal data to be processed
  • The purpose for which the algorithm is being developed
  • What will happen to the data once the development phase has finished
  • Where the data have been obtained from
  • The extent to which the AI model processes personal data and whether anonymisation measures have been implemented

In principle, the data subject will have specific rights in connection with all processing of their personal data. The most relevant are the right to request access to their data, the right to have the data corrected or deleted and, in some cases, the right to object to the processing. Large quantities of personal data are often used in the development and training of AI. It is therefore important that both the development and the training of the solution are assessed specifically against the regulatory framework.

Largely the same duties apply under Article 14 to data that have already been collected for a purpose other than the development of AI systems, such as information the undertaking has recorded about its customers or users.

However, Article 14(5) contains an exemption which may be relevant for the development of AI systems. Due to the vast quantities of data that are often required for the development of AI systems, notifying all the data subjects concerned can be a resource-intensive process. For example, in research projects involving the use of register data from hundreds of thousands of people, it may be difficult to notify each person individually. It follows from Article 14(5) that an exemption may be made if the data subject already has the information, the provision of this information proves impossible or would involve a disproportionate effort, the collection or disclosure is expressly permitted under EU law or the member states’ national legislation, or if the personal data must remain confidential under a duty of professional secrecy.

What constitutes a disproportionate effort will always rest on discretionary judgement and an overarching assessment of the specific circumstances. The Norwegian Data Protection Authority recommends that a minimum of information be provided in all cases, so the individual data subject knows in advance whether their personal data are being used for the development of AI. This may be ensured by means of the publication of general information concerning the data processing, e.g. on the undertaking's website. The information must be accessible to data subjects before further data processing commences.

Transparency requirements in the application phase

In the application phase, disclosure requirements will depend on whether the AI model is used for decision-support or to produce automated decisions.

For automated decisions which have a legal effect or significantly affect a person, specific disclosure requirements apply. If processing can be categorised as automated decision-making pursuant to Article 22, there are additional requirements for transparency. (See also Article 13(2)(f) and Article 14(2)(g).) The data subject is entitled to:

  • Information that they are the subject of an automated decision.
  • Information about their right not to be the subject of an automated decision pursuant to Article 22.
  • Meaningful information about the AI system's underlying logic.
  • Information about the significance and the expected consequences of being subject to an automated decision.

Although the provision of such supplementary information to the data subject is not expressly required when the AI system is being used as a decision-support tool, the Norwegian Data Protection Authority recommends that it be provided in such cases. This is particularly true where “meaningful information about the AI system’s underlying logic” can help the data subject to better uphold their rights.

A meaningful explanation will depend not only on technical and legal requirements, but also on linguistic and design-related considerations. An assessment must also be made of the target group for the explanation concerned. This could result in different wording for professional users (such as the NAV advisers and teachers referred to in the following examples) and more sporadic users (consumers, children, elderly people).

The EU guidelines on automated individual decision-making and profiling provide advice on what a meaningful explanation of the logic could contain.

The data controller must assess how detailed to make the explanation of how the algorithm works, while ensuring that the information is clear and understandable for the data subjects. This may be achieved by including information about:

  • The categories of data that have been or will be used in the profiling or decision-making process.
  • Why these categories are considered relevant.
  • How a profile used in the automated decision-making process is constructed, including any statistics used in the analysis.
  • Why this profile is relevant for the automated decision-making process.
  • How it is used to make a decision that concerns the data subject.

It may also be useful to consider visualisation and interactive techniques to assist with algorithmic transparency.
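
To make this more concrete, here is a minimal sketch of how the output of a profiling model could be turned into a plain-language explanation covering the points above. The data categories, weights, outcome and wording are invented for illustration and do not reflect any particular system.

```python
# Hypothetical sketch: turning the output of a profiling model into a
# plain-language explanation for the data subject. Categories, weights
# and the decision are invented for illustration only.

from dataclasses import dataclass

@dataclass
class Factor:
    category: str        # category of data used in the profiling
    why_relevant: str    # why this category is considered relevant
    contribution: float  # signed effect on the outcome (model-specific scale)

def explain_decision(outcome: str, factors: list[Factor]) -> str:
    """Build a short, readable explanation of an automated decision."""
    lines = [f"Outcome of the automated assessment: {outcome}", "", "Data used and why:"]
    # Present the most influential categories first.
    for f in sorted(factors, key=lambda f: abs(f.contribution), reverse=True):
        direction = "counted towards this outcome" if f.contribution > 0 else "counted against this outcome"
        lines.append(f"- {f.category}: {f.why_relevant} (this {direction})")
    lines.append("")
    lines.append("You may request human intervention, express your point of view or contest the decision.")
    return "\n".join(lines)

print(explain_decision(
    outcome="Application approved",
    factors=[
        Factor("Payment history", "indicates ability to meet instalments", 0.42),
        Factor("Length of customer relationship", "provides context for the payment history", 0.08),
    ],
))
```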

Public sector undertakings may be subject to other requirements relating to the provision of information concerning the reasons for automated decisions, e.g. the Norwegian Public Information Act or sector-related legislation.

In those cases where the data subject is entitled to object under Article 21, they must be made explicitly aware of their right to object pursuant to Article 21(4). The data controller is responsible for ensuring that this information is provided clearly and separately from other information, and that it is easily accessible, both physically and in the way it is framed. While it is natural to include such information in a privacy policy, this alone would probably not be sufficient to fulfil the requirement. The data subject should, in addition, be notified of their right to object in the interface where the processing of their data is initiated. In an application portal, for example, the information should be clearly visible on the website or in the app where the personal data are entered.

In connection with the use of personal data collected in the application phase for continuous machine learning, the requirement to provide information will largely coincide with the requirements in the development phase.

Transparency when using AI in schools

The Aktivitetsdata for vurdering og tilpassing (AVT) project ("activity data for assessment and adaptation") is a research and development project that explores the use of learning analytics and artificial intelligence (AI) to analyse pupil activity data from various digital learning tools in schools.

Activity data is the term used for the data that are generated when a pupil completes activities in a digital learning tool. Such data could comprise information about which activity the pupil completed, how long they spent working on it, and whether or not they answered correctly.

The purpose of the project is to develop a solution that can help teachers to adapt their teaching to the individual pupil. For example, when maths teacher Magnus starts preparing his class for the exam, the system will come up with a revision proposal based on the work the pupils have done recently. Maybe the AI will suggest more algebra for Alfred and more trigonometry for Tina because this is where it has identified the largest gaps in their knowledge?

In addition to individually adapted teaching, the project’s purpose is to give pupils greater insight into their own learning and support teachers in their pupil assessments. The goal of adapted teaching is to ensure that the pupils achieve the best possible learning outcome from their education. On a more general level, the AVT project aims to drive the development of national guidelines, norms and infrastructure for the use of AI in the teaching process.

Specifically, the AVT project uses an open learner model as well as analytics and recommendation algorithms to analyse learning progress and make recommendations for pupils. Analysis results are presented in an online portal (dashboard) customised for each user group – such as teachers, pupils and parents. Users log in to the portal via Feide.
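
Purely as an illustration of the idea (the AVT project's actual models and algorithms are not reproduced here), an open learner model can be thought of as a per-pupil estimate of mastery per topic, with the recommendation logic proposing revision where the estimated gap to the learning goal is largest. The topics, numbers and threshold below are invented.

```python
# Illustrative sketch only: a toy "open learner model" where mastery per topic
# is estimated from recent activity data, and revision is recommended where the
# gap to the learning goal is largest. All names and numbers are invented.

mastery = {
    "Alfred": {"algebra": 0.45, "geometry": 0.80, "trigonometry": 0.75},
    "Tina":   {"algebra": 0.85, "geometry": 0.70, "trigonometry": 0.40},
}

TARGET = 0.8  # assumed learning goal per topic

def recommend_revision(pupil: str) -> str:
    """Recommend the topic with the largest gap between the goal and estimated mastery."""
    gaps = {topic: TARGET - level for topic, level in mastery[pupil].items()}
    topic, gap = max(gaps.items(), key=lambda item: item[1])
    return f"{pupil}: revise {topic} (estimated gap {gap:.2f})"

for pupil in mastery:
    print(recommend_revision(pupil))
```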

The project owner for the AVT2 project is the Norwegian Association of Local and Regional Authorities (KS). The project is led by the University of Bergen (UiB) and its Centre for the Science of Learning & Technology (SLATE). The City of Oslo’s Education Agency has been the project's main partner and driving force since it commenced in 2017. Recently, the Municipality of Bærum and the regional inter-municipal partnership Inn-Trøndelag have also joined the project in smaller roles.

The sandbox discussed three aspects of transparency:

  • User involvement to understand the risk and the types of information the user needs
  • How to provide information tailored to the users
  • Whether it is necessary to explain the algorithm's underlying logic

User involvement to understand risk and information needs

The AVT project invited pupils, parents/guardians, teachers and municipal data protection officers to participate in a project workshop to discuss privacy risks relating to the use of learning analytics. Understanding the risks to users posed by the system is important if relevant and adequate information is to be provided. Transparency about the use of personal data is not simply a regulatory requirement to enable the individual to have as much control as possible over their own data. Transparency about the use of data can also be important to reveal errors and distortions in the system.

The workshop participants were given a presentation on the learning analytics system, which was followed by discussions in smaller groups: one group comprising children and adults, and one group comprising only adults. The groups were tasked with identifying risks to the pupils’ privacy resulting from use of learning analytics. Below, we have summarised the discussions with respect to three types of risks.

Risk of altered behaviour/chilling effect

When pupils work with digital learning tools, potentially detailed data may be captured and stored. For example, how long a pupil spends on a task, the time of day they do their homework, improvement in performance over time, etc. Keeping track of what data have been registered about them and how these data are used can be challenging for pupils.

The pupils who participated in the workshop were especially worried about the system monitoring how long it took them to complete a task. They pointed out that if the time they spent working on a problem was recorded, they could feel pressured into solving the problems as quickly as possible, at the expense of quality and learning outcome. A chilling effect may arise if pupils change their behaviour when they are working with digital learning tools because they feel the learning analytics system is tracking them. In other words, they change their behaviour because they do not know how their data may be used.

Another example of a chilling effect mentioned in the discussions was that pupils may not feel as free to experiment in their problem-solving, because everything they do in the digital learning tools is recorded and may potentially affect the profile built by the learning analytics system.

If the introduction of an AI-based learning analytics system in education leads to a chilling effect, the AI tool may be counterproductive. Instead of the learning analytics system helping to provide each individual pupil with an education adapted to their needs, the individual pupil adapts their scholastic behaviour to the system.

Adequate information about the type of information collected and how it is used (including which information is not collected and used) is important in order to give the user a sense of assurance and control. It can also help to counteract unintended consequences, such as pupils potentially changing their behaviour unnecessarily. 

Risk of incorrect personal data in the system

A fundamental principle in the data protection regulations is that any personal data processed must be correct. Incorrect or inaccurate data in a learning analytics tool could have a direct impact on the individual pupil’s profile. This could, in turn, affect the teacher’s assessment of the pupil’s competence and the learning resources recommended for the pupil.

The learning analytics system collects data on the pupils’ activities from the digital learning tools used by the school. One potential source of incorrect data, which was discussed by the adult participants in the workshop, is when a pupil solves problems on someone else’s behalf. This has probably always been a risk in education, and there is no reason to believe that a transition to digital activities has changed anything in this regard.

However, the impact on the individual pupil may be far greater now, if the data from this problem-solving is included in an AI-based profile of the pupil. For example, the system may be tricked into believing that the pupil is performing at a higher level than they actually are, thus recommending problems the pupil does not yet have the skills to solve. This could have a demotivating effect on the pupil and reinforce their experience of being unable to master a subject or topic.

A similar source of incorrect data is when a pupil deliberately gives the wrong answer in order to manipulate the system into giving them easier or fewer tasks. This, too, is a familiar strategy, used by children since long before the digitalisation of education. What both of these examples have in common is that the problems must be addressed both technologically and by raising awareness in general.

Risk of the technology causing the pupils unwanted stress

Another issue that came up in the workshop was that, for the pupils, use of the learning analytics system risks blurring the line between an ordinary learning situation and a test. Teachers already use information from the pupils’ problem-solving and participation in class as a basis for assessing what pupils have learned. With a learning analytics system, however, this assessment will be systematised and visualised differently from how it is done today. Pupils expressed concerns that there would be an expectation to show their “score” in the system to peers and parents, in the same way as pupils currently feel pressure to share test results.

Measures to reduce this risk can be designed into the system, for example by presenting or visualising a “score” or results in a balanced manner. Adequate information about the type of information used for assessment (and what is not used) could also reduce the uncertainty and stress pupils experience when being assessed in the learning situation.

How to provide information tailored to the users

It can sometimes be challenging to provide a clear and concise explanation of how an AI-based system processes personal data. For the AVT project, this situation is further complicated by the age range of its users. This system may potentially be used by children as young as six at one end of the range and by graduating pupils in upper secondary school at the other.

One central discussion in the sandbox was how the AVT project can provide information that is simple enough for the youngest pupils, while also meeting the information needs of older pupils and parents. These sandbox discussions can be summarised as follows:

  • Use language that takes into account the youngest pupils – adults also appreciate information that is simple and easy to understand.
  • Include all of the information required by law, but not necessarily in the same place at the same time. Adults and children alike can lose heart if the document or online article is too long. One guiding principle may be to focus not only on what the pupils/parents need to know, but also when they need this information.
  • It could be beneficial to provide information in layers, where the most basic information is presented first, while at the same time giving the reader an opportunity to read more detailed information on the various topics (a simple sketch of this layered approach follows after this list). Care must be taken to ensure that important information is not “hidden away” if this approach is used.
  • Consider whether it would be appropriate to provide (or repeat) information when the pupils are in a setting where the information in question is relevant, e.g. by means of pop-up windows.
  • Use different approaches – what works for one group may not necessarily work for another. The AVT project included text, video and images in its information materials, and feedback from data subjects indicates that different user groups respond differently to different formats.
  • Be patient and do not underestimate the complexity of the topic or how difficult it can be to understand how the learning analytics system works, as well as the purpose and consequences of implementing this type of system. This applies to both children and adults.
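
As a simple sketch of the layered approach mentioned in the list (the wording and structure are invented, not taken from the AVT project's materials), the information can be organised so that a short first layer is always shown, with more detailed layers available on request:

```python
# Hypothetical sketch of layered privacy information: a short first layer is
# always shown, and more detailed layers are available on request.
# The texts are illustrative only.

NOTICE = {
    "summary": "We use data about the tasks you solve to suggest what to practise next.",
    "details": {
        "what_we_collect": "Which tasks you work on, how long you spend, and whether your answers are correct.",
        "what_we_do_not_collect": "We do not record what you do outside the learning tools.",
        "your_rights": "You can ask to see, correct or delete the data registered about you.",
    },
}

def show_notice(topic: str | None = None) -> str:
    """Return the short summary, or a detailed layer if a topic is requested."""
    if topic is None:
        more = ", ".join(NOTICE["details"])
        return f"{NOTICE['summary']} (Read more: {more})"
    return NOTICE["details"][topic]

print(show_notice())
print(show_notice("your_rights"))
```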

Explaining the system’s underlying logic

The AVT project’s learning analytics system is a decision-support system. This means that the system produces proposals and recommendations, but does not make autonomous decisions on behalf of the teacher or pupil. If the system had made automated decisions, it would have been covered by Article 22 of the GDPR, which requires relevant information to be provided about the system’s underlying logic. Whether information about the logic must be provided in the absence of automated decision-making or profiling must be considered on a case-by-case basis, depending on whether it is necessary to ensure fair and transparent processing.

In the sandbox, we came to no conclusions about whether the AVT project is legally obligated to provide information about the underlying logic of the learning analytics system. We did, however, discuss the issue in light of the objective of the sandbox, which is to promote the development of ethical and responsible artificial intelligence. In this context, we discussed how explanations that provide users with increased insight into how the system works could increase trust in the system, promote its proper use and uncover potential defects.

But how detailed should an explanation of the system be? Is it sufficient to simply provide a general explanation for how the system processes personal data in order to produce a result, or should a reason for every single recommendation made by the system also be provided? And how does one provide the youngest pupils with a meaningful explanation? This sandbox project did not offer any final or exhaustive conclusions on these issues, but we did discuss benefits, drawbacks and various alternative solutions.

For the youngest pupils, creativity is a must when it comes to explanations, and these explanations do not necessarily need to be text-based. For example, the AVT project created an information video, which was presented to various stakeholders. The video was well-liked by the children, but garnered mixed reviews from the adults.

The children thought it explained the system in a straightforward way, but the adults found that it did not include enough information. This illustrates, firstly, how different the needs of different user groups are and, secondly, how difficult it can be to find the right level and quantity of information. The AVT project has also considered building a “dummy” version of the learning analytics system, which would allow users to experiment with different variables. In this way, users could see how the information fed into the system affects the recommendations it makes. Visualisation is often quite effective at explaining advanced technology in a straightforward manner. One could also have different user interfaces for different target groups, such as one for the youngest pupils and another aimed at older pupils and parents.

A privacy policy is a useful way of providing general information about how the system processes personal data. We also discussed whether individual reasons for the system’s recommendations should be provided, that is, information about how the system arrived at a specific recommendation and the data the recommendation is based on. Individual reasons could be made easily accessible to users, but do not necessarily have to be presented alongside the recommendation. There are many benefits to providing reasons for the system’s recommendations. Giving pupils and teachers a broader understanding of how the system works could increase their trust in it. The reasons could also make teachers better able to genuinely assess the recommendations made by the system, thus mitigating the risk of it in practice being used as an automated decision-making system, with teachers blindly trusting the system instead of using it to support their own decisions. In addition, examining the reasons given can help users to uncover errors and distortions in the system, thereby contributing to its improvement.

Transparency when using AI in the workplace

Secure Practice is a Norwegian technology company that focuses on the human aspect of data security. In the sandbox, we took a closer look at a new service that Secure Practice is developing: the use of artificial intelligence (AI) to provide individually tailored security training to employees.

Taking each employee's interest in and knowledge of data security as the starting point makes the training more targeted and pedagogical, and therefore more effective. The tool will also provide companies with reports containing aggregated statistics on employees’ knowledge of, and level of interest in, data security.

In order to provide personalised training, Secure Practice will collect and collate relevant data on the client’s employees. The profiling will place each end user in one of several “risk categories”, which will determine what training they will receive going forward. Risk will be recalculated continuously and automatically so that employees can be moved to a new category when the underlying data so indicates.
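
As a minimal sketch of this kind of continuous recalculation (the signals, weights, thresholds and module names are assumptions for illustration, not Secure Practice's actual model), a risk category can be recomputed whenever new data arrive and then used to select the next piece of training:

```python
# Illustrative sketch: continuous recalculation of a "risk category" from
# training-related signals, which then selects the next training module.
# Signals, weights, thresholds and module names are invented for the example.

def risk_category(signals: dict[str, float]) -> str:
    """Map aggregated signals (each scaled 0..1) to a coarse risk category."""
    score = (
        0.5 * (1 - signals["quiz_score"])        # weaker quiz results -> higher risk
        + 0.3 * signals["phishing_click_rate"]   # clicking simulated phishing -> higher risk
        + 0.2 * (1 - signals["modules_completed"])
    )
    if score > 0.6:
        return "high"
    if score > 0.3:
        return "medium"
    return "low"

NEXT_MODULE = {
    "high": "Recognising phishing emails (introduction)",
    "medium": "Phishing quiz: check what you remember",
    "low": "Advanced: reporting suspicious emails",
}

# Recalculate whenever new activity data arrive for the employee.
signals = {"quiz_score": 0.4, "phishing_click_rate": 0.5, "modules_completed": 0.3}
category = risk_category(signals)
print(category, "->", NEXT_MODULE[category])
```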

Employee profiling can be challenging because of the imbalance of power between employer and employee. Profiling can quickly be perceived as an infringement of the individual's privacy. In addition to examining the data flow and ensuring that the employer does not gain access to detailed information about individual employees, transparency regarding the use of data was a key topic in this project.

Must employees be informed about the algorithm’s underlying logic?

The tool at the centre of this sandbox project falls outside the scope of Article 22 of the GDPR, since it does not produce automated decisions that have legal or similarly significant effects for the employees. Accordingly, no duty to explain how the algorithm functions follows directly from this provision.

The project assessed whether the principle of transparency, read in light of GDPR Recital 60, could imply a legal duty to disclose how the algorithm functions. According to the GDPR’s Article 5(1)(a), the data controller must ensure that personal data are processed fairly and transparently. Recital 60 highlights that the principle of transparent processing requires that the data subject be informed of the existence of profiling and the consequences thereof.

GDPR Recital 60:

“The principles of fair and transparent processing require that the data subject be informed of the existence of the processing operation and its purposes. The controller should provide the data subject with any further information necessary to ensure fair and transparent processing taking into account the specific circumstances and context in which the personal data are processed. Furthermore, the data subject should be informed of the existence of profiling and the consequences of such profiling. Where the personal data are collected from the data subject, the data subject should also be informed whether he or she is obliged to provide the personal data and of the consequences, where he or she does not provide such data. That information may be provided in combination with standardised icons in order to give in an easily visible, intelligible and clearly legible manner, a meaningful overview of the intended processing. Where the icons are presented electronically, they should be machine-readable.”

The European Data Protection Board highlights the importance of disclosing what consequences the processing of personal data will have, and of ensuring that the processing does not come as a surprise to those whose personal data are processed.

Although the use of personal data in this project does not trigger a legal obligation to explain the system’s underlying logic, it is good practice to act in such a transparent fashion that the user is able to understand how their data are used. Transparency about how Secure Practice’s tool works can help to engender trust in the AI system.

User involvement – how to build trust through transparency

What information is it relevant to give employees who will make use of the tool, and when should that information be provided? Two focus groups were established to examine these questions. One focus group comprised employees of a major Norwegian enterprise, while the other comprised representatives from a trade union organisation.

One of the questions discussed in the focus groups was whether the employee should be given an explanation of why the algorithm presents the individual user with precisely this proposal. For example, why is an employee encouraged to complete a specific learning module (“because we see that you did not do so last week”) or take a quiz on cyberthreats (“because we see you have completed the module and it can be a good idea to check what you remember”)?

A specific example could be an employee receiving a suggestion to complete a certain type of training because they had been caught out by a phishing exercise. The focus group discussed whether such detailed information might make the user feel they were being monitored, which could in turn lead to a loss of trust. The discussion pointed in the opposite direction: most agreed that providing this type of information was a good idea, because it would help the employee understand how the information was used and because such transparency could engender trust in the solution.
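
A minimal sketch of how such per-suggestion explanations might be generated is shown below. The triggers and texts are hypothetical and written in a supportive tone; they do not come from Secure Practice's solution.

```python
# Hypothetical sketch: attaching a short, supportive reason to each training
# suggestion, so the user can see which data the suggestion is based on.
# Triggers and texts are invented for illustration.

REASONS = {
    "module_not_completed": "because we see that you did not complete last week's module",
    "module_completed": "because you finished the module, and a short quiz is a good way to check what you remember",
    "phishing_exercise": "because a recent phishing exercise showed this is an area where a refresher can help",
}

def suggestion_with_reason(action: str, trigger: str) -> str:
    """Combine a suggested action with a plain-language reason for it."""
    return f"We suggest you {action}, {REASONS[trigger]}."

print(suggestion_with_reason("complete the module on safe email habits", "module_not_completed"))
print(suggestion_with_reason("take the quiz on cyberthreats", "module_completed"))
```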

Focus group members also discussed which types of data were relevant to include in such a solution, and which data should not be included. The system can, potentially, analyse everything from which learning modules have been completed and the results of post-module quizzes, to how the employee deals with suspicious emails. It can also analyse more sensitive information from personality tests.

The focus group made up of employees from a major Norwegian enterprise was, in principle, prepared to use the solution and share fairly detailed data, provided that it made a constructive contribution to achieving the goal of better data security in the company. It emerged that they trusted their employer to safeguard their privacy and not to use their data for new purposes that could have a negative impact on them.

The focus group made up of trade union representatives emphasised the risk of the individual employees’ answers being traced back to them, and of the employer being able to use the data obtained through such a tool for new purposes. They were, for example, concerned that the employee could be penalised for a low score, either through the loss of a pay rise or other opportunities within the organisation. They pointed out that transparency is a precondition for employees being able to trust the solution.

The focus group participants emphasised the importance of clear and concise communication with the employees. Uncertainty surrounding how the data will be used increases the risk of the employees adapting their answers to what they believe is “correct” or being unwilling to share data. This is an interesting finding because the algorithm becomes less accurate if the data it is based on are inaccurate and do not represent the user's actual situation.

The focus group comprising trade union representatives felt it was important to clarify early in the process how the data would be stored and used in the company's work. They further argued that the contract between Secure Practice and the client company should be framed in such a way as to protect employee privacy, and that it was important to involve the employees or their trade union representatives at an early stage of the procurement process. In their opinion, such a solution could be perceived differently by employees depending on the situation; the extent to which employees trusted their employer could, for example, have an impact.

To minimise these risks, the focus group warned against formulating questions in such a way that the answers could harm the employees if their employer became aware of them. They called for the omission of questions about whether the employee had committed any security breaches, and for communication with the individual to be framed in a positive way, so that the user felt supported and guided, not profiled and criticised.

Secure Practice used the insights provided by this user involvement exercise to adjust the way the solution provides information to the end user. In addition to smart initiatives regarding transparency, the sandbox project gave Secure Practice input on how it can protect the individual employee’s statutory rights by ensuring that personal data from the solution cannot be used for new purposes. Read the sandbox report for further details.