Reviewable Automated Decision-Making:
A Framework for Accountable Algorithmic Systems



Jennifer Cobbe, Michelle Seng Ah Lee, Jatinder Singh
Compliant and Accountable Systems Research Group,
University of Cambridge, UK

ABSTRACT

This paper introduces reviewability as a framework for improving the accountability of automated and algorithmic decision-making (ADM) involving machine learning. We draw on an understanding of ADM as a sociotechnical process involving both human and technical elements, beginning before a decision is made and extending beyond the decision itself. While explanations and other model-centric mechanisms may assist with some accountability concerns, they often provide insufficient information about these broader ADM processes for regulatory oversight and assessments of legal compliance. Reviewability involves breaking down the ADM process into technical and organisational elements to provide a systematic framework for determining the contextually appropriate record-keeping mechanisms to facilitate meaningful review - both of individual decisions and of the process as a whole. We argue that a reviewability framework, drawing on administrative law’s approach to reviewing human decision-making, offers a practical way forward towards a more holistic and legally-relevant form of accountability for ADM.

KEYWORDS

Algorithmic systems, automated decision-making, accountability, audit, artificial intelligence, machine learning

1. INTRODUCTION

Recent years have seen growing calls from governments, regulators, and civil society for automated and algorithmic decision-making (ADM) to be more transparent and accountable (e.g. [17, 20, 34, 39, 50, 52]). Many of these have related not only to information about the algorithms themselves, but also to their development, deployment, and use. Understanding how ADM is commissioned, developed, and operated will be increasingly important as it plays a greater role in society and is used by governments, companies, and other organisations alike. Indeed, ADM is already used to make decisions about welfare and housing [1], access to credit and financial services [38], and immigration and employment [41], and is likely to become increasingly widespread. Information on ADM’s development and use is crucial for assessing whether decisions have been made lawfully, for ensuring that systems operate properly, and for informing those seeking to protect people from arbitrary or capricious interventions in their lives.

A particular concern is ADM involving machine learning (ML). Meaningful accountability can be difficult with ML; understanding how complex ML systems work can be challenging, and commercial considerations can conceal broader organisational processes [10]. To address this, much research has focused on mechanisms to make models and other technical aspects of ADM interpretable by or explainable to humans in some way (e.g. see [4,32]). However, information about models and their workings may not be suitable or relevant for various kinds of accountability, and broader technical and organisational factors are generally underconsidered.

This paper introduces the concept of reviewability as an approach to improving the accountability of ADM involving ML (though it is relevant for algorithmic systems in general). This draws on our previous work on applying English administrative law—governing public sector decision-making—to ADM [13,14]. Administrative law has developed iteratively over decades through judicial review by senior courts, applying to even the most consequential decisions of life and death. Administrative law as a framework and judicial review as an oversight mechanism understand decision-making as a process beginning before decisions are made and extending to their consequences and effects. Our previous work identified points in the ADM process where administrative law’s principles and requirements could be applied, enabling effective judicial review of that process as a whole. Drawing on that work, we develop reviewability from English administrative law: England is a common law jurisdiction with considerable influence on others, and its administrative law reflects legal frameworks found across democratic countries [51]. As such, references to ‘administrative law’ herein mean English administrative law, unless otherwise stated.

Others have also applied concepts from administrative law (English or otherwise) to ADM in various ways [11,15,46,55,70,78]. However, these have generally either not viewed ADM as a broad process or, as with our previous work, sought to apply legal standards to public sector ADM. We believe that administrative law’s understanding of decision-making as a process is also highly relevant to the accountability of ADM and algorithmic systems more generally, in both private and public sectors. We do not argue that public sector standards for decision-making should apply to the private sector, but that a similarly broad approach applied to algorithmic systems—considering the whole decision-making process to identify points of review and potential intervention—would benefit accountability of ADM in any sector. We thus build on our work in this area to move beyond a narrower public sector focus and develop a systematic, holistic framework for reviewable ADM as a process applicable across sectors.

1.1 A systematic framework for holistic review

We set out a holistic understanding of ADM as a broad socio-technical process, involving both human and technical elements, beginning with the conception of the system and extending through to use, consequences, and investigation. As ADM often involves multiple people working at different stages of the process (together or separately), we consider these human elements as being organisational in nature. That is, the human elements of ADM—actions and decisions of people—generally take place in the context of, are informed by, and constitute organisational systems and processes. We note that ADM itself—even understood as a process—operates not on its own, but as part of a wider sociotechnical system.

Cite as: Jennifer Cobbe, Michelle Seng Ah Lee, Jatinder Singh. 2021. Reviewable Automated Decision-Making: A Framework for Accountable Algorithmic Systems. In ACM Conference on Fairness, Accountability, and Transparency (FAccT '21), March 1-10, 2021, Virtual Event, Canada. ACM, New York, NY, USA. https://doi.org/10.1145/3442188.3445921

Breaking down this ADM process into its technical and organisational elements allows us to systematically consider how contextually appropriate record-keeping, logging, and other documentary mechanisms at each stage of the process can allow for the process as a whole to be reviewed. This assists with understanding how and why systems and processes are functioning and how particular decisions are made, offering opportunities to identify points of intervention and control across the ADM process (such as interventions by organisations at the point of decision-making to manage various risks or external interventions by or at the behest of regulators and others). We focus on ADM involving ML, but the approach can potentially apply across algorithmic systems.

As such, a reviewability framework potentially offers a useful and holistic form of transparency to support meaningful accountability for ADM. And, while producing reviewable processes is already achievable, there are also numerous research opportunities in this space. We therefore integrate related research into a reviewability framework and highlight areas requiring further attention.

In doing so, we make three key contributions: (i) drawing from the approach to accountable decision-making of well-established legal frameworks to consider ADM as a broad socio-technical process; (ii) taking a holistic view of that process from system conception through to consequences and investigation; and (iii) providing a systematic framework for determining the technical and organisational record-keeping and logging mechanisms for that process, to provide contextually appropriate information supporting meaningful accountability for algorithmic systems.

First (§2), we discuss approaches to ADM accountability that focus narrowly on models (§2.2), arguing that ADM should instead be understood as a broader socio-technical process to support different forms of accountability (§2.3). Next (§3), we elaborate on our concept of reviewability as a means of achieving this, describing its origins in our work on applying administrative law to public sector ADM (§3.1) and setting out the concept of reviewability, what it involves, its general applicability, and its potential benefits (§3.2). We then (§4) set out a systematic framework for ML-driven ADM to assist in implementing technical and organisational record-keeping and logging mechanisms across the key stages of that process (§4.1-4.4). Finally (§5), we discuss some challenges for implementation and future directions for research. In all, we argue, reviewability can provide a practical, systematic, legally-grounded approach to producing useful transparency, through the recording of contextually appropriate information that supports meaningful accountability.

2. ACCOUNTABLE ALGORITHMIC SYSTEMS

We use automated and algorithmic decision-making (ADM) to mean decisions about natural or legal persons, their rights, interests, or entitlements made by other natural or legal persons using automated processes. These automated processes can either directly produce a decision or produce information on which a human decision-maker subsequently bases their decision in whole or in part. They may also directly entail some subsequent action or process, perhaps connecting to other automated systems [66]. These outcomes may or may not have legal or similarly significant effects for the subject of the decision. Our use contrasts with the EU’s General Data Protection Regulation (‘GDPR’), which applies higher standards only to solely automated decision-making which produces such effects [72].

Though discussions of ADM often focus on technical concerns, ADM in reality involves complex algorithmic systems - socio-technical "arrangements of people and code" [63]. Technical systems are only one part of a broader assemblage of human and technical elements. Many algorithmic systems for decision-making may involve machine learning (ML) models to produce decisions, to produce information on which decisions are subsequently made, or to manage other aspects of systems that themselves make decisions. We focus on ADM where ML models produce either decisions or information upon which decisions are made.

Increased use of ADM in both public and private sectors has led to concerns about biases, errors, malfunctions, profiling, data protection issues, and changing power relations, amongst others [54]. For instance, regulators investigated Apple Card after complaints that the algorithm was assigning higher credit limits to men than to women [73]. Amazon’s recruiting algorithm was reportedly shut down because it was discriminating against women [19]. In Australia, an applicant was denied access to information on how automated rental subsidies were calculated, as the system was supplied by a third party and protected as intellectual property [33].

As a result, calls have grown for ADM to better support various forms of investigation, assessment, and redress. However, much research on improving the accountability of ADM has focused on the models that drive decisions, rather than on broader technical and organisational processes of which models are only one part.

2.1 Accountability in ADM

Our understanding of accountability in ADM aligns with Bovens’s work on accountability [8], as interpreted for algorithmic contexts by Wieringa [75]. In Bovens’s model, accountability involves an actor from whom an account is given, a forum to whom it is given, and a relationship of accountability between them. Also considered is the nature of the account itself, and any consequences flowing from that account. For accountability to be meaningful, information given by the actor should support effective deliberation and discussion by the forum and the imposition of any consequences by the forum on the actor [75] (such as legal remedies or interventions to correct process malfunction).

Although accountability can be thought of abstractly in general terms, in practice it is highly contextual - different actors will likely be accountable for different aspects of the ADM process depending on what is to be accounted for, and the kinds, levels, and formats of information needed for a relevant and appropriate account will depend heavily on the forum to whom it is owed [8,75]. Bovens suggests that there are at least five kinds of accountability relationship, depending on the actor and the forum [8]: political accountability (to elected representatives and so on), legal accountability (to courts), administrative accountability (to auditors and regulators), professional accountability (to internal and external peers), and social accountability (to civil society and individuals).

Opacity in algorithmic systems. Achieving meaningful accountability of ADM is difficult. Commercial considerations and complex decision-making processes involving ML often entail considerable opacity. Burrell identifies three forms of algorithmic opacity [10]; while Burrell refers primarily to model opacity, these also relate to broader ADM processes. That is, the details of proprietary datasets, models, systems, and processes can be deliberately concealed to protect commercial interests (what Danaher calls ‘intentional opacity’ [18]). Details of ADM processes may be incomprehensible without relevant technical knowledge (‘illiterate opacity’ [18]). And the mismatch between the complex, mathematical nature of ML and human forms of reasoning makes models themselves difficult for even the technically literate to understand (‘intrinsic opacity’ [18]). Multiple forms of opacity can combine - it may be that data, models, systems, and processes are concealed for intellectual property reasons (e.g. [33]), and that, even if they were not, the models would be incomprehensible. Table 1 presents some of these types of opacity.

Type of opacity: Cause
  Intentional opacity: details of processes concealed for commercial reasons [10, 18]
  Illiterate opacity: processes incomprehensible without technological literacy [10, 18]
  Intrinsic opacity: incompatibility between machine and human reasoning [10, 18]
  Unwitting opacity: unawareness of the relevance of broader processes for accountability [§2.1]
  Strategic opacity: process information deliberately presented in an inaccessible way [68]
  Inadvertent opacity: information unintentionally presented in an inaccessible way [68]

Table 1: Some types of opacity in algorithmic systems.

ML systems have sometimes been described as a ‘black box’, which might be a choice made to deliberately obscure [28] (a form of intentional opacity). However, understanding ADM as a socio-technical process, we argue that there is another form of opacity - unwitting opacity, where those responsible for designing, developing, deploying, and using systems simply don’t think to record relevant organisational aspects of ADM processes (perhaps unaware of their relevance for meaningful accountability [12]). By implementing technical and organisational record-keeping and logging mechanisms that allow processes to be holistically interrogated, intentional and unwitting opacity could be substantially addressed.

However, there are also risks in providing too much information. As well as the forms of opacity discussed, Stohl et al. identify two further forms stemming from the ‘transparency paradox’ [68], where too much visibility actually reduces transparency. That is, opacity can result from providing too much or the wrong kind of information, or information presented inaccessibly, whether deliberately to obscure (‘strategic opacity’) or because the forum’s needs have not been considered (‘inadvertent opacity’) [68]. To avoid these forms of opacity, accountable ADM should provide not just any information about the technical and organisational elements of the process, but the right kind of information about aspects of the process that are relevant to the possible accountability relationships, presented in the appropriate way for the likely forums.

2.2 Limitations of model-focused mechanisms

However, considerable technical and legal research has focused on mechanisms to address forms of illiterate and intrinsic opacity by making models more transparent or understandable in some way, rather than looking more holistically across ADM processes. In focusing on model-centric mechanisms, forms of opacity relating to other aspects of the process, potentially more relevant for accountability, have been underdiscussed. This emphasis on models may itself even contribute to unwitting opacity by obscuring the need for mechanisms to address other aspects of those processes.

Proposals typically focus either on assisting those responsible for models to understand their functioning or on providing information to support other oversight mechanisms. This includes proposals for transparency of algorithms themselves, making code, model benchmarks, or other aspects of the model lifecycle available for scrutiny [10, 23, 45, 50, 56, 57]. Some have argued for equipping the general public with skills and knowledge to understand how models work [10], or that journalists could reverse engineer algorithms to inform the public about their workings [21, 22].

An increasing focus of research in recent years has been human-interpretable explanations of the workings of ML models. Some proposals have seemingly been prompted by academic debate about the existence, nature, extent, and utility of a so-called ‘right to an explanation’ [26, 31, 44, 64, 74] in GDPR, which was passed in 2016. Surveys of the research landscape show that a vast number of mechanisms for developing ‘interpretable’ or ‘explainable’ ML systems have been proposed, many in the last few years, seeking to provide explanations and other interpretable accounts of model behaviour to a range of forums [4,32].

While such proposals and mechanisms have their place, many proposals—whether intended to inform the general public, system developers, regulators, or others—address models themselves, rather than broader decision-making processes of which models are but one element. Though explanations can assist some specific concerns, like model engineering [7], explanations or other approaches focused on how the model itself works or has reached a particular outcome may miss much of what is important [75].

2.2.1 Meaningful accountability needs more than models. As ADM is a socio-technical process, with both human and technical elements, ‘accountable ADM’ should not just involve making models themselves explainable or transparent in some way. Models are only one aspect of ADM and cannot by themselves provide accounts of the process as a whole [66], whether to technical, legal, or other forums. In practice, other aspects of model development and related technical elements require consideration [24,56,66]. Moreover, technical elements of the process cannot provide sufficient information about its human aspects. Though explanations can give some information about model functioning, problems with a process or a decision often originate outside of the model itself: in the purposes for which it is used, improper assumptions in design and use, and other human and organisational factors around the model.

Moreover, particularly with explanations, model-focused approaches often unduly burden subjects of decisions with understanding and challenging them. Equipping people with skills to understand the models they encounter in their lives (as proposed by some) may seem attractive to technologists and those in technology policy, but much of the public are unlikely to have quite as much interest in ML’s inner workings. Rather, people are concerned with broader aspects of ADM: the purposes, roles, and outcomes of decision-making processes as a whole, as it is the whole process that affects them rather than simply certain technical aspects of it [26]. More generally, it is not entirely clear why the public should have to understand and evaluate the models that make decisions about them, particularly as they will often have little real choice but to subject themselves to those decisions. Moreover, by focusing on models and holding individuals themselves responsible for understanding how they work, broader technical and organisational processes and other systemic issues are obscured, and attention is diverted from other (perhaps more relevant) factors [2,26].

While model-focused mechanisms are an important area of study, it is therefore important also to consider the broader processes of which models are only one part [66,75]. As Ananny and Crawford argue, “rather than privileging a type of accountability that needs to look inside systems, ... we [should] instead hold systems accountable by looking across them” [2], recognising that these are socio-technical processes with both technical and human elements. Accountability must also consider algorithmic systems as they are situated - not just technical features, but how they affect different people in their contexts of use [36].

2.3 Accountable ADM as a process

Meaningful accountability thus requires a view of the whole socio-technical process [2,43,66,75], from commissioning of the system; through design of the model, selection of training data, and training and testing procedures; to making individual decisions; and on to the effects of those decisions and any subsequent investigations. For Ananny and Crawford, as for us, accountability of these processes “requires not just seeing inside any one component of an assemblage but understanding how it works as a system” [2].

Understanding ADM as a process allows us to identify points across that process for review of relevant human and technical factors by the appropriate forum, and, if necessary, to intervene to correct, mitigate, or otherwise address (potential) problems. This depends on targeted record-keeping and logging mechanisms to provide useful transparency by capturing technical and human (organisational) elements across the whole process - not necessarily for those subject to decisions, but to facilitate understanding of the functioning of the algorithmic system as a whole and to enable meaningful accounts to, and oversight more generally by, designers, developers, deployers, and overseers.

That accountable ML/ADM needs a view beyond technical elements to organisational ones has seen growing acceptance in recent years (e.g., [2,17,20,34,39,52,53,56,60,75]). However, initiatives aimed at working towards this often lack systematic understandings either of ML as a broad process or of how to produce useful transparency to support meaningful accountability across that whole process. Instead, proposed approaches have often been piecemeal - identifying particular aspects of that process thought to be in some way problematic. While model-focused approaches support too narrow a form of accountability, piecemeal transparency of broader decision-making processes may not bring as much benefit as is hoped. Without a systematic understanding of the ADM process and how to achieve useful transparency across it, transparency mechanisms for specific aspects of that process won’t necessarily give the holistic view required, so might not provide information relevant to a particular account. Moreover, incomplete records showing a problem could mislead, suggesting that issues lie in the wrong place.

Conversely, while documentation is important [56], there are also risks in mechanisms that provide too much information by essentially recording and disclosing everything indiscriminately. Privacy and commercial or state surveillance concerns are raised by recording vast amounts of information on decision-making processes. And, again, providing too much information, or providing it in the wrong way, risks producing its own kinds of strategic or inadvertent opacity (the ‘transparency paradox’ [68]).

Moreover, as previously discussed, accountability is in practice highly contextual. Which kind of accountability relationship (political, legal, administrative, professional, or social) is relevant will influence the information the actor should provide. Information useful for regulators overseeing compliance with financial services regulations might differ from what someone overseeing their own systems needs to assess compliance with their internal policies, for instance. What subjects of decisions might find useful may differ from what courts need to assign liability for harm caused by decision-making processes. And how information is presented affects how well it can support meaningful accountability, and will again vary depending on the nature of the account and to whom it is owed [68]. Crucially, a meaningful account cannot be given unless the forum can understand and critically engage with the subject matter [37,75].

Supporting accountability through contextually appropriate information. Following the above, we argue that ADM processes that support meaningful accountability require mechanisms for recording and providing contextually appropriate information so as to provide for useful transparency. That is, accountable ADM is a process that requires transparency mechanisms targeted at particular aspects of that process to provide information which is:

  • relevant to the accountability relationships involved (i.e. from and to whom and in what form is an account likely to be owed);
  • accurate in that it is correct, complete, and representative;
  • proportionate to the level of transparency required (i.e. what granularity of information and degree of knowledge is likely to be needed about the operation of the process); and
  • comprehensible by those to whom an account is likely to be owed (i.e. how to present information so as to be understandable).

A way of understanding transparency contextually and holistically across the ADM process, so as to facilitate different forms of accountability as appropriate, is therefore necessary. To achieve that, a more systematic approach to providing useful transparency that facilitates meaningful accountability is needed, as we now describe.

3. REVIEWABILITY

To support meaningful accountability, we argue that ADM processes should be designed and developed to be reviewable. Reviewability as a general concept involves technical and organisational record-keeping and logging mechanisms that expose the contextually appropriate information needed to assess algorithmic systems, their context, and their outputs: for legal compliance, for whether they are functioning within expected or desired parameters, or for any other form of assessment relevant to various accountability relationships.

In the context of ADM, reviewability seeks to provide a holistic view of the technical and organisational elements involved in producing an automated decision, considering factors both at design time and at runtime. The commissioning, design, deployment, and use of ADM processes, as well as the consequences of use and the auditing and investigation of those processes, are all within scope of reviewability (see Fig. 1). This approach is derived from English administrative law, which is concerned primarily with holding human decision-making in the public sector to account.

Figure 1: Conceptual view of the reviewability framework. Contextually appropriate technical and organisational information should be captured in all stages. Commissioning refers to all that causes the overall system to be brought into existence - e.g. its nature and scope, legal basis, compliance assessment, procurement. Development concerns details regarding system construction, encompassing the design and development of the technology and the relevant business processes. Operation concerns details of use, including its application to particular scenarios (inputs), as well as deployment specifics and details of system behaviour, business workflows, etc. Investigation broadly concerns the information supporting investigation of the overall process(es), including evaluation metrics, interventions, remediations, etc.

3.1 Administrative law and ADM

Administrative law has developed to contend with the opacity and complexity of human decision-making, maintaining standards for even the most consequential decisions of life and death. With a few exceptions (around discrimination, for instance), administrative law as a framework and judicial review as a form of oversight are generally not concerned with the merits of decisions themselves. Rather, they are concerned with the nature, quality, and legality of the decision-making process, either in a general sense or as it relates to a particular decision. That is to say, courts are generally not concerned with whether a particular outcome is right, but whether the process that produced that outcome was correct. In administrative law, there is no general duty to give reasons (or explanations), but decision-makers must act in line with long-established principles of good administration throughout that process.

In administrative law, various aspects of the decision-making process are considered both discretely and together, allowing principles to apply to those aspects to—in theory—ensure good decision-making. For instance, nominated decision-makers cannot delegate decisions entirely to someone else, though they can take advice into account. Decision-makers must consider all information relevant to a decision and cannot consider any irrelevant information; nor can they consider relevant information that is factually inaccurate. Decision-makers cannot give even the appearance of bias in making a decision. Decision-makers cannot unlawfully discriminate on the basis of a protected characteristic. Although not giving reasons for consequential decisions can sometimes be unlawful, reasons given—unless inadequate—are usually not themselves the basis for finding that a decision was made unlawfully (though they can give insight into how a decision-maker approached a decision). Judicial review instead assesses various aspects of the process for compliance with administrative law’s principles.

Our previous work considered how administrative law and judicial review can apply to public sector ADM [13]. In doing so, we drew on administrative law’s view of human decision-making as a process beginning before the decision and with consequences that resonate afterwards to show how these principles can apply to different aspects of ADM. Applying this understanding of decision-making as a process to ADM and identifying points at which the law’s principles are relevant potentially allows courts to review automated decisions made by public bodies without, for the most part, requiring explanations of the workings of the model itself.

For example, administrative law’s requirement that decision-makers consider all relevant information but no irrelevant information can apply both to model training and to making decisions using the model [13]. This is particularly problematic for ADM given the potential for proxies in ML [58], which can result in decisions based on factors that are not themselves directly relevant to the decision, and without considering factors that are in fact relevant [13]. To assess compliance, reviewers do not need to understand the workings of any technical components - they would instead consider the selection of factors for the model by its designer, the selection of training data, the data inputted for a decision, potentially inferences drawn by the model, and outputs produced by the model where they are subsequently relied upon by a human in making a decision.

To reiterate, in this paper we do not seek to apply administrative law standards to ADM in other sectors. Rather, we draw on and expand upon our previous work—and administrative law’s approach to accountability of decision-making—to develop a framework for accountable ADM, applicable to accountability relationships of all kinds, in the public sector and elsewhere. This reviewability framework supports systematic and holistic assessment and review of ADM processes for compliance with any technical, legal, ethical, and policy standards and requirements.

3.2 Reviewable ADM

Reviewability of ADM is concerned with exposing the whole decision-making process, potentially including: specifications and evaluations by those commissioning systems; decisions by engineers in developing systems; data used to train and test systems; training and testing procedures themselves; inferences drawn by the system while making automated decisions; and the fairness, effects, and lawfulness of those decisions in practice (see §4). As such, reviewable ADM processes are those that systematically implement technical and organisational record-keeping and logging mechanisms at all stages of commissioning, development, operation, and investigation to allow holistic review of the algorithmic system, its context, and outcomes. While ‘reviewability’ as a high-level concept has applications in various areas [47], and is relevant for algorithmic systems in general, it therefore takes an approach to transparency and accountability of ADM that goes beyond explanations or other mechanisms more narrowly focused on technical components.

Drawing from administrative law’s approach to identifying factors to assess at points in the human decision-making process, reviewability’s systematic view of ADM focuses on points of review and intervention. As §4 elaborates, these exist where people designing, developing, deploying, and using ADM take some action or decision relating to the process, or where technical components process data in some way. At these points, contextually appropriate information can be recorded about actions, decisions, or processing undertaken. This offers useful transparency of ADM processes both at design-time and—crucially—at run-time, providing information not just on how the algorithmic system was designed, developed, or intended to operate, but also on how it functions and what kinds of decisions it produces in practice. By providing information at key points of the ADM process, reviewability assists in assessing individual decisions and in determining whether the algorithmic system as a whole is functioning as intended or required.

This view across the whole process is important, given that accountability relationships and thus what it means to be accountable will differ depending on from and to whom an account is owed. Various actors at different stages across the same process may need to account to multiple forums - developers, regulators, courts, subjects of decisions, and so on. Implementing reviewability—with a systematic evaluation of what is contextually appropriate for various aspects of the decision-making process—helps ensure that relevant, accurate, proportionate, and comprehensible information is available to provide an account. By systematically implementing targeted technical and organisational record-keeping and logging mechanisms to enable the provision of contextually appropriate information about the process as a whole, reviewability thus supports meaningful accountability relationships between the multiple actors involved in the ADM process and various relevant forums.

4. A FRAMEWORK FOR REVIEWABLE ADM

Reviewability offers a systematic approach to useful transparency by breaking down the ADM process into stages—from conception of the system through to consequences and scrutiny—each consisting of a number of steps (see Table 2). These steps and stages can be considered discretely and together, underpinning a framework for developing and assessing reviewable ADM processes. At each stage there are opportunities to (i) place limits on ADM and define (un)desirable behaviour or functionality, (ii) implement contextually appropriate transparency mechanisms, (iii) review compliance with those limits and general functioning, and (iv) revise the process or take other action as required.

Reviewability thus supports a non-linear, iterative, and cyclical process of review, feedback, and revision, in line with the understanding of accountability discussed above and with our view of ADM as a process involving human and technical elements. Practitioners may move between steps non-linearly, depending on their role and the situation, as systems are developed, deployed, used, and revised. Not all steps will occur with each algorithmic system, and some will be more relevant than others, but at a high level these steps and stages will be common to many ADM processes.

Others have also discussed ML as a process involving multiple steps. Lehr and Ohm, for instance, propose two workflows with eight steps for helping the law understand ML: “playing with the data” (involving problem definition, data collection, data cleaning, summary statistics review, data partitioning, model selection, and model training) and “running model” (involving model deployment) [43]. Wieringa applies the Software Development Life Cycle model to divide the process into ‘ex ante’, ‘in media res’, and ‘ex post’ stages [75]. Suresh and Guttag split the model development lifecycle into six phases: data collection, preparation, model development, evaluation, post-processing, and deployment [69]. The Partnership on AI also considers ML in stages, from “system design and setup” through “maintenance” and “feedback” once operational [56]. Guidance from the UK Information Commissioner’s Office and the Alan Turing Institute proposes several tasks for producing explanations that relate to various aspects of the process [53].

Stage: Steps
  Commissioning: Procurement; Problem definition; Impact assessment
  Model building: Data collection; Pre-processing; Model training; Model testing
  Decision-making: Deployment; Use; Consequences
  Investigation: Audit; Disclosure

Table 2: Reviewability overview of a model-driven ADM process. Each step entails record-keeping considerations.

These understandings of ML as a multi-step process are useful, but incomplete. First, while they acknowledge ML as socio-technical, they primarily focus on model-related issues, thereby not fully accounting for human and organisational aspects. Second, they miss some important aspects of ML processes prior to data collection and following deployment - procurement, impact assessments, consequences of decisions, and audit and assessment of that process (whether by those responsible or by an external overseer). Our framework, grounded in administrative law’s approach to accountable human decision-making, provides a more holistic view by adding two stages: ‘commissioning’ (involving procurement, problem definition, and impact assessment) and ‘investigation’ (involving audit and disclosure), thereby encompassing more of the human decisions that are crucial to understanding the system.

Reviewability in practice: a systematic approach. These stages and steps allow those responsible for ADM processes to consider accountability systematically. First, they should assess from which actor and to which forum accounts are likely to be owed at each stage and step. Forums, being highly contextual, should be considered on a case-by-case basis when moving through the framework. Accountability of actors is likely to be dispersed, with obligations on multiple actors to explain or justify actions or decisions relating to aspects of the process for which they are responsible [75]. For identifying relevant actors, Wieringa’s work on applying Bovens’s model to algorithmic accountability can assist. In particular, Wieringa proposes three generic roles for actors in algorithmic systems [75] - decision-makers (those responsible for deciding to use an algorithmic system and defining its specifications and other fundamental features), developers (those responsible for specifics of developing the technical components to the required specification), and users (those who use the system to produce a decision). We use managers instead of decision-makers, to avoid confusion with actors in the ‘decision-making’ stage using the system to make decisions.

Having considered which actors and forums are relevant at each stage and step, those responsible for ADM processes should then consider which technical and organisational record-keeping and logging mechanisms could provide contextually appropriate information for those accountability relationships. We emphasise again that reviewability does not simply mean indiscriminate record-keeping at each step. Instead it is about targeted, useful transparency, providing information that is (i) relevant to the accountability relationships involved, (ii) accurate, complete, and representative, (iii) proportionate to the level of transparency likely to be required, and (iv) comprehensible by the relevant forums (§2.3).

By considering each stage systematically in this way, the ADM process as a whole can be made reviewable, facilitating meaningful accounts of its operation to those to whom they may be owed.
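As an illustration of how this systematic consideration might be operationalised, the sketch below (our own, with assumed class and field names rather than any prescribed schema) tracks, per stage and step of the framework, which records exist, for which actors and forums, and whether they meet the four criteria above.

from dataclasses import dataclass, field
from typing import List

# Illustrative only: one possible way to track reviewability records across
# the stages and steps of Table 2. Class and field names are assumptions.
@dataclass
class RecordEntry:
    stage: str             # e.g. "Commissioning", "Model building"
    step: str              # e.g. "Impact assessment", "Model training"
    actor: str             # who gives the account (manager, developer, user)
    forums: List[str]      # to whom an account may be owed
    description: str       # what information is captured
    location: str          # where the record is held
    relevant: bool = False        # (i) relevant to the accountability relationships
    accurate: bool = False        # (ii) correct, complete, and representative
    proportionate: bool = False   # (iii) appropriate granularity and detail
    comprehensible: bool = False  # (iv) understandable by the likely forums

@dataclass
class ReviewabilityRegister:
    entries: List[RecordEntry] = field(default_factory=list)

    def gaps(self) -> List[RecordEntry]:
        # Entries failing any of the four criteria, flagged for revision.
        return [e for e in self.entries
                if not (e.relevant and e.accurate
                        and e.proportionate and e.comprehensible)]

# Hypothetical usage
register = ReviewabilityRegister()
register.entries.append(RecordEntry(
    stage="Commissioning", step="Impact assessment",
    actor="manager", forums=["regulator", "internal compliance"],
    description="DPIA report, mitigations, and reassessment plan",
    location="governance/dpia-2021-03.pdf",
    relevant=True, accurate=True, proportionate=True))
print(len(register.gaps()))   # 1: comprehensibility for the forum still to address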

We now discuss what each stage involves, indicating which actors are likely to be relevant at each stage and what kinds of transparency mechanisms may be available or relevant at each step. Note that we do not prescribe how to build reviewable systems, nor do we propose new technical or organisational solutions. Instead, we present a framework for systematically considering the information necessary for providing useful transparency, and thereby enabling meaningful accountability, in the context of a model-driven ADM process. We also indicate some factors that might warrant consideration.

4.1 Commissioning

Commissioning involves anything relevant to bringing the algorithmic system (and its constituent parts) into existence. At this stage, managers are the actors likely to be particularly relevant.

4.1.1 Problem definition. ADM exists for a purpose. Records relating to the aims of and rationale for the algorithmic system, giving insight into the values and norms behind its commissioning, development, and operation, are therefore relevant to various accountability relationships. Documentation and other records of the system’s aim, scope, and justification—what it will do, why it is required, and the role it will play—are worth considering. Various impact assessment and procurement guidance documents reflect the need for clear specifications of these aspects (see below). Also likely relevant is information regarding the decision-making processes or systems that ADM will subsume or replace. Information from business analysis or requirements engineering activities, common for many organisational technology undertakings [42], will often be pertinent, as will any documents such as minutes from board meetings, consultancy reports, and so on that involve discussions or decisions about the nature of the proposed system.

4.1.2 Impact assessment. Impact assessments involve evaluating the potential implications and risks of an ADM system, and are important mechanisms for uncovering and mitigating potential problems. Assessments may encompass a range of concerns, such as legality and compliance [72], issues of discrimination and equality [49,59], impacts on fundamental rights [35], ethical issues [16], and sustainability concerns [25], amongst others. Some assessments are legally required - GDPR, for instance, requires Data Protection Impact Assessments (DPIAs) in various circumstances [72], including where there is an extensive evaluation of personal aspects that leads to decisions with significant effects (common for many ADM processes); other existing (e.g. [48]) and proposed (e.g. [17]) regulatory regimes also require assessments. Even where assessments are not legally required, they are good practice, and robust internal (in-organisation) assessment processes have value [60]. There are many materials on various topics that can assist assessment undertakings [34,35,61].

To facilitate accountability, records should be kept of any such assessment. These might include details of the actual assessment, and also the outcomes and any mitigation measures employed as a result. Other relevant information can include whether assessments were legally required and whether they led to an interaction with an oversight body (such as a regulator), as well as information on who conducted the assessment (an internal process or external organisation) and whether any advice was sought. Plans for ongoing monitoring and reassessment of the process are also likely to be relevant to various accountability relationships.

4.1.3 Procurement. In practice, ADM will often involve some kind of procurement - whether to obtain models or other technical components core to decision-making, data for training, technical components to support development or deployment, service arrangements for outsourcing business workflows, external consultancies for risk assessments, and so on. The nature of any procured product or service can influence the overall algorithmic system.

The importance of robust procurement processes for algorithmic systems is well-recognised, with a range of guidance available (e.g. [71,77]). Records relating to procurement will likely serve various accountability relationships. These will often include details of contractual arrangements, tender documents, design specifications, quality assurance measures, and so on. Details of suppliers, including any due diligence performed, may also be relevant. Any salient characteristics of what is being procured (e.g. test or acceptance criteria) are often best defined by the managers or documented as part of arrangements with the supplier. There should also be suitable mechanisms in place for suppliers to be audited or compelled to provide information as required (cf. [33]).

Where procurement entails service engagements—e.g. outsourcing of business processes, or engaging cloud or other infrastructure services—details of the arrangement and terms of service will likely be important, as will levels-of-service guarantees, agreements for audit and inspection, etc. In future, third-party providers may adopt approaches such as service ‘factsheets’ [3] that describe aspects of their service beyond technical capabilities. Where procurement relates to technical ‘products’, such as libraries, toolkits, test-kits, and software packages, details of the version, documentation, developing organisation, terms of use, availability, usage parameters or constraints, and licensing arrangements, may also aid review.

4.2 Model building

This stage involves the model development process, including related system design aspects and, in particular, the human decisions in the model lifecycle. Any ADM process driven by models is in scope; if a system comprises several models, information on each model will be required, with further consideration of how the models interact with one another systemically in practice as part of deployment. Much past work has focused on models and technical elements (§2.2), and emerging regulatory frameworks explicitly reference record-keeping during model development [17]. However, such work often overlooks the broader human dimensions of this stage.

In this stage prior to deployment, accountability of different actors is especially crucial. While developers may be actively making decisions in the model build process, the implications of these decisions should be made clear, with appropriate oversight and documented approvals by managers.

Fig.2 demonstrates a typical ML model development lifecycle: 1) data collection, 2) pre-processing, 3) training, and 4) testing. At each lifecycle step, contextually appropriate records should be kept of qualitative and quantitative assessments of potential risk factors.

Figure 2: Model build lifecycle: review process.

4.2.1 Data collection. Before any model training, data must first be collected. This involves selecting a population from the world-at-large and relevant features with which to form a dataset. This can involve manual input of data, surveys, or an automated process such as webpage scraping, or acquiring existing datasets. Often those undertaking data collection are separate from those responsible for the remaining model lifecycle; e.g. the collectors might be from a different institution or organisational unit. Information on the decisions made as part of data collection is important for understanding the potential risks, limitations, and implications.

Details of the provenance of data, regarding its lineage and decisions made throughout its lifecycle—creation, collection, collation, processing, and sharing—are particularly important (both here and in the subsequent step) [66]. Gebru et al. propose standardised documentation processes for datasets [30], especially useful for facilitating communication across those working with datasets. The datasheet contains questions for the data collector regarding: 1) motivation, 2) composition, 3) collection process, 4) pre-processing/cleaning/labelling, 5) uses, 6) distribution, and 7) maintenance. Questions 4-7 may not be fully answered by data collectors; while some processing may be done in collection (e.g. extracting text from a website by removing markup), subsequent aspects would often be performed by the model developer with the specific use case in mind. Similarly, a “data statement” records relevant characteristics specific to text datasets for natural language processing [6].
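As a minimal sketch of how these datasheet categories might be captured as a structured record at collection time (the class, field names, and example values are our own illustration rather than a prescribed format; datasheets themselves are free-text question-and-answer documents), consider:

from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative record mirroring the seven datasheet categories [30].
@dataclass
class DatasetRecord:
    name: str
    motivation: str                        # 1) why and by whom the dataset was created
    composition: str                       # 2) what instances and features it contains
    collection_process: str                # 3) how it was collected, sampling strategy
    preprocessing: Optional[str] = None    # 4) cleaning/labelling (often completed later
                                           #    by the model developer, not the collector)
    uses: List[str] = field(default_factory=list)  # 5) intended or known uses
    distribution: Optional[str] = None     # 6) how it is shared, under what terms
    maintenance: Optional[str] = None      # 7) who maintains it, update plans
    collected_by: str = ""                 # organisational unit or supplier responsible
    collected_on: str = ""                 # date or period of collection

# Hypothetical example entry
loan_data = DatasetRecord(
    name="loan-applications-2019",
    motivation="Training data for a credit-decision model, commissioned by the risk team",
    composition="120k applications; demographic, income, and repayment features",
    collection_process="Extract from the operational CRM; no survey or scraping",
    collected_by="Data engineering",
    collected_on="2019-06",
)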

The implications of how data is collected, how datasets are constructed, and how they relate to the system’s potential risks are important and should be carefully considered. Additional questions may be needed to turn this information into actionable insights or to inform model design decisions. Contextually appropriate information relating to such assessments and questions will often need recording. Importantly, other information beyond that of the data itself—e.g. relating to other human decisions in data collection or to alternative options that were dismissed—are often also relevant.

4.2.2 Pre-processing. Pre-processing includes data cleansing (e.g. outlier detection, handling missing data or data inconsistencies), data wrangling/transformation, data merging, feature engineering/construction, data labelling, and feature selection [24,40]. Human decisions here could affect the model outcome; for instance, how missing values are imputed, changes to data structure or format, and the selection of input features. Decisions in choosing and measuring features and labels can contribute to unintended measurement bias where they involve imperfect proxies for desired quantities [69]. For instance, using grade point average (GPA) to estimate student success is a decision that simplifies the latter with a proxy. As such, useful records would often relate to these various aspects of pre-processing, including (but not limited to) which proxies were selected and why. Datasheets include relevant questions to facilitate record-keeping for features in the dataset [30]; answers to questions about these aspects of model building should also consider the potential system-level implications of decisions. For example, the gap between the “observed” feature space and the “decision” space results in a mismeasurement of the target feature, especially in the presence of historical and structural discrimination; GPA, for instance, is an imperfect measure of success in high school [29].

4.2.3 Training. In preparation for model training, datasets are often split into training data, testing data, and (sometimes) validation data. In practice, some technical aspects of the training and testing steps will be explicitly linked and iteratively performed, but there are aspects of the model training process that should be explicitly considered. Relevant information about the selection of training data (e.g. around ensuring that training data is representative of the dataset as a whole) and about the training processes should be recorded, as appropriate. Details about the workflow of model construction are often also important [66]; this includes the machine learning approach(es) tried, tested, and perhaps discarded; relevant aspects of tuning, such as hyperparameters, predictor variables, and coefficients (variable weights); and any other relevant factors [56, 62]. For example, the model type and any parameters used in training, such as pruning methods in decision trees or the choice of regularisation coefficient in lasso regressions, can be important to record, along with the results of the testing phase. These bear consideration as they provide the means not only to review the model and its process of creation, but also to enable some degree of model ‘reconstruction’.
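One hedged illustration of capturing such a training record in a reproducible form is sketched below; the structure, field names, and the use of scikit-learn are assumptions for the example, not a recommended format.

import json

import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative only: capture enough of the training step to support review and
# some degree of model reconstruction (model type, hyperparameters, split seed,
# library versions, headline test metric, and approaches that were discarded).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(C=1.0, penalty="l2", max_iter=200)
model.fit(X_train, y_train)

training_record = {
    "model_type": type(model).__name__,
    "hyperparameters": model.get_params(),          # regularisation, solver, etc.
    "data_split": {"test_size": 0.2, "random_state": 42},
    "library_versions": {"scikit-learn": sklearn.__version__},
    "test_accuracy": accuracy_score(y_test, model.predict(X_test)),
    "discarded_approaches": ["decision tree (overfit on validation data)"],
}
print(json.dumps(training_record, indent=2, default=str))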

4.2.4 Testing. Once a model is trained, it is tested to calculate relevant metrics. This may be iterative, as representation bias of testing data may be assessed against the population and the model may be retrained and re-tested to achieve the target metrics. Building on datasheets, ‘model cards for model reporting’ offer standardised documentation procedures to communicate the performance characteristics of trained ML models [45]. The model card includes illustrative examples of questions in 9 categories: 1) Model Details, 2) Intended Use, 3) Factors, e.g. demographic or phenotypic groups, environmental conditions, technical attributes, 4) Metrics, 5) Evaluation Data, 6) Training Data, 7) Quantitative Analyses, e.g. of fairness, 8) Ethical Considerations, and 9) Caveats and Recommendations.

Model cards include some details on evaluation data and metrics [45], including performance measures, but models should be tested not only on accuracy but also on safety, fairness, explainability, and security. Information relating to testing for these aspects is highly likely to be relevant to various accountability relationships. Any testing performed should be presented in the form of “verifiable claims” with audit trails and explanations of model predictions [9]. ‘FactSheets’ have been proposed for ML models offered as a service, recording testing results to give service users confidence in the model [3]. Similar record-keeping mechanisms may be useful for model testing in other contexts.

Selection of evaluation (or “success”) metrics is also relevant; often these are subjective and value-laden. In a well-known example, a US criminal recidivism model was accused of being racially biased [27]: of defendants who ultimately did not reoffend, black people were more than twice as likely as white people to be classified as medium or high risk. The model’s developers argued it was fair because, among defendants with the same risk score, the percentage of white and black defendants who reoffended was broadly the same. Importantly, these two definitions of “fairness” are mathematically incompatible - in practice it is impossible to satisfy both simultaneously [27]. This illustrates the importance of recording how objectives, such as fairness, are defined, quantified, and operationalised, so that they can be scrutinised and debated.
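To make the tension concrete, the sketch below (with invented counts, purely illustrative) computes the two notions at issue for two groups: precision among those classified as high risk (the developers' definition) and the false positive rate among those who did not reoffend (the critics' definition).

# Illustrative only: invented confusion-matrix counts for two groups,
# showing equal precision but unequal false positive rates.
groups = {
    # tp: flagged high risk and reoffended; fp: flagged high risk, did not reoffend;
    # tn: not flagged, did not reoffend; fn: not flagged but reoffended.
    "group_a": {"tp": 300, "fp": 200, "tn": 400, "fn": 100},
    "group_b": {"tp": 150, "fp": 100, "tn": 700, "fn": 50},
}

for name, c in groups.items():
    precision = c["tp"] / (c["tp"] + c["fp"])   # of those flagged, how many reoffended
    fpr = c["fp"] / (c["fp"] + c["tn"])         # of non-reoffenders, how many were flagged
    print(f"{name}: precision={precision:.2f}, false positive rate={fpr:.2f}")

# Here precision is 0.60 for both groups, yet the false positive rate is 0.33
# for group_a and 0.12 for group_b. With differing base rates, the two criteria
# cannot generally be satisfied at once [27]; recording which was chosen, and
# why, makes that choice reviewable.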

4.3 Decision-making

This stage concerns all operational aspects of ADM. This includes details of how the system is deployed and supported, and, importantly, all aspects leading up to a particular decision being made, and its consequences [66]. Here, managers and users are most likely to be relevant, although developers may also be involved, particularly for the deployment step.

4.3.1 Deployment. This concerns aspects of the process supporting the operation of the overall ADM system and its constituent parts. Information regarding deployment can help with understanding and verifying that appropriate mechanisms and procedures for supporting and maintaining the system are in place. Note that this step involves both a design and operational dimension; that is, building and testing the processes for deployment as well as supporting the ADM process ‘in production’.

From an organisational perspective, relevant information here concerns the workflows and business processes surrounding the algorithmic system. This encompasses, for instance, operating procedures, manuals, details of staff training, and procedures relating to actually making decisions (using the system), as well as to operational support (maintaining the system). Also potentially relevant are records regarding the provisioning and support of technical components, such as details of the data and system pipelines [66], including model integration(s); storage, compute, and networking; scalability and security plans; logging mechanisms; technical audit procedures; etc.

The management of the records and other data supporting useful transparency is another aspect warranting consideration. Details of the operating procedures, access management and integrity controls, encryption, and any other regimes in place to ensure that records and logs are appropriately managed may be relevant.

4.3.2 Use. This step concerns using the ADM system to actually make decisions (whether the system itself ‘decides’ or a human makes the final decision). As such, it involves information on all aspects relevant to decisions actually being arrived at.

Records of a model’s inputs and outputs are generally important, as are details of what and how information and feedback is presented to users. Parameters and metadata associated with a model’s use also warrant consideration, as do operational records at a technical (systems log) level. Mechanisms for model interpretability or explanation or that otherwise describe how a model operates (see §2.2) may also be relevant here.

However, this step also involves more than the specifics of the model’s part in the decision. Where decision-making involves human users of the system, or is otherwise manual, proper documentation arrangements are often required to record what occurs. For instance, many ADM systems will entail manual data entry, or will have a ‘human in the loop’ working with the system to produce decisions; information regarding this will often be pertinent. For technical processes, logging mechanisms can capture the details of inputs, outputs, and data processing/computation [66]. Generally, ‘metadata’ (including relevant time stamps, system or process versioning, records of any exceptional occurrences and operations, and so on) is also useful, often revealing potential issues.
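As one hypothetical example of such logging, the sketch below records, for each use of a model, the inputs, the model’s output, the final decision, and who or what made it, together with basic metadata; the record format, field names, and storage location are assumptions for illustration, not prescriptions.

```python
# A minimal sketch (our assumption, not a prescribed mechanism) of logging each
# use of a model: inputs, output, the final decision, who made it, and metadata.
import json, time, uuid

DECISION_LOG = "decisions.log"  # hypothetical location; in practice an access-controlled store

def log_decision(model_version, inputs, model_output, final_decision, decided_by):
    record = {
        "decision_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,    # ties the decision to a tested model
        "inputs": inputs,                  # as presented to the model
        "model_output": model_output,      # e.g. score or recommendation
        "final_decision": final_decision,  # what was actually decided
        "decided_by": decided_by,          # "system" or the human in the loop
    }
    with open(DECISION_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["decision_id"]

# Example: a human overrides the model's recommendation; both are recorded.
log_decision("loan-risk-model-2.1", {"income": 32000, "postcode": "CB1"},
             {"score": 0.74, "recommendation": "decline"},
             final_decision="approve", decided_by="case-officer-017")
```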

Note that for effective reviewability, it is essential that this step involves recording what actually occurs. This is because it will generally be insufficient to rely on what is supposed to happen (i.e. according to the pre-defined workflows, business practices, and systems specification/documentation), given the propensity for these to differ from what actually occurs day-to-day.

4.3.3 Consequences. Once a decision is made, it is generally important to have information on the subsequent and follow-on steps, such as quality assurance processes (and their outcomes) for a decision, as well as mechanisms that review sets of historical decisions. This might include checking for model skew, or for any inappropriate discrimination or other behaviour that may be manifesting [67]. Moreover, once a decision is made (and ‘finalised’), records and logs of any actions taken to give effect to the decision are generally relevant. This includes, for instance, details about how the decision is communicated, and the triggering of any new workflows (e.g. initiating the loan process, assigning a mortgage, and so forth).
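A review of historical decisions could, for instance, periodically compare outcomes across groups; the following sketch (the records, group labels, and threshold are illustrative assumptions only) flags disparities in approval rates for further human review.

```python
# Illustrative sketch: comparing approval rates across groups over a set of logged
# decisions to flag possible skew for review. Records, group labels, and the
# threshold are assumptions for illustration only.
from collections import defaultdict

def approval_rates_by_group(records, group_field="group"):
    counts = defaultdict(lambda: {"total": 0, "approved": 0})
    for r in records:
        g = r.get(group_field, "unknown")
        counts[g]["total"] += 1
        if r["final_decision"] == "approve":
            counts[g]["approved"] += 1
    return {g: c["approved"] / c["total"] for g, c in counts.items()}

historical = [
    {"group": "A", "final_decision": "approve"},
    {"group": "A", "final_decision": "approve"},
    {"group": "A", "final_decision": "decline"},
    {"group": "B", "final_decision": "approve"},
    {"group": "B", "final_decision": "decline"},
    {"group": "B", "final_decision": "decline"},
]
rates = approval_rates_by_group(historical)
if max(rates.values()) - min(rates.values()) > 0.2:  # illustrative threshold
    print("Disparity exceeds threshold; flag for review:", rates)
```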

Note that cascading consequences can flow from a decision. Records of these are therefore important for better understanding the broader impacts and flow-on effects of decisions (see [66]).

4.4 Investigation

The investigation stage includes any oversight or investigatory activity, either internal (e.g. by compliance teams) or external (e.g., regulators and oversight agencies, civil society groups, individuals, and so on). Managers are particularly relevant here, though developers and users may also play a role.

4.4.1 Audit. Auditing processes are important to ensure that the entire ADM process works as intended. Audits might be conducted in-house, to evaluate and check that decision-making processes are apt, that procedures for human elements of the process are suitable and have been followed, and that technical systems components are functioning correctly [34, 52]. Audits may also be external, whether by regulators checking compliance, investigators unpacking what led to a particular outcome, and so forth [34]. An entity may also seek to audit third parties, such as organisations from whom they procure services, to ensure that arrangements and agreements are being met. Records of auditing activity facilitate scrutiny, and might include details of the audit, the basis or other reasons why it was undertaken, how it was conducted, and any findings. Where an audit leads to any subsequent actions or remedial response (e.g. system debugging, penalty actions), details of these should also be recorded. Given the potential sensitivity of audit data, records should be kept regarding how, when, why, and by whom it was accessed.
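For example, accesses to sensitive audit data might themselves be recorded in a simple append-only log; the sketch below illustrates one such mechanism under our own assumptions about the fields recorded, and is not a prescribed format.

```python
# A minimal sketch (illustrative assumptions throughout) of recording accesses to
# sensitive audit data: when it was accessed, by whom, why, and which records.
import csv, datetime

ACCESS_LOG = "audit_access_log.csv"  # hypothetical path; this log would itself need protecting

def record_access(accessor, purpose, records_accessed):
    # Append one row per access event.
    with open(ACCESS_LOG, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now(datetime.timezone.utc).isoformat(),  # when
            accessor,                                                  # by whom
            purpose,                                                   # why
            ";".join(records_accessed),                                # which audit records
        ])

record_access("regulator-investigator-42", "regulatory audit of Q3 decisions",
              ["audit-2020-09-01", "audit-2020-09-15"])
```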

4.4.2 Disclosure. Disclosures make information about the ADM process available to others, and are therefore a key aspect of meaningful accountability to external forums. Organisations should have processes for making relevant records and logs available when requested and as contextually appropriate for the accountability relationships involved. This will generally require the collation and aggregation of information recorded in previous stages, ensuring that it is presented appropriately to support critical understanding by the forum. Note that, in practice, many organisations appear ill-prepared for meeting their disclosure obligations [65,76].

Records about disclosures themselves can also be relevant, both of the processes for disclosure and of what was actually released: how information was compiled, how it was delivered, in what format, to whom, and when. The basis for disclosure is also relevant: was it legally required, e.g. as a result of a data access right under the GDPR [72], an order by a regulator, or part of a legal proceeding; was it in line with established best practices; or was it simply an organisational choice? And, again, means for obtaining information from (third-party) suppliers may also require consideration.

5 PRACTICALITIES AND FUTURE RESEARCH

When implementing reviewability, we emphasise the importance of considering which mechanisms might provide useful transparency given the likely accountability relationships; i.e. (i) the actors from whom, and the forums to whom, an account for each stage of the process will likely be owed, and (ii) what kind of information relating to the process might be contextually appropriate for those relationships. Simply recording everything is not only often impractical, with vast storage requirements [5, 12], but could also result in a form of inadvertent opacity, where those assessing ADM processes (whether internally or externally) have too much information to sift through to be able to use it effectively (i.e. the ‘transparency paradox’ [68]). Similarly, attempting to record all aspects without consideration may itself lead to transparency gaps, by virtue of the records being unrepresentative, unwittingly failing to capture particular aspects, and so forth.

Moreover, transparency can be harmful [2]. Recording information across a whole ADM process could bring substantial data protection and privacy risks, particularly if decisions concern people (especially where they are marginalised or vulnerable). Information recorded to facilitate review could potentially be personally revealing if wrongly disclosed and may be of interest to law enforcement agencies or surveillance programmes. To minimise the risks of harmful privacy breaches and surveillance, the relevance and proportionality aspects of assessing contextually appropriate information are crucial considerations.

Given these risks, good record management measures are essential. In practice, this means strong security regimes, potentially including technical measures (e.g. access controls, encryption, etc.) and organisational processes, to suitably manage and protect the information recorded. Details of the controls in place, as well as of accesses and operations over this data, should also be recorded. There are practical challenges and opportunities for research in this space, including, for example, how to manage who can access which data, in what circumstances, and for what purposes, particularly given the range of stakeholders involved and reviewability’s contextual nature. Similarly, given that this data relates to responsibilities, obligations, liabilities, and so on, means for ensuring and verifying the integrity and veracity of the data are important.
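As one possible approach to integrity (an assumption on our part, not something the framework prescribes), records could be hash-chained so that later alteration of any entry becomes detectable when the chain is recomputed:

```python
# Illustrative sketch of hash-chaining log records for tamper evidence: each
# entry's hash covers both its content and the previous hash, so altering any
# earlier entry invalidates every subsequent hash.
import hashlib, json

def chain_records(records):
    """Return records annotated with a hash linking each entry to its predecessor."""
    chained, prev_hash = [], "0" * 64
    for r in records:
        payload = json.dumps(r, sort_keys=True) + prev_hash
        prev_hash = hashlib.sha256(payload.encode()).hexdigest()
        chained.append({**r, "chain_hash": prev_hash})
    return chained

def verify_chain(chained):
    """Recompute the chain and report whether any entry has been altered."""
    prev_hash = "0" * 64
    for entry in chained:
        body = {k: v for k, v in entry.items() if k != "chain_hash"}
        payload = json.dumps(body, sort_keys=True) + prev_hash
        prev_hash = hashlib.sha256(payload.encode()).hexdigest()
        if prev_hash != entry["chain_hash"]:
            return False
    return True

log = chain_records([{"event": "decision", "id": 1}, {"event": "disclosure", "id": 2}])
assert verify_chain(log)
log[0]["event"] = "tampered"
print(verify_chain(log))  # False: the alteration is detectable
```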

The presentation of information should also be considered. As previously discussed, too much information, or information presented without considering the forum’s requirements, can produce inadvertent opacity [68]. Large amounts of data may be involved and raw records may require transformation to a form facilitating meaningful accountability. Mechanisms that assist this while being sensitive to the forum’s needs are an area for further attention.

In this context, developing guidance, best practice, and standards for record-keeping and accountability more generally can assist. These can lead to more useful transparency regimes by raising the bar for organisational practices and increasing levels of consistency. There is scope for tools to assist organisations in identifying what is contextually appropriate, perhaps by indicating the needs of particular stakeholders or particular sectoral considerations. There is also scope for specific guidance, e.g. around securing and managing records, appropriate record-keeping formats, standard forms of disclosure, and so forth. Technical toolkits may also play a role in facilitating logging and the extraction of information from technical components in a common manner [12]. Given that record-keeping requirements in a technology context are increasingly backed by regulation [17, 72], there is a role for oversight bodies in helping define and shape what reviewability means ‘in practice’.

Finally, we emphasise that reviewability will not solve all problems with ADM. There are limits to transparency itself as a mechanism for supporting accountability [2, 26, 68]. Even reviewable systems may not capture broader business processes, the wider context of ADM processes, or the situations in which they are embedded. They may not capture interactions with other sociotechnical processes, which form part of wider, complex, interconnected systems (although other mechanisms, such as decision provenance [66], could assist here). Nor will they provide information about business models or the questions of power that come with the automation of decision-making processes. It is also important to remember that transparency has temporal limitations (i.e. “things change over time” [2]), and that even contextually appropriate information is always open to subjective interpretation and contestation [2]. Reviewability must therefore be understood as both a living and an iterative process.

More fundamentally, even useful transparency may not provide a definitive understanding of how processes function, and will only go so far without substantive accountability and control mechanisms. Nevertheless, reviewability works to close a gap, by potentially providing information on (at least) the assumptions, decisions, and priorities of those responsible for designing, deploying, and using ADM and their practical effects. As a mechanism for producing useful transparency across the ADM process, reviewability can, we argue, help support other (technical, legal, regulatory) accountability mechanisms that can make a significant difference.

6 CONCLUSION

Given ADM’s rising prevalence, and risks of potential harm, methods for facilitating review and oversight are needed. Reviewability offers a legally-grounded, holistic, systematic, and practical framework for making algorithmic systems meaningfully accountable. Through targeted technical and organisational record-keeping and logging mechanisms, reviewable ADM processes provide contextually appropriate information to support review and assessment both of individual decisions and of the process as a whole.

Reviewability cannot on its own address all potential problems with ADM; for that, a wider examination of its political economy and socio-economic context is also needed, as are legal protections for individuals and groups. However, reviewability provides a way to gain a better understanding of ADM processes, not only for those employing ADM, but also for oversight bodies and those affected by decisions. Reviewable ADM could therefore potentially be better assessed for legal compliance and for decision quality, and can support those addressing more structural factors.

Research is clearly needed on implementation. The specifics of what record-keeping and logging might be appropriate at each step of the process, of what kind of information would be useful to retain, and of how this information can best be presented to forums to facilitate effective review of the algorithmic system’s operation depend on the system in question, the domain, and its purpose. However, as we have shown, existing mechanisms can be integrated into the reviewability framework, and we have indicated directions for future work to fill gaps. Although there is work to do, reviewability provides a practical way of developing meaningfully accountable ADM processes that can be implemented now.

ACKNOWLEDGMENTS

The Compliant & Accountable Systems Group acknowledges the financial support of the UK Engineering & Physical Sciences Research Council (EP/P024394/1, EP/R033501/1), Aviva and Microsoft through the Microsoft Cloud Computing Research Centre.

[Source: Jennifer Cobbe, Michelle Seng Ah Lee, Jatinder Singh. 2021. Reviewable Automated Decision-Making: A Framework for Accountable Algorithmic Systems. In ACM Conference on Fairness, Accountability, and Transparency (FAccT'21), March 1–10, 2021, Virtual Event, Canada. ACM, New York, NY, USA. https://doi.org/10.1145/3442188.3445921

REFERENCES

  • [1] Philip Alston. 2019. Report of the Special Rapporteur on extreme poverty and human rights. United Nations General Assembly A/74/493 (2019). https://undocs. org/A/74/493 [Back]

  • [2] Mike Ananny and Kate Crawford. 2016. Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society (2016). [Back]

  • [3] Matthew Arnold, Rachel KE Bellamy, Michael Hind, Stephanie Houde, Sameep Mehta, A Mojsilovic, Ravi Nair, K Natesan Ramamurthy, Alexandra Olteanu, David Piorkowski, et al. 2019. FactSheets: Increasing trust in AI services through supplier’s declarations of conformity. IBM Journal of Research and Development 63, 4/5 (2019), 6-1. [Back]

  • [4] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Richard Benjamins, Raja Chatila, and Francisco Herrera. 2020. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Information Fusion 58 (2020). [Back]

  • [5] Adam Bates, Dave (Jing) Tian, Kevin R.B. Butler, and Thomas Moyer. 2015. Trustworthy Whole-System Provenance for the Linux Kernel. In 24th USENIX Security Symposium (USENIX Security 15). USENIX Association, Washington, D.C., 319-334. https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/bates [Back]

  • [6] Emily M Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6 (2018), 587-604. [Back]

  • [7] Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, and Peter Eckersley. 2020. Explainable Machine Learning in Deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain) (FAT* ’20). Association for Computing Machinery, New York, NY, USA, 648-657. https://doi.org/10.1145/3351095.3375624 [Back]

  • [8] Mark Bovens. 2006. Analysing and Assessing Public Accountability. A Conceptual Framework. EUROGOV, European Governance Papers No. C-06-01 (2006). [Back]

  • [9] Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, et al. 2020. Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213 (2020). [Back]

  • [10] Jenna Burrell. 2016. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society 3 (2016). Issue 1. [Back]

  • [11] Danielle K Citron and Frank Pasquale. 2014. The Scored Society: Due Process for Automated Predictions. University of Maryland Francis King Carey School of Law Legal Studies Research Paper 8 (2014). [Back]

  • [12] Richard Cloete, Chris Norval, and Jatinder Singh. 2020. A Call for Auditable Virtual, Augmented and Mixed Reality. [Back]

  • [13] Jennifer Cobbe. 2019. Administrative Law and the Machines of Government: Judicial Review of Automated Public-Sector Decision-Making. Legal Studies (2019). [Back]

  • [14] Jennifer Cobbe, Michelle Seng Ah Lee, Heleen Janssen, and Jatinder Singh. 2020. Centring the Rule of Law in the Digital State. IEEE Computer (2020). [Back]

  • [15] Cary Coglianese and Daniel Lehr. 2017. Regulating by Robot: Administrative Decision Making in the Machine-Learning Era. Georgetown Law Journal 105 (2017). [Back]

  • [16] European Commission. 2020. Denmark AI Strategy Report. https://ec.europa.eu/knowledge4policy/ai-watch/denmark-ai-strategy-report_en#regulation [Back]

  • [17] European Commission. 2020. White Paper On Artificial Intelligence - A European Approach to Excellence and Trust. COM 65 (2020). [Back]

  • [18] John Danaher. 2016. Three Types of Algorithmic Opacity. Algocracy and the Transhumanist Project (5 March 2016). https://algocracy.wordpress.com/2016/03/05/three-types-of-algorithmic-opacity [Back]

  • [19] Jeffrey Dastin. 2018. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters (Oct 2018). https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G [Back]

  • [20] Demos, doteveryone, Global Partners Digital, Institute for Strategic Dialogue. 2020. Algorithm Inspection and Regulatory Access. Joint Paper (2020). https://demos.co.uk/blog/algorithm-inspection-and-regulatory-access [Back]

  • [21] Nicholas Diakopoulos. [n.d.]. Algorithmic Accountability Reporting: On the Investigation of Black Boxes. Tow Center for Digital Journalism ([n.d.]). [Back]

  • [22] Nicholas Diakopoulos. 2015. Algorithmic Accountability: Journalistic Investigation of Computational Power Structures. Digital Journalism 3 (2015). Issue 3. [Back]

  • [23] Nicholas Diakopoulos. 2016. Accountability in Algorithmic Decision Making: A view from computational journalism. Commun. ACM 59 (2016). Issue 2. [Back]

  • [24] Pedro Domingos. 2012. A Few Useful Things to Know about Machine Learning. Commun. ACM 55, 10 (Oct. 2012), 78-87. https://doi.org/10.1145/2347736.2347755 [Back]

  • [25] L. Duboc, S. Betz, B. Penzenstadler, S. Akinli Kocak, R. Chitchyan, O. Leifler, J. Porras, N. Seyff, and C. C. Venters. 2019. Do we Really Know What we are Building? Raising Awareness of Potential Sustainability Effects of Software Systems in Requirements Engineering. In 2019 IEEE 27th International Requirements Engineering Conference (RE). 6-16. [Back]

  • [26] Lilian Edwards and Michael Veale. 2017. Slave to the Algorithm? Why a ’Right to an Explanation’ is Probably Not the Remedy You Are Looking For. Duke Law & Technology Review 17 (2017). [Back]

  • [27] Avi Feller, Emma Pierson, Sam Corbett-Davies, and Sharad Goel. 2016. A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. The Washington Post (2016). [Back]

  • [28] Agata Foryciarz, Daniel Leufer, and Katarzyna Szymielewicz. 2020. Black-Boxed Politics: Opacity is a Choice in AI Systems. Panoptykon Foundation (17 January 2020). https://en.panoptykon.org/articles/black-boxed-politics-opacity-choice-ai-systems [Back]

  • [29] Sorelle A Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. 2016. On the (im)possibility of fairness. arXiv preprint arXiv:1609.07236 (2016). [Back]

  • [30] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2018. Datasheets for datasets. arXiv preprint arXiv:1803.09010 (2018). [Back]

  • [31] Bryce Goodman and Seth Flaxman. 2016. European Union regulations on algorithmic decision-making and a ’right to an explanation’. 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016) (2016). [Back]

  • [32] Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. 2018. A Survey of methods for Explaining Black Box Models. Comput. Surveys 51 (2018). Issue 5. [Back]

  • [33] Information and Privacy Commission New South Wales. 2020. Case Summary on Automated decision making and access to information under the GIPA Act. https://www.ipc.nsw.gov.au/case-summary-automated-decision-making-and-access-information-under-gipa-act [Back]

  • [34] Ada Lovelace Institute. 2020. Examining the Black Box: Tools for Assessing Algorithmic Systems. (2020). [Back]

  • [35] Heleen L Janssen. 2020. An approach for a fundamental rights impact assessment to automated decision-making. International Data Privacy Law 10, 1 (03 2020), 76-106. https://doi.org/10.1093/idpl/ipz028 arXiv:https://academic.oup.com/idpl/article-pdf/10/1/76/33151837/ipz028.pdf [Back]

  • [36] Michael Katell, Meg Young, Dharma Dailey, Bernease Herman, Vivian Guetler, Aaron Tam, Corinne Binz, Daniella Raz, and P M Krafft. 2020. Towards Situated Interventions for Algorithmic Equity: Lessons from the Field. FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability and Transparency (2020). [Back]

  • [37] Jakko Kemper and Daan Kolkman. 2018. Transparent to whom? No algorithmic accountability without a critical audience. Information, Communication & Society 22 (2018). Issue 14. [Back]

  • [38] Amir E Khandani, Adlar J Kim, and Andrew W Lo. 2010. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance 34, 11 (2010), 2767-2787. [Back]

  • [39] Ansgar Koene, Chris Clifton, Yohko Hatada, Helena Webb, Menisha Patel, Machado Caio, Jack LaViolette, Rashida Richardson, and Dillon Reisman. 2019. A governance framework for algorithmic accountability and transparency. [Back]

  • [40] SB Kotsiantis, Dimitris Kanellopoulos, and PE Pintelas. 2006. Data preprocessing for supervised leaning. International Journal of Computer Science 1, 2 (2006), 111-117. [Back]

  • [41] Maciej Kuziemski and Gianluca Misuraca. 2020. AI governance in the public sector: Three tales from the frontiers of automated decision-making in democratic settings. Telecommunications Policy (2020), 101976. [Back]

  • [42] P.A. Laplante. 2017. Requirements Engineering for Software and Systems, Third Edition. Taylor & Francis. https://books.google.com.au/books?id=XfvnswEACAAJ [Back]

  • [43] David Lehr and Paul Ohm. 2017. Playing with the Data: What Legal Scholars Should Learn About Machine Learning. UC Davis Law Review 51 (2017). [Back]

  • [44] Gianclaudio Malgieri and Giovanni Comandé. 2017. Why a Right to Legibility of Automated Decision-Making Exists in the General Data Protection Regulation. International Data Privacy Law 7 (2017). Issue 4. [Back]

  • [45] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 220-229. [Back]

  • [46] Deirdre K Mulligan and Kenneth A Bamberger. 2019. Procurement as Policy: Administrative Process for Machine Learning. Berkeley Technology Law Journal 34 (2019). [Back]

  • [47] Chris Norval, Jennifer Cobbe, and Jatinder Singh. To Appear. Towards an accountable Internet of Things: A call for ‘reviewability’. In Privacy by Design for the Internet of Things: Building Accountability and Security. The Institution of Engineering and Technology. [Back]

  • [48] Government of Canada. 2019. Directive on Automated Decision Making. https://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592 [Back]

  • [49] Government of Canada. 2020. Algorithmic Impact Assessment. https://open.canada.ca/aia-eia-js/?lang=en [Back]

  • [50] House of Commons Science and Technology Committee. 2018. Algorithms in Decision-Making. Fourth Report of Session 2017-19 HC 351 (2018). [Back]

  • [51] Venice Commission of the Council of Europe. 2016. The Rule of Law Checklist. (2016). [Back]

  • [52] Information Commissioner’s Office. 2020. Guidance on the AI auditing framework. (2020). https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-themes/guidance-on-artificial-intelligence-and-data-protection [Back]

  • [53] Information Commissioner’s Office and The Alan Turing Institute. 2020. Explaining decisions made with AI. https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-themes/explaining-decisions-made-with-ai [Back]

  • [54] Partnership on AI. 2020. When AI Systems Fail: Introducing the AI Incident Database. https://www.partnershiponai.org/aiincidentdatabase/ [Back]

  • [55] Marion Oswald. 2018. Algorithm-assisted decision-making in the public sector: framing the issues using administrative law rules governing discretionary power. Philosophical Transactions of the Royal Society A 376 (2018). [Back]

  • [56] Partnership on AI. 2019. Annotation and Benchmarking on Understanding and Transparency of Machine learning Lifecycles (ABOUT ML). https://www.partnershiponai.org/wp-content/uploads/2019/07/ABOUT-ML-v0-Draft-Final.pdf [Back]

  • [57] Frank Pasquale. 2015. The Black Box Society: The Secret Algorithms That Control Money and Information. Harvard University Press. [Back]

  • [58] Anya ER Prince and Daniel Schwarcz. 2019. Proxy discrimination in the age of artificial intelligence and big data. Iowa L. Rev. 105 (2019), 1257. [Back]

  • [59] Douglas Pyper. 2020. Research Briefing: The Public Sector Equality Duty and Equality Impact Assessments. House of Commons Library (2020). https://commonslibrary.parliament.uk/research-briefings/sn06591/ [Back]

  • [60] Inioluwa Deborah Raji, Andrew Smart, Rebecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability and Transparency (2020). [Back]

  • [61] Dillon Reisman, Jason Schultz, Kate Crawford, and Meredith Whittaker. 2018. Algorithmic Impact Assessments: A practical framework for public agency accountability (AI Now). https://ainowinstitute.org/aiareport2018.pdf [Back]

  • [62] Sebastian Schelter, Joos-Hendrik Bose, Johannes Kirschnick, Thoralf Klein, and Stephan Seufert. 2017. Automatically tracking metadata and provenance of machine learning experiments. In ML Systems Workshop at NIPS. [Back]

  • [63] Nick Seaver. 2013. Knowing Algorithms. Paper presented at Media in Transition 8 (2013). [Back]

  • [64] Andrew D Selbst and Julia Powles. 2017. Meaningful information and the right to explanation. International Data Privacy Law 7 (2017). Issue 4. [Back]

  • [65] J. Singh and J. Cobbe. 2019. The Security Implications of Data Subject Rights. IEEE Security & Privacy 17, 6 (2019), 21-30. [Back]

  • [66] Jatinder Singh, Jennifer Cobbe, and Chris Norval. 2020. Decision Provenance: Harnessing Data Flow for Accountable Systems. IEEE Access 7 (2020). [Back]

  • [67] Jatinder Singh, Ian Walden, Jon Crowcroft, and Jean Bacon. 2016. Responsibility & Machine Learning: Part of a Process. Available on SSRN (2016). http://dx.doi.org/10.2139/ssrn.2860048 [Back]

  • [68] Cynthia Stohl, Michael Stohl, and Paul M Leonardi. 2016. Managing Opacity: Information Visibility and the Paradox of Transparency in the Digital Age. International Journal of Communication 10 (2016). [Back]

  • [69] Harini Suresh and John V Guttag. 2019. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002 (2019). [Back]

  • [70] Joe Tomlinson, Katy Sheridan, and Adam Harkens. 2019. Proving Public Law Error in Automated Decision-Making Systems. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3476657 [Back]

  • [71] UK Government. 2020. Guidelines for AI procurement. https://www.gov.uk/government/publications/guidelines-for-ai-procurement/guidelines-for-ai-procurement [Back]

  • [72] European Union. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union L119 (4 May 2016), 1-88. [Back]

  • [73] Neil Vigdor. 2019. Apple Card Investigated After Gender Discrimination Complaints. New York Times (Nov 2019). https://www.nytimes.com/2019/11/10/business/Apple-credit-card-investigation.html [Back]

  • [74] Sandra Wachter, Brent Mittelstadt, and Luciano Floridi. 2017. Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation. International Data Privacy Law 7 (2017). Issue 2. [Back]

  • [75] Maranke Wieringa. 2020. What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 1-18. [Back]

  • [76] Janis Wong and Tristan Henderson. 2018. How Portable is Portable? Exercising the GDPR’s Right to Data Portability. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers (Singapore, Singapore) (UbiComp ’18). Association for Computing Machinery, New York, NY, USA, 911-920. https://doi.org/10.1145/3267305.3274152 [Back]

  • [77] World Economic Forum. 2020. AI Procurement in a Box: AI Government Procurement Guidelines. http://www3.weforum.org/docs/WEF_AI_Procurement_in_a_Box_AI_Government_Procurement_Guidelines_2020.pdf [Back]

  • [78] Monika Zalnieriute, Lyria Bennett Moses, and George Williams. 2019. The Rule of Law and Automation of Government Decision-Making. Modern Law Review (2019). https://www.modernlawreview.co.uk/may-2019/rule-law-automation-government-decision-making [Back]


small logoThis document has been published on 30Jan24 by the Equipo Nizkor and Derechos Human Rights. In accordance with Title 17 U.S.C. Section 107, this material is distributed without profit to those who have expressed a prior interest in receiving the included information for research and educational purposes.