Course curriculum
Sociotechnical systems offer abundant examples of how they can become sources of harm for historically marginalized groups. Against this backdrop, the field of machine learning has seen a rapid proliferation of new methods, model architectures, and optimization techniques. Yet data, which remains the backbone of machine learning research and development, has received comparatively little research attention. My research hypothesis is that focusing exclusively on the content of training datasets (the data from which algorithms “learn” associations) captures only part of the problem. Instead, we should identify the historical and conceptual conditions that bring to light the modes of dataset construction.

I propose here an analysis of datasets through three techniques of interpretation: genealogy, problematization, and hermeneutics. First, genealogy investigates how datasets have been created and the contextual and contingent conditions of their creation. This includes questions about data provenance, the conceptualization and operationalization of the categories that structure these datasets (e.g., the labels applied to images), annotation methods, the consent regimes governing data authors and data subjects, and the stakeholders and institutional logics involved. Second, problematization builds on the genealogical question by asking: which central discourses, questions, concepts, and values present themselves as the solution to problems in the construction of a given dataset? Third, building on the previous two lines of inquiry, the hermeneutical approach investigates the explicit and implicit motivations of all present and absent stakeholders (including data scientists and dataset curators) and the background assumptions operative in dataset construction.
-
1
ImageNet and its Discontents. The Case for Responsible Interpretation in ML
-
Instructor
Applied Data Ethicist | Visiting Researcher, Google
Razvan Amironesei, PhD