The stages of the research lifecycle describe the key elements of the research process, from question identification through information sharing, and recognize that the process is iterative.

Stage One

Question and Problem Identification

This stage of the data science lifecycle, Question and Problem Identification, defines the research question to be addressed, ensuring the feasibility and scientific rigor of the project. A research question can emerge from multiple sources: a review of existing literature, direct observation, discussions, and interactions with stakeholders (communities, researchers, activists, government officials, corporate managers, etc.).

Lifecycle Stages are Interwoven

Activities in other stages of the lifecycle can cause researchers to revisit Question and Problem Identification in order to refine their research questions or spin off new research questions.

Considering this stage through the lenses will help to establish ethical boundaries of the Question and Problem Identification process.

Through the Lenses:

  • Researcher and Stakeholder

    How might your identity, experiences, and personal situation impact your understanding of the project? What about those of project stakeholders?

    Culture, Race, and Gender

    What cultural, race, and gender assumptions might you and/or your stakeholders bring to the research question and project development?

    Knowledge and Skills

    What knowledge and skills do you bring to the definition of the research question? Are there others (stakeholders, researchers, etc.) with relevant knowledge and skills that could help define the research question?


    Does the project plan incorporate regular checks, discussions, and documentation about the ethical dimensions of the project?

  • Infrastructure

    What human and non-human infrastructure does the research project require? How are these choices made?


    What risks, benefits, and responsibilities do your choices of human and non-human infrastructure present? Is this the first time some of these social and technical systems have interacted? What are the implications?

  • Stakeholders

    Who are the apparent stakeholders for this project? Is it clear why and how these are project stakeholders? Are there individuals or groups who could be considered stakeholders who are currently not?


    To what degree were stakeholders included in developing the research question(s) and the research project, and why?


    Who is responsible for framing the research question and developing, funding, publishing, and reviewing the research, including its ethical dimensions? How does this delegation of responsibility impact the research and its ethical dimensions?

  • Dominant and Alternative Narratives

    What is the dominant narrative in which the research question is embedded? How does this impact the research question? Are there alternative narratives you could consider? How might these alternative narratives impact the research question?

    Social Justice and Public Good

    Does your research question include a social justice or public good component?  Why or why not? How could this social justice or public good framing impact the research project in the short and long term?

Stage Two

Data Discovery

In the Data Discovery stage, researchers identify potential data sources, exclude irrelevant, analytically unfit, and ethically questionable data (Data Screening), then transform and integrate these data into a usable dataset (Data Cleaning) to support the next stage: Exploratory Data Analysis.
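
The screening and cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`consent`, `year`, `region`), the study period, and the exclusion rules are all hypothetical, and real screening criteria depend on the project and its stakeholders. The sketch records a reason for every excluded record, so that screening decisions stay documented and open to review.

```python
# Hypothetical raw records; every field name and value here is an assumption.
RAW_RECORDS = [
    {"id": 1, "region": " North ", "year": "2019", "consent": True},
    {"id": 2, "region": "south", "year": "2021", "consent": False},
    {"id": 3, "region": "South", "year": "1998", "consent": True},
    {"id": 4, "region": "NORTH", "year": "n/a", "consent": True},
]

STUDY_YEARS = range(2015, 2024)  # assumed study period: 2015 through 2023


def screen(records):
    """Data Screening: split records into kept and excluded, logging a reason."""
    kept, excluded = [], []
    for rec in records:
        if not rec["consent"]:
            excluded.append((rec["id"], "no documented consent"))
        elif not rec["year"].isdigit():
            excluded.append((rec["id"], "year not parseable"))
        elif int(rec["year"]) not in STUDY_YEARS:
            excluded.append((rec["id"], "outside study period"))
        else:
            kept.append(rec)
    return kept, excluded


def clean(records):
    """Data Cleaning: normalize region labels and convert years to integers."""
    return [
        {"id": r["id"], "region": r["region"].strip().lower(), "year": int(r["year"])}
        for r in records
    ]


kept, excluded = screen(RAW_RECORDS)
dataset = clean(kept)

print(dataset)
for record_id, reason in excluded:
    print(f"excluded record {record_id}: {reason}")
```

The exclusion log is the part most relevant to the questions that follow: it makes visible which records were dropped and why, which is where positionality and narrative can quietly shape a dataset.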

Lifecycle Stages are Interwoven

Issues arising during data collection, screening, and cleaning can cause researchers to review and refine their research questions, and exploratory analysis of the data can lead researchers to seek supplementary data.

The following questions help to establish ethical boundaries of the Data Discovery stage.

Through the Lenses:

  • Discovery by Whom?

    Is the person collecting the data also in charge of defining the research question? If not, why were the data originally collected and how does the positionality of the data collector(s) influence the quantity and quality of the data? If so, how might your positionality have influenced the data collection process?

  • Data Collection

    Where do the data for the project come from? What methods and systems did you rely on to access the data? Could you explain how the data were generated and who was responsible for generating the original data?

    Data Curation

    How have the data been curated? What data collection and recording systems did you use? What classification system(s) are you using for the data (e.g. for categorical data)?

    Data Types

    How might the types of data you are using have been impacted by the collection process? What populations, years, and geographic extents do the data encompass? How could the selection of these variables affect the research outcomes?

  • Data and Information Access

    What types of data were required for this project, and how accessible were they? What other types of resources does the project require, and do you have access to those resources?


  • Dominant and Alternative Narratives

    What is the dominant narrative in which the data collection process is embedded? How does this affect how data are collected? What alternative narratives could exist for the data collection process? Does the data collection involve a social justice or public good component? Why or why not?

Stage Three

Exploratory Data Analysis

In the Exploratory Data Analysis stage, researchers investigate specific variables of interest in a dataset. This stage aims to validate whether the research question corresponds with the data collected during Data Discovery.

Lifecycle Stages are Interwoven

This stage can be quite iterative with the Data Discovery and Use of Analytical Tools (Modeling) stages, and may even influence the Question and Problem Identification stage.

The following questions help to establish ethical boundaries of the Exploratory Data Analysis process.

Through the Lenses:

  • Who Leads the Exploration?

    Who has input into the data analysis process? Is the person responsible for exploratory analysis the same person responsible for other phases of the lifecycle (e.g. data collection)? How does this distribution of responsibilities affect exploratory data analysis?

    Identifying Bias

    What mechanisms are in place to detect bias and false assumptions in exploratory data analysis?

  • Tool Selection

    How do the tools and methods selected for Exploratory Data Analysis influence your data exploration process? Were tools and methods selected out of convenience, habit, or the norms of your research domain? How does this affect data exploration? Are there tools available that were not used, and if so, why?

    Origins of the Data

    Do the data used at this stage originate from this research project, or are you using data from previous research projects? What types of translation, transformation, or processing have the data gone through, and how does this influence the exploration process?

  • Emergent Power Dynamics

    Can you see power dynamics emerging in the dataset? If so, how do they connect to the larger context of the research and to any communities or stakeholder groups?

    Considering Impact

    Could these dynamics harm or disadvantage a group or individual? Are those group(s) historically marginalized in your country or in others? What are the implications?

  • Emergent Relationships and Patterns

    What relationships or correlations become more visible as you explore the data, and how do these impact the development of narratives about the data and research question? Are there unexpected relationships in the data, and do they change or require refinement of the research question?

Stage Four

Use of Analytical Tools (Modeling)

In the Use of Analytical Tools (Modeling) stage, the researcher selects and implements analytical tools based on the research question, the intended use of the data to support the research hypothesis, and the assumptions required for a particular statistical method.

Lifecycle Stages are Interwoven

The Use of Analytical Tools (Modeling) stage can be deeply interactive with the Interpreting, Drawing Conclusions, and Making Predictions stage of the research lifecycle. Researchers may examine modeling results through methods presented in the Use of Analytical Tools, and some research domains may share early results of this stage through the Communication, Dissemination, and Decision Making stage.

The following questions help to establish ethical boundaries in the Use of Analytical Tools stage.

Through the Lenses:

  • Methods Selection

    How does your positionality or experience influence the types of methods and tools used for this project? What decisions led you to implement some methods to the exclusion of others?

    Model Choices and Implications

    If your research includes the development of a model, can you describe the model and its intended effects to those in your field? How about to a lay audience? Does changing the parameters of the model affect individual or group representation? What are the implications?

  • Tool Selection

    Are there any known concerns or limitations when using these tools for research similar to yours? How can these limitations be mitigated?

    Embedded Bias

    Are you using machine learning or artificial intelligence tools for this project? Can you detail the value that these tools add to the research project? What biases might be built into these tools and the dataset(s) they have been trained on?

  • Reproducibility

    How transparent and reproducible are your workflows for this stage? What strategies could you employ to make your workflows more transparent and reproducible? What power imbalances might be created by the level of transparency and reproducibility of this project?

  • Preexisting Narratives

    Do you have expectations for the results of the data analysis? How does your presupposition of results impact your analysis?
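
For the Reproducibility lens above, one small and partial step toward a more transparent workflow is to store, alongside each result, the random seed and a fingerprint of the input data. The sketch below assumes a hypothetical analysis (a seeded bootstrap mean over a `value` field); the function names and data are illustrative only and do not represent a prescribed method.

```python
import hashlib
import json
import random


def fingerprint(rows):
    """Return a SHA-256 hex digest of the rows in a canonical JSON form."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_analysis(rows, seed):
    """Hypothetical analysis: a seeded bootstrap mean plus a provenance record."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    sample = [rng.choice(rows)["value"] for _ in rows]
    return {
        "seed": seed,
        "input_sha256": fingerprint(rows),
        "n_rows": len(rows),
        "bootstrap_mean": sum(sample) / len(sample),
    }


# Hypothetical input data.
ROWS = [{"id": i, "value": float(v)} for i, v in enumerate([3, 5, 8, 13, 21])]

first = run_analysis(ROWS, seed=42)
second = run_analysis(ROWS, seed=42)
print(first == second)  # the same seed and data give the same record
```

Anyone who receives the record can check whether they hold the same input data and can rerun the analysis with the same seed. This does not address who has access to the data or the computing resources, which is where the power-imbalance question in the lens applies.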

Stage Five

Interpreting, Drawing Conclusions, and Making Predictions

In the Interpreting, Drawing Conclusions, and Making Predictions stage, researchers distill the results of the Use of Analytical Tools (Modeling) stage and may also consider those results in the context of other stages of the lifecycle. This stage may involve explaining how a result came about or making predictions about how the research question might evolve.

Lifecycle Stages are Interwoven

Researchers may find themselves interacting with a number of stages of the lifecycle when at the Interpreting, Drawing Conclusions, and Making Predictions stage. For example, when interpreting research results, researchers often revisit the Exploratory Data Analysis and Use of Analytical Tools (Modeling) stages to support their interpretations, conclusions, or predictions, and interpretations of the research may ignite new research questions, taking the researcher back to the Question and Problem Identification stage. Likewise, considering how and with whom the research will be shared in the Communication, Dissemination, and Decision Making stage may affect how the interpretations, conclusions, and predictions are framed or worded.

The following questions help to establish ethical boundaries in the Interpreting, Drawing Conclusions, and Making Predictions stage.

Through the Lenses:

  • Storytelling

    What story are the results telling? How does your positionality, and those of your colleagues, impact the narratives being spun from this research?

    Dominant Narratives

    Do the results identify specific groups of people? How do these narratives refute or support the dominant narratives told about these groups? What value is there in refuting or supporting dominant narratives?

  • Fit for Purpose

    Does the analytical model chosen to address the research questions support the research hypotheses and is it compatible with the type of data collected? Are the assumptions of the analytical model met reasonably well in the context of the data situation? Does the data coverage (representativeness) and quality meet the needs of the model(s)?

  • False Positives and Negatives

    What do false positives and false negatives mean in the context of this research? How would false positives and false negatives affect the implementation of this work in predictive technology? Are marginalized groups impacted by false positives and false negatives in an outsized way? What are the implications?

  • Complementary and Counter Narratives

    How do your interpretations of your results follow along with or run counter to existing narratives in this space? How do existing narratives affect your ability to suggest relevant counter narratives?
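
The False Positives and Negatives lens above asks whether errors fall on some groups in an outsized way. One way to begin checking is to compute false positive and false negative rates separately for each group. The sketch below is a minimal illustration: the group labels, outcomes, and predictions are invented, and deciding which groupings are meaningful, and what size of gap matters, remains a judgment for researchers and stakeholders.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates for each group.

    Each record is a (group, actual, predicted) tuple with boolean outcomes.
    A rate is None when the group has no actual negatives or positives.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # actual positive predicted as negative
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # actual negative predicted as positive
    return {
        group: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


# Hypothetical records: (group, actual outcome, predicted outcome).
RECORDS = [
    ("A", True, True), ("A", False, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, True), ("B", False, False), ("B", True, True),
]

rates = error_rates_by_group(RECORDS)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this invented example the model makes no errors for group A but misclassifies half of the actual positives and half of the actual negatives in group B, a disparity that an overall accuracy figure (75% here) would hide.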

Stage Six

Communication, Dissemination, and Decision Making

The Communication, Dissemination, and Decision Making stage involves communicating the results to the research community through conference presentations, journal articles, social media, and other communication venues. This stage also includes communicating results to decision makers, managers, the general public, and other stakeholders, which is an important element of the research process.

Lifecycle Stages are Interwoven

Although often considered the “last” stage of the lifecycle, Communication, Dissemination, and Decision Making is frequently the stage at which researchers hear feedback from stakeholders and encounter issues that might guide new research questions (i.e. Question and Problem Identification) or prompt alternative approaches in the Use of Analytical Tools (Modeling) stage.

The following questions help to establish ethical boundaries of the Communication, Dissemination, and Decision Making stage.

Through the Lenses:

  • Success Metrics

    Who considers the project a success and according to what measures or definitions? Were these measures of success clear from the beginning of the project?

    Social Commitments and Impacts

    What social commitments do the communicated results and reports support, if any? If any groups or communities are mentioned in the results, what is being said about them? Have you shared the results with the impacted communities, directly or indirectly?

  • Technical Decisions

    What technical decisions were crucial to the success of the project? Were there any technical decisions that detracted from the success of the project?


    Do resultant publications explain how the human contexts and ethical questions have been integral to the research process and results? What kinds of actions become possible now that the results have been disseminated? Are these all positive, all negative, or a mix?

  • Open Science

    Are the data, methods, code, documentation, analysis, and results shared openly or in repositories that preserve data integrity? If not, how does this affect your ability to fully communicate your research?

    Impacts on Stakeholders

    How could the disseminated results reconfigure the power of the stakeholders in this project? Has this impacted whether and how some or all of the results are presented?

  • Limitations and Misuse

    What are the limitations in generalizing the results of this research to other situations? What situations would represent apparent misuse of the data? How can these potential misuses be monitored in the future?