Paying Your Share?

A small white house set in a suburban street with 2 similar white houses flanking it. The lawn is green with 2 large trees in front of the house and several large trees in back. There is a hedge with lawn decorations in front of the house, as well.


This case study examines a major data investigation on a complex topic: the calculation of property taxes in Cook County, Illinois and New York City, New York. 

In this advanced case study, we will uncover the infrastructure of data production and its impact on the quality of the data and model. Following the journalist’s investigations, we will discuss the short and long-term consequences of the journalists’ intervention in each of the cities and on the concerned residents.

More advanced learners can read the methodology material provided by the journalists, which points to the differences between the local administration and the team of journalists in framing the question and collecting data. They can also discuss the data science methodology used by the journalists and work on code review using the public Jupyter notebooks.

Question and Problem Identification

This stage reflects on the research question or problem and seeks to identify the factors contributing to its definition. In this first stage, we examine the origin of the data journalist’s investigation and how different lenses have played a role in defining the issue and the angle of the investigation. Even though the issue is the same in Cook County and New York City, stakeholders, narratives, and sociotechnical dimensions differ. 

At this stage of the research process, what would be considered an ethical approach to the research for each particular lens? What could be considered an unethical approach? How did the researchers and stakeholders navigate these options?

Looking through conceptual lenses:

  • What is the origin story behind the data investigation? What were the journalist’s hypotheses? How does the positionality of journalists and other stakeholders impact their understanding of the problem? 

    What knowledge and skills did the journalists bring to the definition of the research question? Are there any missing pieces?

    What are the main ethical challenges of this data investigation? How are ethical questions and insights incorporated into the investigation? 

  • What sociotechnical system supported the definition of data investigation? How are these sociotechnical systems assembled, and how could they impact the definition of the project? Could these systems be biased, and if so, how?

  • Who are the main, visible, and apparent stakeholders in this data investigation? Are there individuals or groups who could be considered stakeholders and who are not represented?

    To what degree was the public included in developing the research question(s) and the research project, and why? What are the implications of this level of involvement (or lack thereof)?

  • How are stakeholders’ narratives shared in the investigation? Does any dominant narrative emerge? How does this dominant narrative impact the research question? What alternative narratives are or could be considered? 

    Does the research question include a social justice or public-good component? How does this social justice or public-good framing impact the investigation in the short and long term?

Data Discovery

During this stage, the journalists identify potential data sources, excluding irrelevant, analytically unfit, and ethically questionable data (Data Screening), then transform and integrate the “good” data into a usable dataset (Data Cleaning) to support the Exploratory Data Analysis process. In this section, we examine how the researchers conducted the discovery process: an essential part of the investigation relied on identifying data sets able to propose an alternative framework for calculating local property taxes. 

At this stage of the research process, what would be considered an ethical approach to the research for each particular lens? What could be considered an unethical approach? How did the researchers and stakeholders navigate these options?

Looking through conceptual lenses:

  • What do you know about the data collection process in the case of the Cook County and New York investigations? Which pre-existing data sources were used, or which data were created?

    How has the positionality of the data collector(s) influenced the quantity and quality of the data and the data collection process? How might positionality have affected access to existing datasets?

  • What methods and tools did the different stakeholders of the investigations use to collect and store the data? What influence might these systems have had on the quality and quantity of the data? 

    How did the different stakeholders of the investigation curate the data? How might their methods have influenced their results?

  • What data types were required for this project, and how accessible were they?

    What other kinds of resources did the project require? Did the journalist have access to those resources?

    In this context, from which source did the stakeholders draw their power? How did the data discovery process change the power relations among stakeholders?

  • What is the dominant narrative in which the data collection process is embedded? What are some possible alternative narratives? What differences could you see between the New York and the Cook County cases?

Exploratory Data Analysis

Exploratory Data Analysis investigates specific variables of interest in a dataset. This stage aims to validate the correspondence between the definition of the research question and the data collection process. The researcher explores the data sets and gets preliminary insights. What information can you get from the methodology paper to understand better how the analysis has been conducted?

At this stage of the research process, what would be considered an ethical approach to the research for each particular lens? What could be considered an unethical approach? How did the researchers and stakeholders navigate these options?

Looking through conceptual lenses:

  • How did the positionality of the researcher influence the exploratory analysis phase? In the investigation, what mechanisms were used to detect bias and false assumptions in exploratory data analysis?

    Looking at the stakeholders involved, could you explain how they all have different positionality in this project phase?

  • What sociotechnical systems are mobilized in this phase of the project? Why are sociotechnical systems critical in this phase of the project? How do they create a different way to look at data/results from the stakeholders?

    What sociotechnical systems influenced both the stakeholder and the result of the investigation in this stage of the project? What analytical biases have these sociotechnical systems induced during the exploratory data analysis?

  • What are the power dynamics that are important to consider in this phase of the project?

    How did this phase of the exploratory data analysis transform the power dynamic between the stakeholders? Can you see any difference between the Cook County and New York cases?

  • What narratives have been developed by the stakeholders around this data analytical phase? How is the narrative produced during the previous stage influencing this new one? 

    How are the narratives of other, less prominent stakeholders developed in this phase? 

Use of Analytical Tools (Modeling)

The appropriate Use of Analytical Tools (Modeling) depends on the research question, the intended utilization of the data to support the research hypothesis, and the assumptions required for a particular statistical method.

At this stage of the research process, what would be considered an ethical approach to the research through this particular lens? What could be considered an unethical approach? How did the researchers and stakeholders navigate their options?

Looking through conceptual lenses:

  • How does the positionality of the stakeholders influence the methods and tools used during the modeling phase? Does every stakeholder have access to the same resources? How does each of them mobilize their specific access to resources?

    How does each stakeholder define modeling? How do these differences impact this stage of the research process?

  • What are the stakeholders’ guiding principles for selecting modeling tools and methods? Could you explain their motivation and the context of their decision? How are these decisions either empowering or limiting in this research stage?

    How would you describe the sociotechnical system in place: agile and flexible, autonomous or controlled, multimodal or centralized? Something else? How does that impact the model?

  • Is the model independent of the power dimensions of the research project? How is the model dealing with questions of reproducibility and transparency? Is the model challenging the power status quo around the problem, and how?

    How do stakeholders apprehend the power of the model? What qualities and downsides are attributed to the model?

  • How do stakeholders articulate the modeling part of the investigation in their current narrative? Is the act of the modeling creating yet another narrative? What are the implications of creating these new narratives?

Interpreting, Drawing Conclusions, and Making Predictions

This step refers to analyzing the results to create scientifically robust knowledge. Interpreting, Drawing Conclusions, and Making Predictions requires qualitative interpretation. This stage may involve explaining how a situation came about or making predictions about how the research question might evolve.

At this stage of the research process, what would be considered an ethical approach to the research for each particular lens? What could be considered an unethical approach? How did the researchers and stakeholders navigate these options?

Looking through conceptual lenses:

  • How does the positionality of the stakeholders influence the methods and tools used during the modeling phase? Does every stakeholder have access to the same resources? How does each of them mobilize their specific access to resources?

    How does each stakeholder define modeling? How do these differences impact this stage of the research process?

  • Does the interpretive framework chosen to examine the modeling results support the research hypotheses, and is it compatible with the type of data collected?

  • What do false positives and false negatives mean in the context of this research? How would false positives and negatives affect the implementation of this work in predictive technology?

    How could interpretation of the model serve or harm certain stakeholders? Could you identify power dynamics in the divergent interpretive frameworks?

  • Who are the stakeholders advocating different interpretations of the data? How does this create narratives about the results?

    Could you identify competing narratives emerging from the interpretation phase? How would you describe these competing narratives? What are the upsides and downsides of competing narratives for this research project?

Communication, Dissemination, and Decision Making

This step involves communicating the results to the research team and the broader public through conference presentations, journal articles, or social media.

At this stage of the research process, what would be considered an ethical approach to the research for each particular lens? What could be considered an unethical approach? How did the researchers and stakeholders navigate these options?

Looking through conceptual lenses:

  • In this particular project, who has been sharing the results, and with what purpose? Does the positionality of the stakeholders sharing the results influence the understanding and reception of the research?

    What specific strategies did the journalists use to increase the impact of their research? How do the resulting impacts affect the public, directly or indirectly? 

  • What media, platforms, and venues have been mobilized to share the results of this research? How do these media, platforms, and venues impact the understanding and reception of the research?

    What potential biases or misconceptions can emerge from using these media, platforms, and venues? How did the journalists respond to that?

  • What were the risks and advantages of sharing the data or the results of the investigations? Can you identify competing interests? What situations would represent apparent misuse of the data and results of the research?

    Who has benefited from this investigation, and who has not? Could you describe how this work challenges the current status quo?

    Why was it necessary to share the methodology and the data in this context? Could you elaborate on the risk of sharing versus not sharing the data? What ethical values might have guided the journalist’s decision?

    How could the disseminated results reconfigure the power of the stakeholders? 

  • How was this last part of the life cycle folded into the general narrative of the investigation? How are divergent narratives represented in this investigation? Could you retrace the evolutions in the narratives since the first articles?

    Who benefits from these narratives, and who does not? What new narrative emerged from this investigation? How do they shape the public discourse, and what impact do they have?

    What new research questions could these narratives open up for the future?

The Full Case Study

Should five percent appear too small
Be thankful I don’t take it all
‘Cause I’m the taxman
The Beatles, 1966


On November 16th, 2022, the United Jewish Organizations (UJO) of Williamsburg and North Brooklyn posted a video on its Twitter account. It showed Lincoln Restler, the New York City Council member for the 33rd District, at a Council hearing, addressing Preston Niblack, the commissioner of the Department of Finance appointed by Mayor Eric Adams in December 2021. 

In his brief address, Lincoln Restler stated: “Homeowners in Williamsburg have been paying a disproportionate share of property taxes for years. Because of the distorted ways we compare condo taxes to rentals, the  UJO analysis has shown that the New York Department of Finance has imposed property taxes three times as high on condos in South Williamsburg compared to the values of other similar homes. These condos do not have any amenities and are not like the luxury condo on the North side of Williamsburg, but they get compared to similar rentals and housing stock up here. What steps can the Department of Finance take right now to help homeowners of South Williamsburg condominiums paying extremely high tax rates relative to what they should?” 

Lincoln Restler was politely channeling frustration with the Department of Finance, represented that day by Preston Niblack. In response to a request for more transparency about how local property taxes are determined in New York City's boroughs, the department had provided heavily redacted documents. 

“After many rounds of back and forth, the Department of Finance released data that was more blackout than … well…. I will not make a comparison to Trump… But it was profoundly blacked out. I do not understand why the Department of Finance will not release the full formula that explains how you are getting to the tax outcome you are. Taxpayers should fully understand how their property is being taxed and why. To make it some absurd argument of proprietary information undermines transparency and accountability for why people are being taxed.” 

Visibly embarrassed, Preston Niblack, who, according to his LinkedIn profile, holds a Ph.D. in Public Finance from the University of Maryland, replied: “Unfortunately, what is in blackout from the request were screens from the software that the vendor use as part of the mass assessment. The vendor declined to allow us to share them. They are proprietary, and I cannot legally do anything about that. To the extent that anybody can understand the process, it was all clearly described in what was provided, but again I am happy to walk you through it. It is not a straightforward process; I will explain how it works as best as I can.”

This argument did not satisfy Restler, and for good reason. Since 2017, a group of data journalists has raised the question of local property taxes in a series of data-intensive investigations, shedding light on the opacity of tax assessment in at least two jurisdictions in the United States. 

Jason Grotto, the journalist who started these investigations, with colleagues from ProPublica and later Bloomberg, has been collecting and reassessing tax data in Cook County (where 40% of all residents of Illinois live) and, more recently, in New York City. His work has had a considerable impact: together with his ProPublica colleague Sandhya Kambhampati, he was named a Pulitzer Prize finalist for Local Reporting in 2018. 

The Tax Divide Project

In the United States, taxpayers must pay personal income tax to the federal government, 43 states, and many local municipalities. They also pay property taxes, whose rates vary across the country: cities, counties, and school districts are responsible for defining the ad valorem system of taxes and collecting the revenue that later funds services for their communities. The assessed value for any given property is generally based on a percentage of the property’s Market Value, known as the Level of Assessment or Assessment Ratio. To calculate property taxes, the charges of the local taxing bodies are added together to give the total tax rate for an area, known as the mill rate (one mill is one dollar of tax per 1,000 dollars of assessed value). This rate is then multiplied by a property’s Assessed Value, so people with more valuable properties should pay more. Most cities in the United States use the Market Value to assess a home. The Cost Method (how much it would cost to replace the property) and the Income Method (how much income you could make from the property if it were rented) are alternative methods. The information that the assessor has is considered part of the public record.
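The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the formula of any particular jurisdiction: the function names, the 10% assessment ratio, and the three charges are all hypothetical, and real bills also involve exemptions and other adjustments that are left out here.

```python
# Minimal sketch of an ad valorem property tax calculation.
# All names and numbers are hypothetical illustrations.

def assessed_value(market_value, assessment_ratio):
    """Assessed value is a fixed fraction of the market value."""
    return market_value * assessment_ratio


def total_mill_rate(charges_in_mills):
    """Add the charges of every taxing body to get the area's total rate."""
    return sum(charges_in_mills.values())


def property_tax(market_value, assessment_ratio, charges_in_mills):
    """One mill is one dollar of tax per 1,000 dollars of assessed value."""
    mills = total_mill_rate(charges_in_mills)
    return assessed_value(market_value, assessment_ratio) * mills / 1000


# Hypothetical charges for one area, expressed in mills.
charges = {"school_district": 40.0, "city": 25.0, "county": 10.0}

tax_small = property_tax(200_000, 0.10, charges)
tax_large = property_tax(400_000, 0.10, charges)
print(f"Tax on a 200,000 USD home: {tax_small:.2f}")
print(f"Tax on a 400,000 USD home: {tax_large:.2f}")
```

Because the rate and the assessment ratio are the same for everyone in the area, the bill is only as fair as the market value that goes in: a home whose value is overestimated pays too much, whatever the formula says.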

Assessment is typically performed by local officials: in Cook County, the Assessor's Office, and in NYC, the Department of Finance. In Cook County, the assessor is an elected official responsible for “setting fair and accurate values for 1.8 million parcels of Cook County property.” In Cook County, the most populous county in Illinois and the second-most-populous county in the United States, the Tax Foundation estimates that the average property tax was 5,342 USD in 2022, on the higher end of the tax spectrum. 

This is where the “Tax Divide” project started in 2017. In a series of articles first published by the Chicago Tribune and ProPublica Illinois, Jason Grotto and Sandhya Kambhampati focused on property tax assessments that have long been a source of controversy and political turmoil, especially for commercial and industrial properties. The articles identified a regressive taxation system, i.e., a tax taking a larger percentage of income from low-income earners than from middle and high-income earners, as opposed to a progressive tax, which would take a larger share from high-income earners. 

On June 10th, 2017, Jason Grotto and his colleagues published five articles under “Tribune Watchdog” in the Chicago Tribune. By scrutinizing large amounts of data, the journalists uncovered a messy and corrupt system that benefited some constituents while putting enormous and unjustifiable financial pressure on more modest families and businesses. The data analysis and subsequent stories showed that the administration was unfair and corrupt even as it used the language of community engagement to give the system a veneer of fairness. They also identified a key reason for the regressivity of the assessment system: the appeals process, which, ironically, promised to add a “human” and “public-facing” element and to serve as a corrective mechanism for the assessments. 

  • Part I, “An Unfair Burden,” documented how the county’s property tax system for years handed substantial financial breaks to well-off homeowners while punishing those who have far less, particularly those living in non-white communities.
  • Part II, “The Problem With Appeals,” showed how property tax appeals — which have been called an insiders’ game that favors the well-heeled and well-connected — made the tax system even less fair.
  • Part III, “Decades of Errors,” revealed that the assessor’s office knowingly produced inaccurate property assessments during the long tenure of Berrios’ predecessor, James Houlihan, and even as far back as the 1980s.
  • Part IV, “Commercial Breakdown,” showed how assessments of commercial properties were so riddled with errors that they created deep inequities, punishing small businesses while cutting a break to owners of high-value properties and helping fuel a cottage industry of politically powerful tax attorneys.

In December 2017, ProPublica also published the story in an article co-signed by Sandhya Kambhampati and Jason Grotto: “How the Cook County Assessor Failed Taxpayers: Joseph Berrios’ error-ridden commercial and industrial assessments punish property owners, benefits lawyers.” The article summarizes the main takeaways of the research: the regressivity of the tax system and the systemic discrimination against Black residents, which subsequent studies confirmed.

Shortly after the publication of these articles, Fritz Kaegi defeated Berrios in the March 2018 Democratic primary. He was elected Cook County Assessor that November and made fairness, ethics, and transparency the cornerstone of his work. Today, Cook County is considered a model for tax assessment and has won numerous awards and accolades, including the Center for Digital Government County Project Experience Winner of the International Association of Assessing Officers (IAAO) and the James A. Howze, CAE, Distinguished Research and Development Award. 

An Unfair Burden

The regressive system had significant consequences for Cook County residents and business owners. Many Black homeowners purchased affordable houses and saw their property appraised at levels far above what they paid. While, in theory, every homeowner can contest their tax assessment every three years, the Chicago Tribune and ProPublica investigation made clear that only residents with the means to hire tax lawyers would succeed. Wealthier, Whiter neighborhoods appealed and won reductions more often than their less affluent neighbors. 

The situation replicated much older patterns of racially discriminatory housing practices in Chicago, regularly cited as the most segregated city in the United States. In the United States, housing and real estate have been among the most significant and enduring drivers of structural racism and racial inequality since the Civil War. While segregationist policies forbade African-Americans from using the same facilities as White residents (such as buses, bathrooms, and pools), neighborhoods were deliberately segregated through overlapping real estate and government practices. 

The ProPublica article focused on real cases to show the extreme variation in assessments and how ongoing practices were particularly unfair to Black home and business owners. 

“THE OWNERS OF SWEET PEA ACADEMY, a daycare in Chicago’s Auburn Gresham neighborhood, knew as soon as they received the first assessment notice for their one-story building that something was off. Brenda and Larry Doyle, who started the daycare with their daughter Jamilah, bought the building in 2015 for $205,000. When they received their first notice from the assessor a month later, the property’s value was pegged at $324,700. ‘It is ridiculous,’ Larry Doyle said. ‘There are a lot of businesses in the area that has the same thing happening, and we are all pissed off about it.’ Farther west, on West 79th Street, the owner of a pest control business purchased a small storefront in 2012 for $60,000. The assessor valued it at $111,028 that year. And when 2015 rolled around, the value did not budge. Meanwhile, the owners of an office tower at 300 N. LaSalle Drive got much better news from the assessor. The building along the Chicago River had sold in 2014 for $850 million — at the time, the highest single-building office sale in Chicago history. A year later, the assessor valued the building at just $392 million, less than half the sale price.”

According to the Cook County Assessor's Office website, racial equity is now actively discussed and monitored by Kaegi and his team. The website highlights the values on which the new assessment system has been developed: fairness, ethics, and transparency. It also mentions the creation of a “Racial Equity in Real Estate Conversations” series in 2020: “Some activities will be exclusive to staff, while others are open to the public. Our program will include book readings, film screenings, and panel discussions with historians, practitioners, and authors who are knowledgeable about housing, real estate, and racial equity.” 

Investigation Data and Methods

The Chicago Tribune Tax Divide project was a massive data science effort to reconstruct how assessments were produced and appealed. It combined Illinois Department of Revenue (IDOR) real estate transfer declaration data, Cook County Assessor’s Office (CCAO) assessment data, geographical data, and CCAO appeals data. The results were published in articles supported by methodology papers and a public GitHub repository.

For ProPublica, Kambhampati and Grotto released documentation that helps explain the methodology. For this particular topic, ProPublica Illinois and the Chicago Tribune conducted three separate analyses:

  • An examination of unchanged assessor’s initial valuations over multiple reassessment periods
  • A sales ratio study comparing the assessor’s valuations to actual sales prices
  • A look at appeals of commercial and industrial assessments in Cook County
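The first of these analyses, checking whether initial valuations stayed the same from one reassessment to the next, can be sketched as follows. This is a simplified illustration rather than the journalists' code: the record layout `(parcel_id, year, first_pass_value)`, the function name, and the toy records are assumptions made for the example.

```python
# Minimal sketch: share of parcels whose first-pass value is identical
# across several reassessment years. Record layout and data are hypothetical.
from collections import defaultdict


def share_unchanged(records, years):
    """Return the fraction of parcels valued identically in every given year.

    Only parcels that have a value in all of the requested years are counted.
    """
    values_by_parcel = defaultdict(dict)
    for parcel_id, year, value in records:
        values_by_parcel[parcel_id][year] = value

    complete = [
        values for values in values_by_parcel.values()
        if all(year in values for year in years)
    ]
    if not complete:
        return 0.0

    unchanged = sum(
        1 for values in complete
        if len({values[year] for year in years}) == 1
    )
    return unchanged / len(complete)


# Toy records: (parcel_id, reassessment_year, first_pass_value)
toy_records = [
    ("A", 2012, 100_000), ("A", 2015, 100_000),   # unchanged
    ("B", 2012, 250_000), ("B", 2015, 250_000),   # unchanged
    ("C", 2012, 80_000),  ("C", 2015, 95_000),    # changed
    ("D", 2012, 60_000),                          # missing 2015, ignored
]

share = share_unchanged(toy_records, years=[2012, 2015])
print(f"Share of parcels with identical values: {share:.0%}")
```

A high share is a warning sign rather than proof of anything on its own; the question it raises is whether the properties were actually revalued between reassessments.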

You can read the full methodology, download the data, and see the code used for the analyses from the ProPublica Data Store.

The following paragraph and results are excerpted from “How We Analyzed Commercial and Industrial Property Assessments in Chicago and Cook County,” an in-depth analysis of hundreds of thousands of property tax records under the Cook County Assessor, Joseph Berrios. 

These analyses examined non-incentive commercial and industrial property assessments produced under Cook County Assessor Joseph Berrios from 2011 to 2015. For the first analysis, the team also looked at 2003, 2006, and 2009 Chicago reassessments under the previous assessor, James Houlihan. Among the results: 

  • For thousands of Chicago parcels, the first-pass values produced by the assessor’s office under Berrios did not change over multiple reassessments. Experts say this would be nearly impossible if the office used valid appraisal models and did the work. Under Berrios, 67 percent of first-pass values were identical over two reassessments, and 23 percent were identical over three reassessments – 2009, 2012, and 2015. Under Houlihan, just 1 percent of first-pass values were the same over multiple reassessments during the years examined in the study. 
  • Berrios’ assessments of commercial and industrial properties in Cook County showed high rates of errors that far exceed industry standards. The assessor’s office also overvalued lower-priced properties while undervaluing higher-priced ones, resulting in inequities. 
  • Appeals were granted so frequently that many Cook County property owners who did not appeal likely paid more in taxes than they would have if they had appealed. In addition, assessments remained error-ridden and unfair even after the appeals process was complete.

These analyses were reviewed by Richard Almy, former executive director of the International Association of Assessing Officers (IAAO), and ProPublica data reporter Hannah Fresques.
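To see what a sales ratio study measures, here is a minimal sketch. It assumes the statistics commonly used in the assessment industry (the median ratio, the coefficient of dispersion or COD, and the price-related differential or PRD); it is not taken from the journalists' notebooks. For illustration it uses the three properties quoted earlier from the ProPublica article. Three properties are far too few for a real study, and their valuations and sales fall in different years, so the output only shows how the numbers behave.

```python
# Minimal sketch of sales ratio statistics, assuming the measures commonly
# used in assessment practice. Function and variable names are hypothetical.
from statistics import mean, median


def sales_ratio_stats(pairs):
    """pairs: list of (assessor_valuation, sale_price) tuples."""
    ratios = [valuation / price for valuation, price in pairs]
    med = median(ratios)

    # COD: average absolute deviation from the median ratio, as a percentage
    # of the median. Higher values mean less uniform assessments.
    cod = 100 * mean([abs(r - med) for r in ratios]) / med

    # PRD: mean ratio divided by the value-weighted ratio. Values well above 1
    # suggest lower-priced properties are assessed at higher ratios than
    # higher-priced ones (regressivity).
    weighted_ratio = sum(v for v, _ in pairs) / sum(p for _, p in pairs)
    prd = mean(ratios) / weighted_ratio

    return {"median_ratio": med, "cod": cod, "prd": prd}


# (assessor valuation, sale price) for the three properties quoted above.
examples = [
    (324_700, 205_000),          # daycare in Auburn Gresham
    (111_028, 60_000),           # storefront on West 79th Street
    (392_000_000, 850_000_000),  # office tower at 300 N. LaSalle Drive
]

stats = sales_ratio_stats(examples)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A ratio of 1 means the valuation matched the sale price. In this tiny sample the two small properties have ratios well above 1 and the office tower well below it, which is the overvalue-low, undervalue-high pattern the second finding describes.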

Using data science to fix the system

In 2018, Fritz Kaegi became the new assessor for Cook County with a mandate to remake the troubled system. Kaegi created a new data science office headed by Rob Ross, a data scientist who had been part of the University of Chicago team that helped with the Chicago Tribune story. Kaegi and Ross envisioned a new sociotechnical system that was not just an algorithm but also a new, repaired set of relationships between the office and the wider public. They aspired to:

  1. Create a more accurate algorithmic model for assessing property values (one that could predict prices more accurately than the previous model) 
  2. Design a better assessment system to address the historical research questions of the Cook County Assessor’s Office (CCAO). 

To do this, Kaegi and Ross implemented a Cook County open data hub on GitLab (an open-source online platform for software development), where they publish the code for their model, the data used to train it, and their workflows and standards.

As Rob Ross explained in a Medium article, the open data hub was supposed “to allow journalists, academics, and the public to monitor our performance and even suggest changes to our methods that can improve our accuracy.” Transparency and civic technology, as Ross put it, are “a two-way street”: they enable the public to see into the system and recommend ways to improve the model. 

In addition to creating the open data hub, Kaegi and Ross implemented more sophisticated machine learning techniques to develop and test models that could predict the sale price (known as the “fair market value”) of unsold properties. Prior algorithms had been regressive, in part because they were too broad. They assigned average values to evaluate all home prices across the Cook County area without regard for minor differences, data changes, and neighborhood differences. The more sophisticated machine learning technique generated multiple models to test whether one model was more appropriate to a neighborhood than another. It aspired to be more granular and sensitive to all variations, communities, and geography. 
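The effect of a model that is "too broad" can be illustrated with a deliberately crude toy example. The sketch below is not the CCAO's method, which relies on machine learning models published in its open data hub. It only compares one county-wide average with per-neighborhood averages on invented sale prices, and it measures error on the same sales used to compute the averages, which a real evaluation would avoid.

```python
# Toy illustration: one county-wide average versus per-neighborhood averages.
# Neighborhood names and prices are invented for the example.
from collections import defaultdict
from statistics import mean

toy_sales = [
    ("north", 900_000), ("north", 1_100_000),
    ("south", 150_000), ("south", 250_000),
]


def countywide_estimates(sales):
    """Give every home the same estimate: the average of all sale prices."""
    average = mean([price for _, price in sales])
    return [average for _ in sales]


def neighborhood_estimates(sales):
    """Give every home the average sale price of its own neighborhood."""
    prices = defaultdict(list)
    for neighborhood, price in sales:
        prices[neighborhood].append(price)
    averages = {n: mean(p) for n, p in prices.items()}
    return [averages[neighborhood] for neighborhood, _ in sales]


def mean_abs_pct_error(estimates, sales):
    """Average absolute error as a fraction of the actual sale price."""
    return mean([abs(e - price) / price
                 for e, (_, price) in zip(estimates, sales)])


broad = countywide_estimates(toy_sales)
granular = neighborhood_estimates(toy_sales)

print(f"County-wide error:      {mean_abs_pct_error(broad, toy_sales):.0%}")
print(f"Per-neighborhood error: {mean_abs_pct_error(granular, toy_sales):.0%}")
```

In the toy data, the single average overvalues every home in the lower-priced neighborhood and undervalues every home in the higher-priced one, so a broad model is not just inaccurate but regressive.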

So what about New York?

The next iteration of this research happened in NYC. After Cook County, Grotto and his Bloomberg colleague Caleb Melby started an extensive investigation on the same topic in the New York City area. It culminated in an extended article with data visualizations, “How a $2 Million Condo in Brooklyn Ends Up With a $157 Tax Bill,” and a companion methodology paper. 

As the investigation recalled, the situation in New York City was complicated by the fact that “For years, when confronted with complaints of uneven property taxes, the New York City Department of Finance has blamed a state law that requires it to ignore the sale prices of condos and co-ops when determining their taxable value. Instead, the law requires city assessors to engage in a thought experiment: Pretend co-ops and condos produce income for their owners—even though they don’t—and set their taxable values based on a hypothetical amount of income they’d generate if they did.” And when officials are not blaming the state law, they blame the vendor's proprietary software, the response that infuriated Lincoln Restler. 

The problem had been identified at least as early as 2018 by the UJO of Williamsburg, which on October 15, 2018, submitted to the NYC Advisory Commission on Property Tax Reform a “Report calling to address the unjust impacts of NYC-DOF adjustments to the valuations of condos in Central Williamsburg and its vicinity.” 

The Bloomberg analysis conducted by Grotto and Melby on millions of city records confirmed that “New York’s assessment process combines to help perpetuate unfairness.” In an article recapitulating the five takeaways of this new research, Grotto and Melby stated that as of 2021, flawed valuations have contributed to inaccurate and unfair property taxes for condos, shifting hundreds of millions of dollars in tax burden from the most expensive condos to owners of rental properties and their tenants. They also noted that the owners of lower-priced apartment buildings pay, on average, a higher share of the property tax burden than condo owners relative to price. 

The journalists also pointed out that during their investigation, officials consistently blamed a state law for the errors and inequities. This law requires assessors to value condos and co-ops as if they produce income (remember the Income Method?), as rental properties do. Because many condos and co-ops don’t produce income, officials have developed a process using data from comparable rental properties to form hypothetical income estimates. As officials point out, the state law’s requirements lay the groundwork for an unfair system. Still, the city has worsened a bad situation with an opaque system relying on data that departs from market realities. 

Finally, the city’s method for estimating the hypothetical income for condos drives the disparities in valuations. City officials adjust their comparable figures, but the resulting amounts tend to be comparatively high for low-priced properties and unfairly low for high-priced ones.

The situation is still ongoing.