4. The Just City: Machine Learning’s Social and Political Foundations
In 2012, while reviewing reports of recent crimes, a data analyst for the Cambridge, Massachusetts, Police Department (CPD) noticed a striking pattern of thefts: laptops and purses were repeatedly being stolen on Tuesday and Thursday afternoons at a local café. While no incident on its own would have indicated much, the full set presented a clear case of a thief acting systematically. Having determined this pattern of behavior, the analyst could predict when and where the thief would strike next—and catch them in the act.
“We provided the detectives with Tuesday afternoon between four to six as the best timeframe,” recalls Lieutenant Dan Wagner, the commanding officer of CPD’s Crime Analysis Unit. “The detectives sent a decoy—their intern—with a backpack and a computer hanging out of it. They were there a short while, and they see the guy steal the laptop and make an arrest.”1
It sounds straightforward, but such patterns typically go undetected because analysts struggle to find them within large databases of crimes. Indeed, it took several weeks for CPD to identify the café crime series, and the identification was possible only because the analyst happened to remember having seen records of similar crimes when new thefts were reported. Despite being glad that the café thief was stopped, Wagner realized that this ad hoc, individualized approach was quite limited. After all, he explains, “No crime analyst can truly memorize and recall a full historical database of crimes.”
CPD’s Crime Analysis Unit was founded in 1978 as one of the nation’s first such teams by Wagner’s mentor Rich Sevieri, who has overseen the unit’s transformation from analyzing crime using pin maps and punch cards to databases and predictive models. Sevieri began his career as a journalist, and despite multiple decades enmeshed in data he still focuses on “the five Ws”: who, what, when, where, and why. Even as policing becomes increasingly focused on data and algorithms, Sevieri’s analytical approach remains the same: “You have to know the motivation and you have to know the scenario around the crime.”
Recognizing the futility of relying on an analyst performing manual database queries to find crime patterns, Sevieri and Wagner approached Cynthia Rudin, a professor of statistics at MIT, hoping for a way to automatically analyze crime trends. Although incidents within a crime series are known to follow patterns based on the offender’s modus operandi (MO), identifying these patterns is difficult for a human or computer working alone. Crime analysts intuitively sense what characteristics might indicate that a string of crimes are connected, but they cannot manually review data of every past crime to find patterns. Conversely, computers are proficient at parsing large sets of data, but they may not recognize the subtle connections that indicate a crime series.
Sevieri believed that if they could teach a machine “the process of how an old-time analyst worked,” an algorithm could detect patterns of crimes that would otherwise take the police weeks or months to catch, if they could be detected at all. It would enable the Cambridge Police to more quickly identify crime patterns and stop the offenders.
When Rudin learned how CPD had stopped the series of café thefts, she recognized that “finding this pattern was like finding a needle in a haystack.”2 An expert in designing computational systems to assist human decision making, Rudin was eager to help CPD sort through more haystacks. She and her doctoral student Tong Wang began working with Wagner and Sevieri to develop an algorithm that could detect crime series patterns among residential burglaries (an offense notoriously difficult to solve3).
At the heart of the algorithm was a focus on identifying the MO of offenders. For a crime like burglary, perpetrators typically exhibit a particular pattern of behavior that carries across multiple incidents. When presented with the right data, Wagner and Sevieri say that they tend to see such patterns “automatically.” But they may not know where to look, and can process only a limited amount of data. What makes identifying these patterns difficult for a computer, on the other hand, is that every offender’s MO is unique. Some people force open front doors of apartments on weekday mornings; others break in through a window and ransack houses on Saturday nights. Thus, instead of just teaching a computer to search for a particular pattern, the algorithm had to “capture the intuition of a crime analyst” and re-create it on a larger scale.4
With significant input from Wagner and Sevieri, Rudin and Wang developed a model that analyzes crime series in two complementary stages. First, the model learns “pattern-general” similarities, representing the broad types of patterns that are typical of crime series (for example, proximity in space and time). Second, the model uses this knowledge to identify “pattern-specific” similarities—in other words, the MO of a particular crime series.5 In this two-pronged approach, the model first learns the intuitions that a human analyst would follow and then applies them to a large database of housebreaks to detect crime series.
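To make the two stages concrete, here is a rough sketch in Python. It is not Rudin and Wang's published model: the five burglary records, the fixed "pattern-general" weights (which stand in for what a real system would learn from analyst-identified series), and the 0.7 cutoff are all assumptions invented for illustration.

```python
# Hypothetical burglary records; the fields echo those named in the text.
CRIMES = [
    {"id": 1, "entry": "front door", "day": "weekday", "ransacked": False},
    {"id": 2, "entry": "front door", "day": "weekday", "ransacked": False},
    {"id": 3, "entry": "front door", "day": "weekday", "ransacked": True},
    {"id": 4, "entry": "window", "day": "weekend", "ransacked": True},
    {"id": 5, "entry": "window", "day": "weekday", "ransacked": False},
]
FEATURES = ("entry", "day", "ransacked")

# Stage 1 stand-in: "pattern-general" weights, assumed to have been
# learned already from analyst-labeled series (values are made up).
GENERAL_WEIGHTS = {"entry": 0.5, "day": 0.3, "ransacked": 0.2}


def specific_weights(series):
    """Stage 2: emphasize features on which the series agrees (its MO)."""
    weights = {}
    for feature in FEATURES:
        values = [crime[feature] for crime in series]
        agreement = max(values.count(v) for v in set(values)) / len(values)
        weights[feature] = GENERAL_WEIGHTS[feature] * agreement
    total = sum(weights.values())
    return {feature: w / total for feature, w in weights.items()}


def similarity(crime, series):
    """Weighted share of series members that the candidate crime matches."""
    weights = specific_weights(series)
    return sum(
        weights[f] * sum(crime[f] == member[f] for member in series) / len(series)
        for f in FEATURES
    )


def grow_series(seed, crimes, threshold=0.7):
    """Start from one crime and add the closest matches until none qualify."""
    series = [seed]
    candidates = [c for c in crimes if c is not seed]
    while candidates:
        best = max(candidates, key=lambda c: similarity(c, series))
        if similarity(best, series) < threshold:
            break
        series.append(best)
        candidates.remove(best)
    return [crime["id"] for crime in series]


print(grow_series(CRIMES[0], CRIMES))
```

In this toy data the three front-door weekday break-ins are grouped together, while the two window entries fall below the cutoff. The choice of which features to compare and how heavily to weight them is made by people, not discovered by the machine.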
The last ingredient that the algorithm needed was a corpus of historical data from which to learn what a typical crime series actually looks like. Because of Sevieri’s stewardship, the Cambridge Police Department has one of the country’s most extensive databases of crime records, with detailed information about crimes over the past several decades. This data enabled the MIT model to learn from 7,000 housebreaks that occurred in Cambridge over a fifteen-year period, with details about each burglary such as the geographic location, day of the week, means of entry, and whether the house was ransacked. The algorithm also drew from fifty-one analyst-identified crime series during this period, gaining insights about what makes a crime series stand out.
Once it had learned from the data, the model quickly demonstrated its potential to help police investigate crime. In 2012, there were 420 residential housebreaks in Cambridge. The first time the team ran the algorithm, the model identified a past crime series that had taken the Crime Analysis Unit more than six months to detect.
In a retrospective analysis, the algorithm further demonstrated its ability to inform police investigations and draw inferences that would not have occurred to a human analyst. The Cambridge police had previously identified two crime series that took place between November 2006 and March 2007. When the algorithm analyzed past crimes, however, it determined that these supposedly separate crime series were in fact connected. Despite a month-long gap and a shift several blocks north in the middle of this crime series—leading the Cambridge police to suspect that the two sets of events were separate—the MOs of the perpetrators were otherwise similar: almost every incident involved forcible entry through the front door during the workday. When Wagner and Sevieri were presented with the algorithm’s assertion that these two sets of burglaries were actually connected as one crime series, they recognized that it was right. The time lag that occurred halfway through the series could be explained by the deterrence effect of more people being home during the winter holidays; the geographic shift was a response by the perpetrators to having been observed while carrying out a prior burglary.6 Had CPD possessed this information at the time, the police could have identified and addressed the emerging crime series before it expanded further. “If you don’t stop that series early,” reflects Sevieri, “that’s what happens.”
* * *
The Cambridge algorithm draws on a set of techniques known as machine learning. These methods of predictive analytics are powerful because they can mine large datasets and examine complex trends, identifying patterns that a human investigator would struggle to uncover. As the amount of data generated and stored grows exponentially, the ability to make informed decisions using this data becomes increasingly valuable.
Consider the way that Gmail monitors your incoming emails to detect spam. Every time you receive an email, Gmail evaluates the message to determine whether it is legitimate (and should be sent to your inbox) or spam (and should be sent to your spam folder). While an engineer could predetermine specific rules that characterize spam, such as the presence of the phrase “Limited time offer” and at least two spelling errors, machine learning algorithms can analyze emails from the past to detect more subtle and complex patterns that indicate whether an email is spam.
Typical machine learning algorithms rely on “training data” composed of historical samples that have already been classified into categories. For a spam filter, the training data is a corpus of emails that have previously been labeled by people as “spam” or “not spam.” The next step for Gmail’s engineers is to define each email’s attributes, known as features, that the algorithm should consider when evaluating whether an email is spam. Relevant features in this case may be the email’s words, the address from which the email was sent (e.g., is it in the recipient’s contact lists?), and the type of punctuation used. Gmail then uses a machine learning algorithm to characterize the relationship between the features and the two labels. Through a process of mathematical optimization known as “fitting,” the algorithm determines how strongly each feature corresponds to spam messages. It thereby generates a formula, known as a model, that can classify new examples. Every time you receive an email, Gmail applies what it has learned. The model evaluates the email to determine whether it more closely resembles the spam or the not-spam examples from the training data, and in turn gauges how likely the email is to be legitimate.
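The steps just described (labeled training data, features, fitting, and classification) can be sketched in a few lines of Python. This is only an illustration and certainly not Gmail's actual system: the four training emails, the use of individual words as features, and the choice of a naive Bayes classifier are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical, hand-labeled training data (not real Gmail data).
TRAINING_DATA = [
    ("limited time offer click now", "spam"),
    ("win money now limited offer", "spam"),
    ("meeting notes for tuesday", "not spam"),
    ("lunch on tuesday with the team", "not spam"),
]
LABELS = ("spam", "not spam")


def fit(examples):
    """'Fitting': count how often each word appears under each label."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["not spam"])
    return word_counts, label_counts, vocabulary


def spam_probability(text, model):
    """Estimate P(spam | words) using naive Bayes with add-one smoothing."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    log_scores = {}
    for label in LABELS:
        score = math.log(label_counts[label] / total_emails)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.split():
            if word in vocabulary:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denominator)
        log_scores[label] = score
    # Turn the two log scores into a probability between 0 and 1.
    highest = max(log_scores.values())
    scaled = {label: math.exp(s - highest) for label, s in log_scores.items()}
    return scaled["spam"] / sum(scaled.values())


model = fit(TRAINING_DATA)
print(spam_probability("limited time offer", model))
print(spam_probability("notes for the tuesday meeting", model))
```

The first message scores well above one half and the second well below it, because each resembles one group of training examples far more than the other. Nothing in the model "knows" what spam is; it knows only which words accompanied which labels in the past.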
Spam filters are just the tip of the machine learning iceberg, of course. Machine learning algorithms are behind the software that drives cars, beats world champions at games like chess and poker, and recognizes faces.
The abilities of machine learning to make sense of complex patterns and predict hitherto inscrutable events suggest to many that we can rely on data and algorithms to solve almost any problem. Such thinking led the entrepreneur and Wired editor Chris Anderson to proclaim in 2008 that Big Data represents “the end of theory.”7 Who needs to understand a phenomenon when there is enough data to predict what will happen anyway?
But as we saw in Cambridge, this claim could not be further from the truth. Sevieri talks about striving to “capture the intuition of a crime analyst” because the algorithm must understand the theories that a human analyst would use to inspect crime patterns. Of course, the algorithm need not follow the exact same thought process: much of machine learning’s power comes from its ability to interpret large datasets using different methods than those used by people. But unless the model is provided with a basic framework for how to operate—such as what information to consider and what the goals are—it will flounder. MIT’s model succeeded precisely because Rudin and Wang relied on Wagner and Sevieri’s expertise to decipher the art of how a crime analyst thinks. The algorithm incorporated assumptions about what information is relevant (the day of the week, but not the temperature) and how to interpret that information (two incidents that occur on Tuesday and Wednesday are more likely to be connected than two incidents that occur on a Friday and a Saturday).
Although it may appear that data-driven algorithms do not rely on theories or assumptions about the world, in reality algorithms always reflect the beliefs, priorities, and design choices of their creators. This is true even for a spam filter. The process starts when Gmail’s engineers select training data from which the algorithm can learn. To ensure that the algorithm learns rules that apply accurately to every type of email correspondence it will see, the emails in the training data must be accurately labeled and representative of the emails that the spam filter will evaluate in the future. If Gmail selects training data in which spam emails are overrepresented, its spam filter will overestimate the likelihood that an email is spam. Furthermore, selecting features for the algorithm requires some intuition regarding what attributes of an email are likely to distinguish spam. If Gmail’s engineers know about one indicator of spam emails but not another, they might generate features that will capture spam emails like those they have seen but not other types.
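The effect of unrepresentative training data follows directly from Bayes' rule, as a short calculation shows. The numbers here are assumptions chosen for illustration: suppose 20 percent of real email is spam, while a carelessly assembled training set is 80 percent spam.

```python
def posterior_spam(prior_spam, likelihood_ratio):
    """Bayes' rule in odds form.

    prior_spam: share of spam the model believes exists, learned from
        the label frequencies in its training data.
    likelihood_ratio: P(evidence | spam) / P(evidence | not spam).
    """
    odds = (prior_spam / (1 - prior_spam)) * likelihood_ratio
    return odds / (1 + odds)


# An ambiguous email: its contents are equally likely under either label.
AMBIGUOUS = 1.0

representative = posterior_spam(0.2, AMBIGUOUS)  # training set mirrors reality
skewed = posterior_spam(0.8, AMBIGUOUS)          # spam overrepresented

print(f"representative training data: {representative:.2f}")
print(f"skewed training data:         {skewed:.2f}")
```

With identical evidence, the model trained on skewed data rates the same email as far more likely to be spam, purely because of how its training examples were selected.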
Finally, Gmail must determine the goal that its spam filter will be optimized to achieve. Should it strive to capture every spam email, or are some types of emails more important to get right than others? If Gmail’s engineers decide that phishing emails (which attempt to trick recipients into providing sensitive information such as passwords) are the worst type of spam, they can optimize the filter to catch them, but then the filter might be less able to identify other types of spam, such as emails about payday loans. As part of this calculation, Gmail must consider the trade-offs between false positives (marking a legitimate email as spam) and false negatives (allowing a spam email into your inbox). Focus too heavily on avoiding false positives, and Gmail inboxes will become cluttered with spam; focus too heavily on avoiding false negatives, on the other hand, and Gmail may wrongly filter out important messages. This decision can make or break a model: the March 2018 collision in which a self-driving Uber car struck and killed a woman in Arizona was the result of software that was tuned to diminish the importance of false positives (to avoid overreacting to obstacles such as plastic bags).8
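The trade-off between the two kinds of error typically comes down to a single tunable number: the score above which a message is flagged. The sketch below uses eight invented emails with made-up model scores to show how moving that threshold trades one error for the other; none of the values come from a real system.

```python
# Hypothetical (score, is_actually_spam) pairs; the score is the model's
# estimated probability that the email is spam.
SCORED_EMAILS = [
    (0.95, True), (0.80, True), (0.60, True), (0.40, True),
    (0.70, False), (0.30, False), (0.10, False), (0.05, False),
]


def count_errors(scored, threshold):
    """Return (false_positives, false_negatives) when every email whose
    score is at or above the threshold is sent to the spam folder."""
    false_positives = sum(
        1 for score, is_spam in scored if score >= threshold and not is_spam
    )
    false_negatives = sum(
        1 for score, is_spam in scored if score < threshold and is_spam
    )
    return false_positives, false_negatives


for threshold in (0.2, 0.5, 0.9):
    fp, fn = count_errors(SCORED_EMAILS, threshold)
    print(f"threshold {threshold}: {fp} legitimate filtered, {fn} spam delivered")
```

A low threshold catches every spam message but discards two legitimate ones; a high threshold spares every legitimate message but lets three spam messages through. No setting eliminates both errors, so someone must decide which mistake matters more.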
When we are not thoughtful about these design choices, we risk unleashing algorithms that make inaccurate or unfair decisions. For although most people talk about machine learning’s ability to predict the future, what it really does is predict the past. Gmail detects spam effectively because it knows what previous spam looked like (that is the value of training data) and assumes that today’s spam looks the same. The core assumption embedded in machine learning models is that the characteristics associated with certain outcomes in the past will lead to the same outcomes in the future.
The trouble with predicting the past is that the past can be unsavory. Data reflects the social contexts in which it was generated. A national history rife with systemic discrimination has generated data that reflects these biases: when employers prefer white job applicants to similarly qualified African American ones or men over similarly qualified women,9 and when women and African Americans have been excluded from economic and educational opportunities, the resulting data about society, taken on its face, can seem to suggest there is something fundamental about being white or male that makes a person more qualified, educated, and prosperous. In other words, uncritically relying on data drawn from an unjust society will identify the products of discrimination as neutral facts about people’s inherent characteristics.
When it comes to spam, this may not matter much—receiving junk mail is rarely more than a nuisance. But when it comes to algorithms that make more important decisions, the biases contained within training data can matter a great deal.
In the 1970s, St. George’s Hospital Medical School in London developed a computer program to help it weed out applicants. Given an applicant pool of around 2,000 for only 150 spots, a program that could cut down the work of selecting students to interview had obvious appeal. Throughout much of the 1980s, this program conducted the school’s initial review, filtering which students St. George’s should interview. In 1988, however, the U.K. Commission for Racial Equality investigated this algorithm’s use and found it to be biased: by following the computer program, St. George’s had unfairly rejected hundreds of women and minorities with sufficient academic credentials to merit an interview.10
The algorithm did not learn this bias on its own: throughout the history of St. George’s, the admissions staff had been making biased admissions decisions based on race and gender. When the admissions algorithm drew on its training data of previous admissions decisions, it inferred that St. George’s considered women and minorities to be less worthy. Instead of learning to identify the most academically qualified candidates, in other words, the algorithm learned to identify the applicants that looked the most like those the school had admitted in the past. In fact, the algorithm achieved 90 percent correlation with a human selection panel at the hospital—which is why St. George’s believed in the first place that the selection algorithm would be useful.
To this day, many have repeated the same mistake, relying on machine learning to make important decisions, only to realize that the models were making biased predictions. In 2014, for example, Amazon began developing machine learning algorithms to help it decide which job applicants to hire. Just a year later, the company abandoned the project when it discovered that the model was unfairly favoring male candidates.11
* * *
The Cambridge Police Department was not alone in believing that machine learning could improve police operations. With machine learning having attained an almost mythic status for being able to solve any problem, says the technology policy lawyer David Robinson, “people are primed for the idea that computers can cause significant improvements to whatever they’re added to.” It was only natural, he adds, that police departments across the country would wonder, “Why can’t we sprinkle some of that magic over the difficult problem of community safety in cities?”12
Companies began showering the market with tools to perform “predictive policing.” One of the most widely used is PredPol: software that, on the basis of historical crime records, analyzes how crime spreads between places and then forecasts that spatial process into the future to predict where the next crimes will occur. The company translates these predictions for police via an interactive map overlaid with red squares (covering 500 feet by 500 feet) at the predicted high-crime locations. If police spend time in those regions, the company posits, then they will be more effective at preventing crime and catching criminals.
“Vendors are happy to provide the impression that their systems will leverage technology to make things better,” explains Robinson. PredPol has aggressively shared case studies asserting the effectiveness of its software, citing “a proven track record of crime reduction in communities that have deployed PredPol.”13 As explained by Andrew Ferguson, a legal scholar and the author of The Rise of Big Data Policing, predictive policing is alluring to police departments because it provides “‘an answer’ that seems to be removed from the hot button tensions of race and the racial tension arising from all too human policing techniques.” He adds, “A black-box futuristic answer is a lot easier than trying to address generations of economic and social neglect, gang violence, and a large-scale underfunding of the educational system.”14
Thus, in the wake of growing outrage about discriminatory police practices—including numerous high-profile police killings of African Americans—and burgeoning support for systemic police reforms, predictive policing was hailed as “a brilliantly smart idea” that could “stop crime before it starts” through objective, scientific assessments. In an interview, a former police analyst who served for several years as a PredPol lobbyist declared, “It kind of sounds like science fiction, but it’s more like science fact.”15
By now, however, when we observe such faith in technology—such a clear example of tech goggles at work—we should be skeptical and raise several questions: Can this technology actually achieve its stated purpose? What values are embedded in the technological solution? What do we overlook by assuming that this issue is a technical one?
Thorough evaluations of predictive policing tools suggest that they promise far more than they can deliver. A 2016 study led by Robinson “found little evidence that today’s systems live up to their claims.” His report instead asserts, “Predictive policing is a marketing term.”16 In fact, many of the statistics touted by PredPol are cherry-picked numbers that take advantage of normal fluctuations in crime to suggest that PredPol generated significant reductions.17 As one statistician notes, this type of analysis “means nothing.”18
John Hollywood, a researcher at the RAND Corporation (a policy think tank) who has assessed numerous predictive policing tools, calls any benefits of predictive policing “incremental at best” and says that to predict specific crimes “we would need to improve the precision of our predictions by a factor of 1000.”19 Hollywood’s analysis of a predictive policing effort in Louisiana—one of the only independent analyses of predictive policing that has been conducted—found that the program had “no statistically significant impact” on crime.20
Despite their interest in using machine learning to prevent crime, even Wagner and Sevieri of the Cambridge Police Department are critical of PredPol. “It was the right product at the right time,” says Sevieri. “Police departments were looking for a quick fix.” Wagner’s primary critique is that PredPol relies on an “oversimplified” model that “doesn’t take into account the patterns” of crime. For example, PredPol assumes that the likelihood of crime in a location spikes immediately following the most recent nearby crime and then gradually decreases. Following a housebreak on Wednesday afternoon, PredPol predicts a higher likelihood of further crime on Wednesday night than on Thursday afternoon. But in reality, Wagner says, “a lot of crimes are serial crimes. There are patterns.” A trend of weekday afternoon housebreaks suggests that the next housebreak will occur on a following weekday afternoon, not at midnight on the same day.
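Wagner's critique can be illustrated with a toy comparison of the two assumptions. The sketch below is not PredPol's actual model, whose details are not given here: the exponential decay, the twelve-hour half-life, the three housebreak dates, and the simple "same kind of day, similar hour" rule are all hypothetical.

```python
from datetime import datetime

HALF_LIFE_HOURS = 12.0  # hypothetical value, chosen only for illustration

# A made-up series of weekday-afternoon housebreaks (Mon, Thu, Wed).
PAST_CRIMES = [
    datetime(2012, 1, 2, 14),
    datetime(2012, 1, 5, 15),
    datetime(2012, 1, 11, 14),
]


def decay_risk(last_crime, when):
    """Risk that peaks at the most recent crime and halves every half-life."""
    hours_elapsed = (when - last_crime).total_seconds() / 3600
    return 0.5 ** (hours_elapsed / HALF_LIFE_HOURS)


def serial_pattern_risk(past_crimes, when):
    """Share of past crimes on the same kind of day (weekday or weekend)
    and within two hours of the same time of day. Hours are compared
    without wrapping around midnight, which is enough for this example."""
    matches = sum(
        1
        for crime in past_crimes
        if (crime.weekday() < 5) == (when.weekday() < 5)
        and abs(crime.hour - when.hour) <= 2
    )
    return matches / len(past_crimes)


wednesday_night = datetime(2012, 1, 11, 23)
thursday_afternoon = datetime(2012, 1, 12, 14)

for label, when in [("Wed night", wednesday_night), ("Thu afternoon", thursday_afternoon)]:
    print(
        label,
        round(decay_risk(PAST_CRIMES[-1], when), 2),
        round(serial_pattern_risk(PAST_CRIMES, when), 2),
    )
```

The decay rule ranks Wednesday night above Thursday afternoon simply because less time has passed, while the pattern rule ranks them the other way around. Which ranking a tool produces depends on the theory of crime its designers built in.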
A more pointed concern regarding predictive policing models is whether they make racially biased predictions about where crimes are likely to occur. Supporters of predictive policing assert that the software must be fair, because it relies on data and algorithms. According to Brett Goldstein, Chicago’s former chief data officer, an early predictive policing effort in Chicago “had absolutely nothing to do with race,” because the predictions were based on “multi-variable equations.”21 Los Angeles Police Commander Sean Malinowski called PredPol “objective” because it relies on data.22 Similarly, the director of Hitachi’s crime-mapping software declared that the program “doesn’t look at race. It just looks at facts.”23
But the “facts” of the matter—in this case, crime statistics—are well known to be “poor measures of true levels of crime,” writes the criminologist Carl Klockars. Because “police exercise an extraordinary degree of discretion in deciding what to report as crimes,” Klockars explains, police statistics “are reflective of the level of police agency resources dedicated to [the] detection” of particular types of crime, rather than the actual levels of crime across society.24 In other words, what appear to be facts about crime are largely facts about police activity and priorities.
For years, police have disproportionately targeted urban minority communities for surveillance and arrests, leading to decades of crime data that reflect this discriminatory treatment.25 Police predominantly patrol black neighborhoods and possess significant discretion regarding when and why to arrest someone.26 Many incidents that police never observe, act on, or even target in white communities are recorded as crimes in black neighborhoods.27
This is what makes The New Inquiry’s “White Collar Crime Early Warning System” such a wonderful piece of satire. The magazine developed a model, using technical approaches similar to those of predictive policing tools, that predicts where financial crimes are likely to occur.28 In Chicago, for example, whereas most crime maps show hot spots in the predominantly black and brown south and west sides, the hot spots for white-collar crime are in the central business district (“The Loop”) and the primarily white north side. That these maps (and in fact the very idea of using algorithms to proactively target financial crimes) are so striking brings to light an oft-overlooked aspect of the criminal justice system and machine learning–based reform efforts: our very selection of the crimes that ought to be aggressively monitored and enforced rests in part on racist and classist notions of social order.29
Thus, even if a machine learning algorithm is not hard-coded to exhibit racial bias, the data from which it learns reflects social and institutional biases. As we saw in chapter 3, differential rates of reporting on 311 apps could lead one to conclude that all of a city’s potholes are in the white, wealthy neighborhoods. And as we learned from the admissions algorithm at St. George’s Hospital Medical School, data from historically biased processes will generate similarly biased predictions. In this way predictive policing, while supposedly neutral, overemphasizes the criminality of black neighborhoods and intensifies the police presence around people and places that are already unfairly targeted.
Yet because this outcome is based on data, dispatching police in this manner is typically seen as an objective rather than political decision. When the recommendations of predictive policing are taken at face value, what was once a racially motivated decision by a police force to crack down in certain neighborhoods becomes an objective response based on science. Thus, writes the data ethicist Jacob Metcalf, the original “value decision” of whom to arrest “becomes naturalized through the black-box ‘objectivity’ of the algorithm.”30 Such naturalization may create a pernicious feedback loop that justifies and perpetuates systemic biases.
An analysis in Oakland by the Human Rights Data Analysis Group demonstrates how predictive policing can lead to these disparities. Although local public health estimates suggest that drug crimes are ubiquitous in Oakland, the study found that “drug arrests tend to only occur in very specific locations—the police data appear to disproportionately represent crimes committed in areas with higher populations of non-white and low-income residents.”31 The study’s authors developed an algorithm, based on PredPol’s methods, to determine what impacts predictive policing could have. They concluded that if the Oakland Police had used PredPol, “targeted policing would have been dispatched almost exclusively to lower income, minority neighborhoods.”32
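The dynamic behind this finding can be reproduced in a toy simulation. The code below is neither the Human Rights Data Analysis Group's analysis nor PredPol's algorithm; the two neighborhoods, their identical offense rates, the detection rate, and the rule "patrol wherever the records show the most crime" are assumptions made purely for illustration.

```python
# Two hypothetical neighborhoods with identical true offense rates.
TRUE_DAILY_OFFENSES = {"A": 10, "B": 10}

# Assumed share of offenses that get recorded where police are present;
# offenses in the unpatrolled neighborhood go unrecorded.
DETECTION_RATE = 0.5


def simulate(initial_records, days):
    """Each day, send the patrol to the neighborhood with the most
    recorded crime, then add what the patrol observes to the records."""
    records = dict(initial_records)
    for _ in range(days):
        target = max(records, key=records.get)
        records[target] += TRUE_DAILY_OFFENSES[target] * DETECTION_RATE
    return records


# Neighborhood A starts with slightly more recorded crime, for example
# because it was patrolled more heavily in the past.
result = simulate({"A": 12, "B": 10}, days=30)
print(result)
```

Although both neighborhoods experience exactly the same amount of crime, a small initial difference in the records directs every patrol to neighborhood A, and the resulting records then appear to confirm that A is where the crime is.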
But perhaps some technical mechanisms can be employed to avoid biased predictions. Jeremy Heffner is the product manager of HunchLab, a competitor of PredPol that attempts to answer the obvious question: if the biases reflected in crime data make predictive policing discriminatory, is there any way to make the models fair? “One thing that PredPol puts forward as the strength of their approach is that they’re only using crime data, so therefore there’s no possibility for bias, which just makes no sense to me,” Heffner says. “If there’s any bias in the systems, it’s because of the crime data itself.”33
Heffner has taken numerous steps to prevent these biases from appearing in HunchLab. His primary focus is limiting the use of data that is influenced by an officer’s discretion. Heffner provides the following example: “When an officer is going down the street, they are not generating new reports of robberies or homicides, but they might be generating new reports of vandalism or jaywalking.” This means that data about homicides is less likely to reflect biases—and therefore generate biased predictions—than is data about vandalism. Moreover, HunchLab incorporates numerous other factors—from the day of the week to the location of bars to even the cycle of the moon—in a “risk terrain model” intended to generate more accurate assessments of crime.
Despite Heffner’s commendable efforts to develop fair predictions, however, HunchLab highlights the limits of “smart” policing. For the issue with predictive policing is not just that the predictions may be biased—it is that predictive policing relies on traditional definitions of crime and assumes that policing represents the proper method to address it. Focusing on the models’ technical specifications (such as accuracy and bias) overlooks an even more important consideration: the policies and practices that the algorithm supports. In this way, attempts to improve social structures with mere technical enhancements subvert opportunities to critically assess and systematically reform political institutions.
For even if police are dispatched to neighborhoods in the most fair and race-neutral possible manner, their typical actions once there—suspicion, stop-and-frisks, arrests—are inextricably tied to the biased practices that predictive policing was largely designed to redress. Even HunchLab’s practice of calling its recommendations “missions” plays into dangerous policing narratives that extol a “warrior mindset,”34 consider every patrol to be a perilous assignment, and view everyone as a potential criminal. When unjust policies and practices are followed, even a superficially fair approach will have discriminatory impacts.
Consider what happened in Shreveport, Louisiana, during a predictive policing trial studied by RAND. When patrolling neighborhoods identified as high-crime, many police officers unexpectedly changed their tactics to focus on “intelligence gathering through leveraging low-level offenders and offenses.” Officers increasingly stopped people whom they observed “committing ordinance violations or otherwise acting suspiciously” in order to check their criminal records. Those whose histories contained prior convictions were arrested.35
Whether or not Shreveport’s model accurately and fairly identified where crime would occur, it generated increased police activity and suspicion in the regions of interest. Although unintended, this response is not surprising. After all, the point of predictive policing is to identify locations where crime will occur. Doing so primes police to be “hyper alert” when patrolling inside the regions and thus to treat everyone there as a potential criminal.36 And given the substantial evidence of racial bias in practices such as stop-and-frisk,37 it is not hard to imagine that the people whom police stop for committing violations or acting suspiciously will mainly be young men of color, thereby increasing both incarceration rates and conflict between police and communities.
Here we see the interplay between predictions and politics: whether or not predictive policing algorithms accurately and fairly identify high-crime locations, they do not dictate what actions to take in response. Governments choose to give responsibility for dealing with most forms of social disorder to the police. Police choose to go into these neighborhoods with heightened suspicion and a warrior mindset. Thus, the seemingly technical decisions about how to develop and use an algorithm are necessarily intertwined with the clearly political decisions about the kind of society we want to inhabit. Just as it is necessary to assess and reimagine policing generally, it is equally necessary to assess and reimagine the role of algorithms in policing. For if cities truly know where crime will occur, why not work with that community and with potential victims to improve those neighborhoods with social services?38 Why is the only response to send in police to observe the crime and punish the offenders?
* * *
Proponents of “smart” policing are typically so focused on optimizing existing practices that they are unable to answer—or even ask—questions about what should be done with the predictions that are made. Policing is not the only or the most effective way to curb crime and aid communities—in fact, as the police scholar David Bayley explains, “one of the best kept secrets of modern life” is that “[t]he police do not prevent crime.”39 For example, a 2017 study found that proactive policing “may inadvertently contribute to serious criminal activity” and “curtailing proactive policing can reduce major crime,” suggesting that one of the most common (and discriminatory) police practices does not even achieve its stated purpose of reducing crime.40
Although police possess means and powers to deter and punish certain criminal activity, they are ill-equipped to take on the full range of issues with which they are increasingly required to deal: homelessness, mental health and drug crises, isolated neighborhoods with poor education and limited job opportunities. These issues would be better addressed by alternative interventions.
“I don’t think anyone, in the abstract, has a problem with figuring out where crime is and responding to it,” says the ACLU’s John Chasnoff. “But what’s the appropriate response? The assumption is: we predicted crime here, and you send in police. But what if you used that data and sent in resources?”41
That’s what happened in Johnson County, Kansas—a county within the Kansas City Metro Region. The story starts decades ago, in 1993, when the sheriff, chief judge, and district attorney each sought funding from the county manager for new record management systems. The county manager was unwilling to pay for three separate versions of almost identical software, so he told them to come up with one record management system that could meet all of their needs.42
The county manager got his wish: the group worked together and created a single, integrated information management system that combines data about every criminal case, from booking through the conclusion of probation, in one place. In 2007, the county also integrated its human services data into the same system.
In addition, Johnson County has spent years developing policies that prioritize coordinated treatment for individuals suffering from mental illness. In 2008, it formed a cross-governmental Criminal Justice Advisory Council to assess the local criminal justice system and identify gaps in social services. One of the council’s first initiatives was hiring mental health professionals whose job was to help police respond to incidents that involve mental health issues. After this program was successfully launched in one of the county’s cities in 2011, slightly reducing jail bookings of the mentally ill and increasing referrals to services more than thirtyfold, it expanded to another city in 2013.43 Soon thereafter, every city in Johnson County had made appropriations in its budget for a qualified mental health professional to be embedded in its police department.44
In 2015, the success of these efforts in Johnson County attracted the attention of the White House. Lynn Overmann, then the senior advisor to the U.S. chief technology officer, was pulling together a select cohort of jurisdictions for a fledgling “Data-Driven Justice Initiative” and wanted Johnson County to be involved. The initiative’s goal was to use data to help address a crisis in the criminal justice system: the shocking frequency with which people with unaddressed mental illnesses are locked up in local jails for committing minor, nonviolent offenses. Two-thirds of inmates suffer from mental illness, two-thirds have a substance abuse disorder, and almost half suffer from chronic health problems.45 Jailing them costs billions of dollars every year.
A major reason why so many people with mental illness end up in jail is that most communities lack the services and coordination necessary to address this population’s multiple vulnerabilities, which can include drug addiction and homelessness. Even though many agencies devote resources to this population, they do so in piecemeal ways that fail to sufficiently aid and stabilize individuals.46 As a result, Overmann explains, “America’s largest mental health facilities are often our local jails.”47 But relying on police and jails is an inadequate approach that merely places a punitive Band-Aid on systemic and complex issues. As one county sheriff says, “These are not issues we can arrest or incarcerate our way out of.”48
Overmann has observed firsthand how communities are failing their most vulnerable residents. She began her career as a public defender in Miami, where she “saw from the inside how ill-equipped the criminal justice system is to help people with mental illness.” Overmann reports that although many of her clients suffered from mental health issues, “they lacked access to required mental health services. As a result, these clients often spent weeks or even months in jail.”49 Conditions for prisoners suffering from mental illness in Miami were so bad that in 2011 the Department of Justice declared the situation “inhumane and unconstitutional.”50 The lesson for the young Overmann was clear: “The system was broken.”51
In response to these issues, Miami revolutionized how it treats people with mental health issues. After discovering that fewer than 100 people with serious mental illness accounted for nearly $14 million in services over four years, the Miami-Dade Police Department trained police officers and 911 dispatchers to de-escalate encounters with people suffering from mental illness.52 Because of the police department’s focus on humane treatment and diversion away from jail and into social services, the local jail population fell by 40 percent, a decrease so steep that the county saved $12 million per year and was able to close an entire jail facility.53
Overmann carried her experience in Miami to the White House Office of Science and Technology Policy, where she had the platform to help communities across the country divert low-level offenders with mental illness away from the criminal justice system and into treatment. Her goal was to proactively provide coordinated social services to those with mental health problems and criminal records before they ever came into further contact with the criminal justice system.
Overmann knew that success would be largely predicated on having accurate and functional data. In theory, determining the overlap between people who have been arrested and people who receive mental health treatment is easy: just combine datasets and see which names appear in both. But in reality, even this first step—merging datasets from separate components of local government—is incredibly challenging. The data that municipal agencies collect is typically used for internal administrative purposes (e.g., tracking building permits and dispatching ambulances) rather than analysis; each department is focused intensely on its own specific responsibilities, paying little attention to sharing datasets across departments. Each department’s records are therefore isolated in individual silos, created and maintained in whatever form best supports its particular objectives. These bureaucratic barriers to data integration create significant blind spots for agencies trying to coordinate their services: those running criminal justice systems “don’t know how many people screen positive for mental illness,” and behavioral health clinicians “never know if our clients are in jail.”54 As a result, agencies are unable to meet the needs of this vulnerable population.
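To make the "just combine datasets and see which names appear in both" step concrete, here is a minimal Python sketch of that naive approach. The record layout, the field names (`name`, `dob`), and the sample rows are hypothetical illustrations, not any agency's actual schema, and matching on a cleaned-up name plus birth date is only one simple assumption about how two silos might be joined.

```python
from datetime import date

def normalize_key(name: str, dob: date) -> tuple[str, date]:
    """Build a crude match key: case-folded name without punctuation, plus birth date."""
    cleaned = "".join(ch for ch in name.casefold() if ch.isalnum() or ch.isspace())
    return (" ".join(cleaned.split()), dob)

def find_overlap(jail_records, health_records):
    """Return the match keys that appear in both datasets."""
    jail_keys = {normalize_key(r["name"], r["dob"]) for r in jail_records}
    health_keys = {normalize_key(r["name"], r["dob"]) for r in health_records}
    return jail_keys & health_keys

# Hypothetical records kept by two separate departments.
jail_records = [
    {"name": "Alex  Doe", "dob": date(1980, 5, 1)},
    {"name": "Sam Roe", "dob": date(1975, 2, 9)},
]
health_records = [
    {"name": "alex doe", "dob": date(1980, 5, 1)},
    {"name": "Sam Roe Jr.", "dob": date(1975, 2, 9)},
]

overlap = find_overlap(jail_records, health_records)
print(overlap)
```

In this toy data, the two spellings of "Alex Doe" are matched, but "Sam Roe" and "Sam Roe Jr." are not, even though they may well be the same person. Small inconsistencies of this kind, multiplied across departments that each record data in their own form, are part of why the first step is harder than it sounds.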
The importance of integrated datasets is what made Johnson County such an attractive test-bed for the Data-Driven Justice Initiative. Because of its decisions over the past several decades, first to create a unified criminal justice information management system and then to integrate human services data, Johnson County possessed the vital information that most jurisdictions lacked.
In 2016, Johnson County partnered with the University of Chicago’s Data Science for Social Good program on an ambitious project: to identify which individuals suffering from mental health and medical issues would be arrested in the following year. With this information, Johnson County could provide proactive social services that would enable someone with mental illness to avoid coming into any further contact with the criminal justice system. The goal was not just to divert people from jail but to prevent them from ever reaching a crisis that requires diversion.
Toward this end, the team at UChicago developed a machine learning model. Johnson County’s data contained detailed records for 127,000 people. The data scientists consulted with Johnson County to develop 252 features (including age, history of criminal charges, and number of times enrolled in mental health programs in the past year) that could help predict future arrests. They also categorized everyone in the data on the basis of whether they had recently been arrested. Using this labeled training data, the team developed a predictive model to determine each person’s likelihood to be arrested in the following year.55
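The idea of labeled training data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UChicago team's pipeline: the cutoff dates, the field names, and the two sample people are hypothetical, only three of the 252 features are imitated, and the label is assumed here to mean "arrested during the year after the cutoff."

```python
from datetime import date

CUTOFF = date(2015, 1, 1)     # features may only use events before this date
LABEL_END = date(2016, 1, 1)  # the label looks at the year starting at CUTOFF

def build_row(person):
    """Turn one person's event history into a (features, label) pair."""
    year_before = date(CUTOFF.year - 1, CUTOFF.month, CUTOFF.day)
    features = {
        "age": CUTOFF.year - person["birth_year"],
        "num_prior_charges": sum(d < CUTOFF for d in person["charge_dates"]),
        "mh_enrollments_past_year": sum(
            year_before <= d < CUTOFF for d in person["mh_enrollment_dates"]
        ),
    }
    # Label: was this person arrested in the year after the cutoff?
    label = any(CUTOFF <= d < LABEL_END for d in person["arrest_dates"])
    return features, label

# Hypothetical people; real records would come from the integrated county system.
people = [
    {
        "birth_year": 1985,
        "charge_dates": [date(2012, 3, 4), date(2014, 7, 1)],
        "mh_enrollment_dates": [date(2013, 1, 10), date(2014, 6, 2)],
        "arrest_dates": [date(2015, 4, 20)],
    },
    {
        "birth_year": 1990,
        "charge_dates": [date(2015, 2, 1)],
        "mh_enrollment_dates": [],
        "arrest_dates": [date(2016, 3, 1)],
    },
]

rows = [build_row(p) for p in people]
for features, label in rows:
    print(features, label)
```

The design choice worth noticing is the strict separation in time: features draw only on what was known before the cutoff, while the label records what happened afterward. A model trained on such rows learns to map past records to later arrests, which is also why it inherits whatever patterns of enforcement shaped those records.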
The algorithm identified several trends that indicate when someone with mental illness is likely to be arrested. Most notably, the highest-risk individuals had long gaps between their interactions with mental health services, suggesting that dropping out of social services prematurely greatly increases someone’s risk of coming into contact with the criminal justice system. Drawing on these insights, the model could automatically detect people who were following this trajectory and help Johnson County intervene before they fell through the cracks.
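A "long gap between interactions with mental health services" can be turned into a number in several ways. The Python sketch below shows one possible definition, assumed here for illustration only: the longest stretch of days without a recorded contact, counting the time since the most recent contact. The function name and the sample dates are hypothetical.

```python
from datetime import date

def longest_service_gap_days(contact_dates, as_of):
    """Longest run of days with no recorded service contact, up to `as_of`.

    The stretch between the last contact and `as_of` is included, so someone
    who stopped attending months ago gets a large value. Returns None when
    there are no contacts on or before `as_of`.
    """
    dates = sorted(d for d in contact_dates if d <= as_of)
    if not dates:
        return None
    points = dates + [as_of]
    return max((later - earlier).days for earlier, later in zip(points, points[1:]))

as_of = date(2014, 12, 31)

# Someone who dropped out of services early in the year.
dropped_out = [date(2014, 1, 1), date(2014, 1, 15), date(2014, 2, 1)]
# Someone still in regular contact.
still_engaged = [date(2014, 12, 1), date(2014, 12, 15)]

print(longest_service_gap_days(dropped_out, as_of))    # 333
print(longest_service_gap_days(still_engaged, as_of))  # 16
```

Under this definition, the person who left services in February carries a gap of 333 days, while the person still attending has a gap of 16, which is the kind of contrast that could prompt outreach before a crisis.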
A retrospective analysis showed just how much the model could aid those in Johnson County. Among the 200 people who were identified as having the highest risk to be arrested in 2015, 102 went to jail in that year. If Johnson County had proactively reached out to this high-risk population, half of them might have been kept out of jail. The impacts of this predictive approach could be profound: preventing the bookings of these 102 people would have spared them almost eighteen years of cumulative jail time and, as an added benefit, would have saved the county about $250,000.56
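The arithmetic behind these figures is simple enough to check directly. The short Python sketch below recomputes the share of the top-200 list that was actually booked and derives rough per-person averages. Treating "almost eighteen years" as a full 18 years and "about $250,000" as exactly $250,000 are simplifying assumptions, so the per-person numbers are approximations rather than reported results.

```python
TOP_K = 200              # people flagged as highest risk
BOOKED = 102             # of those, how many went to jail that year
JAIL_YEARS_UPPER = 18    # "almost eighteen years", used as an upper bound
COUNTY_SAVINGS = 250_000 # "about $250,000"

def precision_at_k(outcomes):
    """Fraction of flagged people for whom the predicted outcome occurred."""
    return sum(outcomes) / len(outcomes)

# One True per flagged person who was booked, one False per person who was not.
outcomes = [True] * BOOKED + [False] * (TOP_K - BOOKED)

precision = precision_at_k(outcomes)               # 0.51, i.e. about half
avg_days_upper = JAIL_YEARS_UPPER * 365 / BOOKED   # about 64 days per person
avg_cost = COUNTY_SAVINGS / BOOKED                 # about $2,450 per person

print(f"precision at {TOP_K}: {precision:.2f}")
print(f"average jail time per booked person: under {avg_days_upper:.0f} days")
print(f"average cost per booked person: about ${avg_cost:,.0f}")
```

Read the other way, the same numbers mean that 98 of the 200 flagged people were not booked that year, which matters less when the response is an offer of services than when it is added police attention.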
Steve Yoder, a data specialist in Johnson County who worked with UChicago on the project, remembers being skeptical at first that there were so many people being jailed on a regular basis because of mental health issues. But when he started looking through the data, he was struck in particular by one person on the model’s list who had been booked into jail six times in the past six months.
“For those of us who don’t experience this on a day-to-day basis it’s just hard to imagine,” Yoder explains. But after double-checking the numbers, he recognized how severe the problem truly is. “This is real. There’s a person behind that. And, wow, there’s some crisis going on here that really needs to be addressed.”57
As Johnson County moves toward implementing this model to guide social service outreach, it is working with the University of Chicago to predict high-risk individuals on a monthly, rather than yearly, basis. Meanwhile, the Data-Driven Justice Initiative continues to expand. At the end of the Obama administration, the initiative found a new home in the Laura and John Arnold Foundation.58 Across the country, from Los Angeles to Salt Lake County to New Orleans to Hartford, more than 150 jurisdictions are combining previously disparate datasets to provide more proactive and effective social services.59
“I’m a believer in this,” proclaims Yoder. “I truly think that we haven’t even scratched the surface of the data.”
* * *
The work in Johnson County is remarkable in part because it starts from the understanding, as the county’s criminal justice coordinator Robert Sullivan puts it, that “people have all kinds of complexity in their lives. Some of their interactions with the criminal justice system are due to those conditions.” With this recognition comes the agency to make substantive change: rather than simply treating predicted outcomes as preordained, Johnson County uses predictions to inform preventive interventions that alter those outcomes. “We don’t want you to ever get to the point of having an interaction with any component of the criminal justice system,” Sullivan says. “That’s why we’re so excited about this predictive piece.”
Johnson County’s perspective stands in stark contrast to the view of the world through tech goggles: the only possible social change is to make policing more efficient by using data and algorithms. As PredPol explains on its website, its core mission is to help police departments “allocate the limited resources they do have more effectively.”60 By this logic, a smart city is one that adds technology to traditional practices in order to catch criminals and lower crime rates.
Yet creating a just city means more than merely optimizing typical police practices with efficient crime prevention in mind. For example, policing is a job with numerous, often conflicting goals—it cannot be boiled down to a number or formula. “It’s hard to measure success for a police force in a truly and meaningfully holistic fashion,” says David Robinson. “Police are trying to maintain legitimacy, they’re trying to deter crime, they’re trying to investigate crimes that happened, they’re trying to create social order without manufacturing indignity in the lives of the people that they supervise. The crime rate is a very poor substitute for having a comprehensive metric of success for a police department.”
Failure to incorporate the complexities of policing into predictive models can be disastrous. Just as an algorithm that optimizes traffic flow will overlook the needs of pedestrians, a predictive policing model that optimizes for reducing crime rates will ignore the other responsibilities of police as well as other goals for the community. And just as a spam filter that focuses on catching phishing emails will struggle to capture other types of spam, a crime forecast that focuses on drug crimes rather than white-collar crimes will unfairly place a big, fat target on minority neighborhoods. It is only by starting with a comprehensive and compassionate understanding of what factors lead to contact with the criminal justice system and what tactics can be used in response, as Johnson County did, that algorithms can truly help generate a more just city.
Instead of conceiving more holistic approaches that capture the complexity of the world, however, engineers tend to adopt visions of society that fit the simple presumptions within their models. Take Richard Berk, a professor of statistics and criminology at the University of Pennsylvania who has spent his career using data to analyze crime. Several of his projects involve helping judges decide which inmates should be released on parole by predicting who is likely to recidivate. Berk describes the task in vivid terms: “We have Darth Vaders and Luke Skywalkers, but we don’t know which is which.”61 The goal is to distinguish Vaders from Skywalkers.
Although this description helps explain how the algorithm works, it provides a stunning oversimplification of society. Have you ever met a Darth Vader? Who among us is a Luke Skywalker? The world cannot be broken down into people who want to destroy the universe and those who risk their lives to save it. Unwittingly, Berk’s chosen analogy highlights the fallacies of such simplistic thinking. As one critic writes, “Berk must not have watched the entire ‘Star Wars’ saga. Darth Vader wasn’t an unimpeachably evil individual. At one point he was an innocent little boy who grew up in some dire circumstances.”62 Rather than question why people make certain decisions or end up in particular situations—and attempt to push them toward positive outcomes—Berk presumes that people are fundamentally either good or bad, and that our task is simply to determine whom to punish. Apparently all we can do is follow the binary representations defined by the algorithm.
One of Berk’s most ambitious efforts is to predict whether newborn babies will commit a crime before turning eighteen, from information such as where that baby lives and who its parents are.63 He is starting in Norway, but if the same approach is taken in the United States, there is little doubt that a machine learning model could distinguish with reasonable accuracy between people who will be arrested and those who will not. After all, a government report estimated that of male babies born in 2001, one out of every three blacks, compared to only one out of every seventeen whites, would go to prison at some point during his life.64 Given those stark statistics, we don’t need cutting-edge algorithms to predict who will be arrested.
But just because we can predict a certain outcome does not mean we should consider that outcome to be inevitable or just. That a model could predict a baby’s future criminality reflects the vast inequalities of justice and opportunity in society, not the inherent nature of certain people. In just the last century, African Americans have, among many injustices, been excluded from government programs that provided loans for education and housing and been funneled into prisons through the war on drugs.65 The vast disparities in education, wealth, and crime that have resulted from these actions are not inevitable but socially constructed. To suggest that an algorithm can identify future criminals at birth is thus to accept the status quo as the natural and proper state of society, in effect labeling fights for equity and social justice as unnecessary.
A 2012 advertisement for IBM’s Domain Awareness System portrays a similar perspective. The commercial follows two white men—the proverbial cop and robber—driving through city streets at night. The police officer provides a voiceover that begins as follows: “I used to think my job was all about arrests. Chasing bad guys. Now I see my work differently. We analyze crime data, spot patterns, and figure out where to send patrols.” Relying on the advice of a computer in his police car, the officer reaches a convenience store just in time to thwart the would-be thief.66
Although it tells an appealing story, IBM’s ad demonstrates how predictive policing software both relies on and perpetuates simplistic notions of policing and crime. The officer’s first two statements set up the rules of society: there are “bad guys” who commit crime and police (the implied “good guys”) whose job it is to arrest them. This story presents another Luke Skywalker versus Darth Vader scene, with no backstory (for apparently none is needed) to explain how each person came to their present roles. In this way, in addition to completely exaggerating what algorithms are capable of—no system can predict crime at scale with anywhere near the level of precision depicted—IBM’s ad ignores all of the social and political dynamics that underlie crime and policing. The society portrayed in this vignette has no poverty, no segregation, no stop-and-frisk—in fact, because every character is white, it has no racial dynamics at all. We are left with a facile and pernicious conclusion: because of the presence of “bad guys,” crime is an inevitable phenomenon that can be prevented only by police who possess the necessary information.
This is the pernicious logic that the tech goggles cycle reinforces. First, we perceive policing as a purely technical problem of deploying officers to prevent crime. Rather than evaluate whether current police practices are well suited to addressing social disorder, we deploy predictive policing algorithms to slightly adjust police operations. Because tech goggles create a mirage of objectivity around data and algorithms, technological approaches like predictive policing are perceived as value-neutral responses to social problems. And in order to justify myopic models that do not—for they cannot—capture the full complexity of society, we adapt our social theories to match the world that the models depict. Police departments and courts become further entrenched in their view of the population as good or evil and of incarceration as the only response to crime.
When deployed within this framework, machine learning will be an ineffectual (at best) or counterproductive (at worst) tool for social justice. Consider once again the Cambridge Police Department’s housebreak pattern detection algorithm, which was motivated by precisely the kind of crime prevention that IBM’s ad portrays and which represents perhaps the best-case scenario for what crime-predicting software in the hands of police can look like. Better investigation and prevention of burglaries could benefit many. The CPD relied on data solely about housebreaks, information that is relatively reliably reported to and recorded by police. Furthermore, their algorithm is primarily intended for retroactive investigations and targeted pattern detection, as an explicit counter to software such as PredPol that directs police where to patrol to proactively prevent crime. “It’s absolutely the wrong approach to come into neighborhoods and stop everyone,” says Dan Wagner. “That’s been the problem in policing and problem with these tools.”
But even the CPD’s work, like every other attempt at predictive policing, suffers from a gaping divide between the problem being solved and the problem that needs solving. Owing to their focus on technology, many believe that the issues of policing stem from poor information about when and where crime will occur in the future. This is a problem that (at least in principle) new technology can solve. But as Alex Vitale argues in The End of Policing, “The problem is not police training, police diversity, or police methods. . . . The problem is policing itself.” Tracing the history of policing from its roots to the present day, Vitale concludes: “American police function, despite whatever good intentions they have, as a tool for managing deeply entrenched inequalities in a way that systematically produces injustices for the poor, socially marginal, and nonwhite.”67 Cities do not need to embrace new technology so that they can improve police capabilities—they need to fundamentally reconceptualize the roles, practices, and priorities of police.
In the hands of police, even algorithms intended for unbiased and nonpunitive purposes are likely to be warped or abused. For whatever its underlying capabilities, every technology is shaped by the people and institutions that wield it. Unless cities alter the police’s core functions and values, use by police of even the most fair and accurate algorithms is likely to enhance discriminatory and unjust outcomes.
In Chicago, for example, an algorithm conceived to reduce violence was perverted—through police control—into a tool for surveillance and criminalization. Drawing on his research regarding how gun violence clusters in social networks,68 the sociologist Andrew Papachristos urged that social service organizations identify people facing the highest risk of being shot in order to prevent future violence and mitigate its impacts. Building on these insights, the Chicago Police Department developed an algorithm to identify the people most likely to be involved in gun violence. And although the original stated intention for this “Strategic Subjects List” (SSL) was to prevent violence, it has largely been used as a surveillance tool that many believe disproportionately targets people of color.69 A RAND evaluation concluded that the SSL “does not appear to have been successful in reducing gun violence”; instead, “the individuals on the SSL were considered to be ‘persons of interest’ to the [Chicago Police Department]” and were more likely to be arrested.70 Even Papachristos has criticized this application of his research, writing in the Chicago Tribune that “one of the inherent dangers of police-led initiatives is that, at some level, any such efforts will become offender-focused.”71
Smart Enough Cities must take machine learning out of the hands of police and develop nonpunitive and rehabilitative approaches to address social disorder, along the lines of the Data-Driven Justice Initiative. Wagner advocates for precisely these transformations. “The criminal justice system ends up putting people in jail, and that’s not an effective place to treat them. They need mental health or substance abuse services,” he says.
And although Wagner is confident that algorithms have a role to play, he argues that we must think more critically about how to use them: “I think there’s value in using social networks to identify people who are at risk of being involved in a shooting, but Chicago fell flat in implementing the Strategic Subjects List to increase surveillance of suspects as opposed to truly trying to prevent that person from pulling the trigger or being shot. If they had used that same tool and better partnered with the community, it would have been very different.”
* * *
Many of the advances promised in smart cities rely on data analysis and machine learning algorithms—presented as providing universal benefits—yet these techniques are unable to transcend historical or current politics.
First, the data used to develop these models does not represent unassailable truths; instead, the data embeds information about socially produced outcomes and is shaped by reporting and collection practices. As the examples of 311 and police data indicate, often what we believe to be data about one thing (potholes and crime) is in fact data about something quite different (service-requesting inclinations and police activity). Given that machine learning relies on historical data, we should be critical of what predictive algorithms actually forecast and hesitant about using them to direct municipal operations.
More fundamental than biases within data are the politics embedded within the algorithms. For although designing algorithms appears to be a technical task, the choices made can have vast social and political impacts. All too often, algorithms that promise efficiency as a neutral good reflect the priorities of existing institutions and power structures. In privileging police efficiency in reducing crime rates over alternative goals such as improving neighborhood welfare with social services, supposedly neutral models further entrench the role of police as the appropriate response to social disorder—a political decision if there ever was one. In that sense, predictive policing is likely to have discriminatory impacts not just because the algorithms may themselves be biased but also because they are deployed to grease the wheels of an already discriminatory system.
Rather than rush to adopt machine learning, we must ask: What goals should we pursue with the aid of predictive algorithms? How should we act in response to the predictions that are generated? How can we alter social and political conditions so that the problem we want to predict simply occurs at lower rates? Not every application of machine learning is inevitably biased or malicious or useless, but achieving benefits from machine learning requires that we debate—in political rather than technical terms—how to design algorithms and what they should be deployed to accomplish.
While the criminal justice system (not to mention every other aspect of municipal governance) has always involved contentious and complex political decisions, the particular danger of using technology to make these decisions is that we will misinterpret them as technical problems that do not require political deliberation. And by treating technology as the only variable, tech goggles blind us to the full possibilities to reform the policies and practices that technology purports to improve. When predictive policing gets hailed as the new and scientific approach to policing, it distracts us from the hard choices that must be made about what police should prioritize and what their role in society should be. Thus, says Andrew Ferguson, “Predictive policing systems offer a way seemingly to turn the page on past abuses, while still legitimizing existing practices.”72
As traditional practices are cloaked in the futuristic sheen of algorithms, they are made to appear more innovative and attractive than they truly are. Looking through tech goggles, we mischaracterize applying new technology to the same old practices as progress. But there are no easy technical fixes for systemic police discrimination and the debilitation of social services: more substantive reforms are required. Johnson County’s efforts were effective not because it discovered a new, infallible algorithm to optimize existing police practices, but because it developed strategies to address mental health issues, created the data infrastructure necessary to inform those interventions, and devoted sufficient resources to make the interventions effective. Unlike those who leap at the quick-fix solution promised by predictive policing, Robert Sullivan emphasizes that improving the criminal justice and mental health systems required an “incremental step-by-step process through the years.” As chapter 6 will discuss more thoroughly, social progress that appears driven by technology actually relies on precisely these sorts of long-term planning efforts and nontechnical policy reforms.
But first, the next chapter will further explore how using data and algorithms in government is primarily a political rather than technical project, examining how cities should responsibly govern these technologies to ensure that their use promotes democracy and equity.
This raises important questions about a thread that underlies most (all?) smart programs: profitability. In the US, much of the criminal justice system, and prisons in particular, is owned and operated by private companies, and everything from security to food services is contracted out to a swath of businesses. So when prisons close because of “smart” tech like machine learning, is there pushback from the companies that stand to lose a lot of money?
By extension, I can imagine that the profit imperative drives much of the need to select particular data and particular numbers in order to construe the “success” of smart projects. It doesn't work in a business's favor to highlight data showing that it has entrenched deep-rooted political problems.