What Is New in the Way of AI and Machine Learning?



Dr. Timnit Gebru was the co-lead of Google’s Ethical AI research team—until she raised concerns about bias in the company’s large language models and was forced out in 2020.

Her departure sent shockwaves through the artificial intelligence and machine learning community and raised questions about how to deal with potential biases in systems trained on massive amounts of data. “I thought my career would be done on this side of things, but I was wrong,” Gebru told Wired, in a story that highlighted her frustration with what Gartner has called the ethical dilemma created by “black box,” or opaque, design processes. Researchers at University College London later took up that problem in a report entitled Why is Your Model Black Boxy?: How Transparency Alters Its Interpretations of Images, which argues for transparency and accountability in models and other AI systems, highlights their flaws, and suggests steps for improving transparency.
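To make the “black box” complaint concrete, here is a minimal sketch of occlusion sensitivity, one common way to probe how an opaque image model arrives at its score. The predict function and the random image are stand-ins invented for illustration; nothing here is taken from the UCL report.

```python
# Occlusion sensitivity sketch: mask patches of the input and record how
# the model's score changes. predict() is a stand-in for any black-box model.
import numpy as np

def predict(image: np.ndarray) -> float:
    # Dummy "model": score is the mean brightness of the centre region.
    h, w = image.shape
    return float(image[h // 4: 3 * h // 4, w // 4: 3 * w // 4].mean())

def occlusion_map(image: np.ndarray, patch: int = 8) -> np.ndarray:
    """Heatmap of how much masking each patch changes the prediction."""
    base = predict(image)
    heat = np.zeros_like(image, dtype=float)
    for y in range(0, image.shape[0], patch):
        for x in range(0, image.shape[1], patch):
            masked = image.copy()
            masked[y:y + patch, x:x + patch] = 0.0  # occlude one patch
            heat[y:y + patch, x:x + patch] = base - predict(masked)
    return heat

if __name__ == "__main__":
    img = np.random.rand(32, 32)
    print(occlusion_map(img).round(3))
```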

Gebru, who went on to found the Distributed AI Research Institute (DAIR), has published a wide body of work on privacy, bias, diversity, and equity in ML. These issues have been studied for decades, alongside many more that we have not yet identified and may never fully understand. The field is still quite young, and many questions remain not just unanswered but unexplored. Many people assume these problems come down to algorithmic design, and that much seems clear enough; what is not clear is which components of the overall system need to change. So what can we learn from one another if we want more transparency and better accuracy?

The Current State of Open Source Privacy Analysis Frameworks


In 2014, CSAIL released DataRobot, a framework created in part through efforts by an Association for Computing Machinery working group on open source software to improve privacy analysis frameworks. A recurring observation was that the most common issue was a lack of documentation, though few people discussed the technical aspects of the problem. Some pointed to problems with the language models being used; others saw poor governance as another source of trouble. Then Google and Microsoft began experimenting with their own products, one after another, and other companies soon followed. They found plenty of bugs in different parts of their proprietary platforms, but nobody cared. After years of experimentation, everyone figured out how to fix those bugs. When Google shut down its project, some people blamed Facebook. Others said it didn’t matter that the code wasn’t available; as someone put it at a 2018 conference, it was all about transparency.
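The text does not spell out what a privacy analysis framework actually computes, so as an illustration only, here is a minimal sketch of one building block such frameworks typically formalize: answering a count query under differential privacy using the Laplace mechanism. The function names and data are my own, not any particular framework’s API.

```python
# Laplace mechanism sketch: answer a counting query with noise calibrated
# to the query's sensitivity (1 for a count), giving epsilon-differential privacy.
import numpy as np

def private_count(values, predicate, epsilon=0.5):
    """Count records matching `predicate`, plus Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 37, 41, 19, 52, 33, 61]          # toy dataset
print(private_count(ages, lambda a: a >= 40))  # noisy answer to "how many are 40+?"
```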

Then came news of a major privacy breach at Twitter. Shortly afterward, the world got a glimpse of what Facebook knew and when. A few weeks later, Instagram announced it would give end users access to their posts, photos, and messages and ask them to approve any content shared between friends in exchange for compensation. As public attention turned to social media platforms, it became increasingly apparent that a problem existed. Around the same time, as the persecution of the Uyghurs unfolded in China, where an estimated million or more people, largely members of Muslim minority groups, have been detained without due process, separated from their families, and imprisoned, Google made headlines for reportedly planning a censored search product for China that could expose users’ queries to the authorities. There was little discussion of the impact of those decisions on freedom of expression, safety online, or privacy protections for people living under repressive regimes.

This kind of thing happens all the time. Companies make huge bets on technology, then find ways around regulations and laws to keep growing profits. Sometimes that opens the door to unethical practices; other times it doesn’t. For example, even when Apple has been accused of infringing copyrights and patents, it has largely been business as usual. Similarly, Google DeepMind’s AlphaGo program won nine games against top players Fan Hui (5–0 in 2015) and Lee Sedol (4–1 in 2016). Some observers worried about what such victories might eventually mean for human rights and freedom of expression; Google, for its part, framed the wins as a step in AI development that would hopefully lead to greater applications and improvements to human lives.

While Google and IBM may have gotten caught up playing catch-up with Chinese censorship, governments, and private corporations, other countries, including the United States, where political leaders routinely ignore data privacy concerns, continue to aggressively collect data that can be used to sort people by ethnicity alone. For instance, by some estimates, well over 100 million US residents were already subject to surveillance via facial recognition technology as of 2019. Most don’t realize that it violates their privacy, or that the technology collects data and shares it with commercial entities. Those systems are used for everything from flagging potential terrorists to sorting people into detention centers. All of it rests on the assumption that race, ethnicity, gender, and sexual orientation are somehow linked to our actions, yet no study of consumer behavior has shown that to be true.

It’s easy to see what’s going on here. When a nation acts to keep citizens from speaking out, protesting, or seeking justice, the government often steps in to speak in their place. One might call it a Catch-22: if citizens demand fair treatment, they risk prosecution when their voices go unheard. If we want more openness and transparency, governments that are willing to let data flow will have to lead the way. To date, though, there hasn’t been much progress on legislation and policies that address the trade-off between free speech and privacy. Governments have tried to regulate information technology, but it is complicated and expensive. Citizens can advocate for change, but that takes significant effort and money. Politicians have rarely moved toward solutions that allow both freedom of expression and privacy protection; instead, they have pushed policy that favors one at the expense of the other, to the detriment of democracy.

We live in a time of global conflict. While our local communities and individuals may hold their ground, we have become isolated from one another. Bigger conflicts in the future may require deeper collaboration as humans seek to protect themselves and each other, at least to begin with. There are opportunities for states and businesses to build trust and respect among their populations and to provide services, but we also need to ensure that the data we harvest is kept safe. Otherwise, those same technologies can harm individuals and communities.

How Privacy Protection Can Change with Models Beyond Language Models

There’s also great value in exploring how more advanced AI models look at privacy. Last month, Netflix revealed that they had spent $2 billion building 3D animation models that could be put behind cameras, allowing viewers to interact with movies and TV shows in real life. While that sounds cool, it’s not really new. Earlier, Disney was able to run computer simulations that resulted in thousands of scenes depicting violence with varying degrees of realism. Now we have 3D graphics models for VR and AR headsets that resemble the way 3D scans render objects. Researchers are trying to figure out how far such simulation techniques can go in terms of privacy protection.

Researchers have come up with various architectures for analyzing visual input, and have even begun looking at generative adversarial networks (GANs). Generative algorithms are ones that learn from examples and then produce or score outputs without being told explicitly what the result should be. A classic illustration is spam filtering, where a computer decides whether a message should reach your inbox by comparing it against previously labelled examples, much as we might recognize text we have seen before from a particular person or product. From the start, it seems obvious that such a technique raises privacy problems. Perhaps the greatest challenge for understanding privacy, however, is figuring out how to stop machines from training on samples generated by humans rather than on data collected for other reasons.
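As a concrete version of the spam-filtering example above, here is a minimal sketch of a classifier that learns from previously labelled messages and decides whether a new one should reach the inbox. The tiny dataset is invented purely for illustration.

```python
# Spam-filter sketch: learn word counts from labelled messages, then
# classify a new message as spam (1) or inbox-worthy (0).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now", "cheap loans click here",       # spam
    "meeting moved to 3pm", "draft of the report attached",  # ham
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)      # bag-of-words features
model = MultinomialNB().fit(X, labels)      # learn from previous examples

new = vectorizer.transform(["free prize meeting"])
print("spam" if model.predict(new)[0] == 1 else "inbox")
```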

Some people argue that an algorithm doesn’t need to understand what you are asking for; signals about your intentions are enough for it to tell the difference. Another question is whether the algorithm can be trusted to deliver accurate predictions, especially when it has only been evaluated on limited test datasets. Could it accurately assess complex scenarios, like a novel drug or device on trial? Does the trained model need to be rebuilt from scratch every time it takes on a new task? And what about the ethics inherent in these kinds of systems? If researchers create a model of an actual person’s mind, as imagined in Blade Runner 2049, do they need to ask themselves questions of fairness? Does the same moral concern apply to an algorithm that makes judgments about outcomes, as in predictive policing?
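To illustrate why limited test datasets are a shaky basis for trust, here is a small sketch showing how the same model’s measured accuracy swings across different tiny test splits. The synthetic dataset and split sizes are arbitrary choices made for the example.

```python
# Small-test-set sketch: with only 20 held-out examples, the accuracy
# estimate varies widely depending on which examples land in the test split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=20, random_state=seed)   # only 20 test examples
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print("accuracy per split:", np.round(scores, 2))
print("spread:", round(max(scores) - min(scores), 2))
```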

There are lots of ways to improve transparency, including giving developers access to the code. For example, anyone interested in building deep learning models can sign up for NVIDIA’s GPU Cloud platform, which supports hundreds of high-performance computing units, and upload code written with CUDA, NVIDIA’s parallel computing platform and programming model for general-purpose GPU computing. By taking advantage of NVIDIA technologies for academic purposes, researchers can train their models much faster. Scientists also have options for deploying neural networks locally or on cloud infrastructure such as AWS, using frameworks like TensorFlow or MXNet, with or without GPUs. Still, many people don’t know how to do such deployments.
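As a rough sketch of the train-on-a-GPU-if-you-have-one workflow described above, the following PyTorch snippet falls back to the CPU when no CUDA device is present. The toy model and data are placeholders, not tied to any particular NVIDIA or AWS service.

```python
# GPU/CPU training sketch: pick a device, move model and data to it, train.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(256, 16, device=device)   # toy inputs
y = torch.randn(256, 1, device=device)    # toy targets

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"trained on {device}, final loss {loss.item():.4f}")
```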

Even though it may seem less attractive, having
