By Lisa Morgan, Program Director, Content and Community, IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems (A/IS)
The world has become a very polarized place in the last decade, and modern datasets reflect that polarization. Since data is the foundation of A/IS, the kind of data used to train these systems determines their usefulness and value to society.
Microsoft’s Tay bot is a classic example of how quickly things can go wrong at scale when the purpose of an AI instance and its training data are not aligned. There’s certainly no shortage of data: market research firm IDC estimates that the worldwide volume of data will reach 163 zettabytes by 2025. We, the creators and users of AI, must be mindful and vigilant about the data we use to train systems.
“Without the right dataset behind it, an algorithm is practically useless,” said Eleanor “Nell” Watson, member of The IEEE Global Initiative and co-founder of EthicsNet, a non-profit that’s enabling the creation of the world’s first crowdsourced ethical dataset. “It’s not deployable for any real-world application.”
Watson would know. She started teaching post-graduate computer science in the UK at age 24, having already gained extensive experience in the tech industry. She later also taught business courses, which inspired her to co-found a successful graffiti art company that’s now based in Hong Kong. In 2010, she started a machine vision company that captures body measurements from two pictures or a short video. The system is used for telemedicine and for creating personalized apparel at mass scale.
“When I started the machine vision company, a lot of our models for distinguishing between a person and the background had to be hand-coded and meticulously programmed. There was a lot of trial and error,” said Watson.
As early as 2010, her machine vision team was using convolutional neural networks (CNNs) to separate people from the backgrounds of photos and video frames. Over time, people began asking Watson for her opinion about the societal impacts of AI, which led her to consider the topic in depth. It occurred to her that there was no public dataset available to teach AI what socially acceptable behavior is, so she took on the task herself. In addition to founding EthicsNet, she serves as vice chair of the IEEE P7001™ subcommittee on transparency of A/IS.
“An intelligent person with an IQ of 150 wouldn’t be very capable if that person had never left the house. Machines are very similar,” said Watson. “I’ve been observing very interesting algorithms that could be used to train machines about human values or to observe human beings and pick up the tacit social rules that govern society: things like inverse reinforcement learning, which is a very sophisticated technique. And yet, none of these are deployable yet; you can’t actually use them because there isn’t a dataset of examples of humans being nice to each other.”
Creating the Seminal Ethical Dataset
“There are basic norms which are almost universal and there are a lot of things that human beings can tolerate. Those things are important because they’re required to make a pluralistic, global society work,” said Watson. “Otherwise, the benefits of these kinds of systems will be reaped by the societies which most closely represent the creators of the technology, whether it’s Silicon Valley or China.”
Essentially, A/IS need to be taught the same social graces children are taught. That way, machines can learn to behave in ways that humans find socially acceptable.
“There are simple rules almost everyone can agree on, like it’s not nice to stare at people or one should not talk loudly in church. These types of norms are the mother of morality and ethics,” said Watson. “Machines are acting as our ambassadors now. We don’t like when our children cuss in public or our pets chase after cars, partly because it reflects badly on us. If machines are to act as ambassadors, we don’t want them to give a bad impression by proxy. To solve that, we need to teach them these basic social graces.”
Teaching machines basic social graces starts with a dataset. The next step is benchmarking different approaches to determine the extent to which an A/IS instance behaves in a well-mannered way.
“Machines will not only influence us as individuals, they will influence society,” said Watson. “They’ll change how we think and act in subtle ways, which could be benign or insidious, but if we have machines that are at least well-mannered and morally upright, then perhaps our society will be more likely to go in a beneficial direction over the longer term.”
The Future Is Up to Us
We all need to think about machine learning systems in terms of the data we use to train them, because training data shapes outcomes. If the training data is biased, the A/IS instance will be biased. Learn more about the Autonomous Systems Validation Working Group today.