Freshman Manoj Sarathy uses machine learning to help wildlife conservation efforts

Fall is back and so is the Allen School’s Undergrad Spotlight! This month’s student feature is Bellevue, Washington native Manoj Sarathy. Even before his arrival as part of the school’s expanded Direct to Major admissions program, the freshman computer science major was using machine learning to help environmental conservationists track and organize wildlife data. He was recently featured in the Seattle Times and on King 5 News for his work supporting wolverine recovery in Washington.  

Allen School: Why did you want to study computer science, and what made you choose the Allen School?

Manoj Sarathy: Like most high school seniors, I had a lot of interests, but my work applying machine learning to environmental conservation showed me that computer science can be useful in essentially any field. I decided to study at the Allen School because of the connections the school has with all the major companies implementing machine learning — and I wanted to stay close to home in the beautiful Pacific Northwest.

Allen School: What do you enjoy most about being an Allen School student?

MS: I find the resources available to the Allen School’s undergraduate students extremely valuable. For example, the career fairs held earlier this month were very useful; I learned more about companies hiring in the computer science field.

Allen School: What activities and interests do you have outside of your studies?

MS: I have attended a meeting of the Society for Ecological Restoration and will be attending some of their work parties to restore parts of campus. I am also interested in finding out more about Students Expressing Environmental Dedication (SEED). I hope to continue playing squash, a racquet sport, in my free time. One of the opportunities I gave up by coming to UW was playing for a varsity squash team, but I hope I can join some kind of squash club here.

Allen School: Why did you become a member of Conservation Northwest while you were in high school?

MS: I wish I could say it was purposeful, but it was honestly an accident. I learned about the organization while doing online research on environmental conservation in the Pacific Northwest for an environmental science class I was taking in high school. I really liked the work they do, like building wildlife overpasses and underpasses across I-90 and reintroducing fishers, a species in the same family as wolverines that was wiped out in the Pacific Northwest by hunters. I reached out to the organization to learn about volunteer opportunities, and one thing led to another. At one point, I even printed t-shirts at home to raise funds for them, and through that effort met some international conservation organizations.

One of the projects I became involved in was their camera trap project. Teams hike up to areas where wildlife may be located and set up camera traps to observe predators and prey. Conservation organizations rely on camera traps, but then have to spend a lot of time and effort classifying the images. Involvement in that project led me to the idea of using machine learning to speed up that effort.

Allen School: Is that when you began to work with Woodland Park Zoo’s senior conservation scientist, Robert Long?

MS: While working on my camera trap model, I quickly realized that I needed actual camera trap images from different cameras and angles to make my machine learning model accurate. I started writing to researchers who use camera traps, and Dr. Long was one of the few to respond immediately; he generously offered his images to train my model. Luckily, he was in Seattle and invited me to meet him at the Woodland Park Zoo. I have been working with him ever since.

Allen School: How did you use machine learning to classify all of the images?

MS: Any machine learning system learns from input data: the better and more varied the input data, the more accurate the system can be. Initially, I naively tried to use images from Google to train a model that distinguishes between species. When I tested the model on actual camera trap images, I quickly learned that it was nearly useless, because most images on the internet show animals in nearly ideal conditions, like with the background out of focus. Next, I found an online database used by prior researchers called “Snapshot Serengeti,” which has thousands of images of animals from Africa. Again, the animals and vegetation lacked the variety needed to match the camera trap images American conservationists were collecting.

I started writing to researchers, and only a couple responded; fewer still offered to share their images with me. I also learned through my discussions with them, and from my own experience with Conservation Northwest’s camera trap project, that just separating images containing animals or humans from images containing only background foliage would be immensely useful, because researchers spend countless hours looking at each false positive to make sure they are not missing anything. Distinguishing between animals and humans would also be very helpful. So I started building a model that classifies images into three categories: false positive, human, and animal. This enables volunteers to be more productive and efficient by prioritizing images for analysis.
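Sarathy’s actual model isn’t public, so the following is only a rough illustration of the triage step he describes: once a three-class model emits per-image scores, they can be turned into a prioritized review queue. The class names, logit values, and `triage` helper below are all hypothetical.

```python
import math

CLASSES = ["false_positive", "human", "animal"]

def softmax(logits):
    """Convert raw model logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(images_with_logits):
    """Label each image and sort so likely animals/humans are reviewed first.

    `images_with_logits` is a list of (image_name, [fp, human, animal]) pairs.
    Returns (name, label, priority) tuples, where priority is the
    probability that the image is NOT a false positive.
    """
    queue = []
    for name, logits in images_with_logits:
        probs = softmax(logits)
        label = CLASSES[probs.index(max(probs))]
        priority = 1.0 - probs[0]  # how unlikely the image is to be empty
        queue.append((name, label, priority))
    # Highest-priority (least likely empty) images come first.
    return sorted(queue, key=lambda t: t[2], reverse=True)

batch = [
    ("img_001.jpg", [4.0, 0.5, 0.2]),   # mostly background foliage
    ("img_002.jpg", [0.1, 0.3, 3.5]),   # probable animal
    ("img_003.jpg", [0.2, 2.8, 0.4]),   # probable hiker
]
for name, label, priority in triage(batch):
    print(f"{name}: {label} (priority {priority:.2f})")
```

Volunteers would then work down the queue from the top, so images the model believes are empty foliage fall to the bottom of the pile rather than consuming review time.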

Allen School: Did you know how to do all of this before you started on the project? 

MS: Before I started working on my project, I knew next to nothing about coding and machine learning. I read as much as I could about machine learning and Google’s TensorFlow. I also needed to learn some Python programming to get it to work. Over time and through lots of failures and crashes, I slowly built a decent model. I don’t claim to understand how TensorFlow or machine learning frameworks actually work, but I hope to learn more about these topics in the Allen School! 

Allen School: Do you want to remain working in conservation after you finish your CS degree? 

MS: I genuinely enjoy the natural environment we are fortunate to have here in the Pacific Northwest. So I will definitely stay involved in environmental conservation, but I haven’t yet decided in what way or how I can make the most impact. Ask me that question again when I’m a senior; I may have a better idea.

We’re so excited to have a dedicated conservationist like Manoj as a member of the Allen School community. We are confident his innovation will change the world!


Read more →

Uncle Phil, is that really you? Allen School researchers decode vulnerabilities in online genetic genealogy services

Hand holding saliva collection tube
Marco Verch/Flickr

Genetic genealogy websites enable people to upload their results from consumer DNA testing services like Ancestry.com and 23andMe to explore their genetic makeup, familial relationships, and even discover new relatives they didn’t know they had. But how can you be sure that the person who emails you claiming to be your Uncle Phil really is a long-lost relation?

Based on what a team of Allen School researchers discovered when interacting with the largest third-party genetic genealogy service, you may want to approach plans for a reunion with caution. In their paper “Genotype Extraction and False Relative Attacks: Security Risks to Third-Party Genetic Genealogy Services Beyond Identity Inference,” they analyze how security vulnerabilities built into the GEDmatch website could allow someone to construct an imaginary relative or obtain sensitive information about people who have uploaded their personal genetic data. 

Through a series of highly controlled experiments using information from the GEDmatch online database, Allen School alumnus and current postdoctoral researcher Peter Ney (Ph.D., ‘19) and professors Tadayoshi Kohno and Luis Ceze determined that it would be relatively straightforward for an adversary to exploit vulnerabilities in the site’s application programming interface (API) that compromise users’ privacy and expose them to potential fraud. The team demonstrated multiple ways in which they could extract highly personal, potentially sensitive genetic information about individuals on the site — and use existing familial relationships to create false new ones by uploading fake profiles that indicate a genetic match where none exists.

Part of GEDmatch’s attraction is its user-friendly graphical interface, which relies on bars and color-coding to visualize specific genetic markers and similarities between two profiles. For example, the “chromosome paintings” illustrate the differences between two profiles on each chromosome, accompanied by “segment coordinates” that indicate the precise genetic markers that the profiles share. These one-to-one comparisons, however, can be used to reveal more information than intended. It was this aspect of the service that the researchers were able to exploit in their attacks. To their surprise, they were not only able to determine the presence or absence of various genetic markers at certain segments of a hypothetical user’s profile, but to reconstruct 92% of the entire profile with 98% accuracy.

As a first step, Ney and his colleagues created a research account on GEDmatch, to which they uploaded artificial genetic profiles generated from data contained in anonymous profiles from multiple, publicly available datasets designated for research use. By assigning each of their profiles a privacy setting of “research,” the team ensured that their artificial profiles would not appear in public matching results. Once the profiles were uploaded, GEDmatch automatically assigned each one a unique ID, which enabled the team to perform comparisons between a specific profile and others in the database — in this case, a set of “extraction profiles” created for this purpose. The team then performed a series of experiments. For the total profile reconstruction, they uploaded and ran comparisons between 20 extraction profiles and five targets. Based on the GEDmatch visualizations alone, they were able to recover just over 60% of the target profiles’ data. Based on their knowledge of genetics, specifically the frequency with which possible DNA bases are found within the population at a specific position on the genome, they were able to determine another 30%. They then relied on a genetic technique known as imputation to fill in the rest. 
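The paper’s actual extraction and imputation methods are not reproduced here. As a hedged illustration of just the middle step, filling positions that could not be recovered from comparisons with the most frequent base in the population, consider this toy sketch; all positions, bases, frequencies, and the `fill_by_frequency` helper are invented.

```python
def fill_by_frequency(recovered, allele_freqs):
    """Fill unrecovered positions with the most common base in the population.

    `recovered` maps position -> base for markers extracted via one-to-one
    comparisons; positions absent from it are unknown. `allele_freqs` maps
    position -> {base: population frequency}.
    """
    profile = {}
    for pos, freqs in allele_freqs.items():
        if pos in recovered:
            profile[pos] = recovered[pos]             # known from extraction
        else:
            profile[pos] = max(freqs, key=freqs.get)  # best population guess
    return profile

# Markers at positions 0 and 2 were recovered from comparisons;
# position 1 is guessed from population allele frequencies.
recovered = {0: "A", 2: "G"}
allele_freqs = {
    0: {"A": 0.6, "T": 0.4},
    1: {"C": 0.9, "G": 0.1},   # C is very common here, so guess C
    2: {"G": 0.5, "A": 0.5},
}
print(fill_by_frequency(recovered, allele_freqs))
# {0: 'A', 1: 'C', 2: 'G'}
```

The frequency-based guess is most reliable exactly where one base dominates the population, which is why this step could account for roughly another 30% of a target profile before imputation was needed.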

Once they had constructed nearly the whole of a target’s profile, the researchers used that information to create a false child for one of their targets. When they ran the comparison between the target profile and the false child profile through the system, GEDmatch confirmed that the two were a match for a parent-child relationship.

While it is true that an adversary would have to have the right combination of programming skills and knowledge of genetics and genealogy to pull it off, the process isn’t as difficult as it sounds — or, to a security expert, as it should be. To acquire a person’s entire profile, Ney and his colleagues performed the comparisons between extraction and target profiles manually. They estimate the process took 10 minutes to complete — a daunting prospect, perhaps, if an adversary wanted to compare a much greater number of targets. But if one were to write a script that automatically performs the comparisons? “That would take 10 seconds,” said Ney, who is the lead author of the paper.

Consumer-facing genetic testing and genetic genealogy are still relatively nascent industries, but they are gaining in popularity. And as the size of the database grows, so does the interest of law enforcement looking to crack criminal cases for which the trail has gone cold. In one high-profile example from last year, investigators arrested a suspect alleged to be the Golden State Killer, whose identity remained elusive for more than four decades before genetic genealogy yielded a breakthrough. Given the prospect of using genetic information for this and other purposes, the researchers’ findings yield important questions about how to ensure the security and integrity of genetic genealogy results, now and into the future.

“We’re only beginning to scratch the surface,” said Kohno, who co-directs the Allen School’s Security and Privacy Research Lab and previously helped expose potential security vulnerabilities in internet-connected motor vehicles, wireless medical implants, consumer robotics, mobile advertising, and more. “The responsible thing for us is to disclose our findings so that we can engage a community of scientists and policymakers in a discussion about how to mitigate this issue.”

Echoing Kohno’s concern, Ceze emphasizes that the issue is made all the more urgent by the sensitive nature of the data that people upload to a site like GEDmatch — with broad legal, medical, and psychological ramifications — in the midst of what he refers to as “the age of oversharing information.”

“Genetic information correlates to medical conditions and potentially other deeply personal traits,” noted Ceze, who co-directs the Molecular Information Systems Laboratory at the University of Washington and specializes in computer architecture research as a member of the Allen School’s Sampa and SAMPL groups. “As more genetic information goes digital, the risks increase.”

Unfortunately for those who are not prone to oversharing, the risks extend beyond the direct users of genetic genealogy services. According to Ney, GEDmatch contains the personal genetic information of a sufficient number and variety of people across the U.S. that, should someone gain illicit possession of the entire database, they could potentially link genetic information with identity for a large portion of the country. While Ney describes the decision to share one’s data on GEDmatch as a personal one, some decisions appear to be more personal — and wider reaching — than others. And once a person’s genetic data is compromised, he notes, it is compromised forever. 

So whether or not you’ve uploaded your genetic information to GEDmatch, you might want to ask Uncle Phil for an additional form of identification before rushing to make up the guest bed. 

“People think of genetic data as being personal — and it is. It’s literally part of their physical identity,” Ney said. “You can change your credit card number, but you can’t change your DNA.”

The team will present its findings at the Network and Distributed System Security Symposium (NDSS 2020) in San Diego, California in February.

To learn more, read the UW News release here and an FAQ on security and privacy issues associated with genetic genealogy services here. Also check out related coverage by MIT Technology Review, OneZero, ZDNet, GeekWire, McClatchy, and Newsweek.

Read more →

Manaswi Saha wins Amazon Catalyst Award to develop techniques for visualizing urban accessibility at scale

Allen School Ph.D. student Manaswi Saha has won an Amazon Catalyst award to support her research on “Combining computational and visualization techniques to understand urban accessibility at scale.” The award, which comes with $10,000 in funding, will support Saha’s dissertation research with Allen School professor Jon Froehlich in the Makeability Lab.

“More than 30 million people have some form of disability in the U.S. Of these, half report using mobility aids. In spite of the growing need for accessible sidewalks, many cities remain inaccessible even after 25 years of the Americans with Disabilities Act regulations being in place,” Saha said. “Several cities have faced multi-million-dollar lawsuits for inaccessible sidewalks. However, there are currently no tools that can visualize and quantify this issue at scale.”

Saha’s latest endeavor is an extension of her work on Project Sidewalk, a web-based crowdsourcing tool that gamifies the collection of data on curb ramps, obstacles and other relevant sidewalk conditions by allowing volunteers to virtually walk through online streetview imagery. The first deployment of the project, in 2016, was in Washington, D.C., where 797 online users audited 2,941 miles of streets to report on accessibility issues in the city, with subsequent deployments in Seattle and Newberg, Oregon. Saha will apply her Catalyst award toward building an interactive web visualization tool that answers questions about accessibility for stakeholders: people with mobility disabilities, caregivers, local government officials (e.g., transportation departments), policymakers, and accessibility advocates. It will help answer questions such as: Which areas of the city are most inaccessible? Why is my neighborhood inaccessible? Where should resources for repairs be prioritized?
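Saha’s visualization tool is still being built, so the following is only a speculative sketch of one query it might support: ranking neighborhoods by their count of severe accessibility labels. The schema, label types, and `inaccessibility_ranking` function are all invented for illustration and are not Project Sidewalk’s real data model.

```python
from collections import defaultdict

def inaccessibility_ranking(labels, min_severity=4):
    """Rank neighborhoods by their number of severe accessibility problems.

    Each label is a (neighborhood, label_type, severity 1-5) tuple, loosely
    modeled on the kind of data a crowdsourcing tool like Project Sidewalk
    collects. Only labels at or above `min_severity` count toward the score.
    """
    scores = defaultdict(int)
    for neighborhood, label_type, severity in labels:
        if severity >= min_severity:
            scores[neighborhood] += 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

labels = [
    ("Anacostia", "missing_curb_ramp", 5),
    ("Anacostia", "surface_problem", 4),
    ("Georgetown", "obstacle", 2),
    ("Georgetown", "missing_curb_ramp", 4),
]
print(inaccessibility_ranking(labels))
# [('Anacostia', 2), ('Georgetown', 1)]
```

A ranking like this is the kind of summary a transportation department could use to prioritize repair budgets across neighborhoods.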

The goals, Saha said, are to fill the informational gap between citizens and local government in their understanding of urban accessibility, increase transparency by visualizing the current state of accessibility, and create advocacy efforts for bringing about change.

In an 18-month deployment study of Project Sidewalk, Saha’s group collected 205,385 sidewalk accessibility labels. Pictured above is a map of the reported missing curb ramps.

“As a start, we will be utilizing data collected in D.C. from Project Sidewalk and other available data sources such as from the Department of Transportation,” Saha said. “Eventually, this work would be expanded to other cities to offer them similar support.”

Saha published “Project Sidewalk: A Web-based Crowdsourcing Tool for Collecting Sidewalk Accessibility Data at Scale,” which earned a Best Paper Award at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI 2019) in May. Her co-authors include Froehlich, research scientist Michael Saugstad, and undergraduate Aileen Zeng of UW; students Hanuma Teja Maddali, Steven Bower, Aditya Dash, and Anthony Li of the University of Maryland, College Park; high school student Ryan Holland of Montgomery Blair High School; undergraduate Sage Chen of the University of Michigan; and professor Kotaro Hara of Singapore Management University.

The Amazon Catalyst program is a collaboration between the University of Washington’s CoMotion and Amazon that grants funds to faculty, staff and students to encourage innovation. The goal is to support those in the UW community working on solutions to solve real-world problems. So far, the program has helped to fund 50 UW projects. Saha has been working on a novel solution to a real-world problem in urban transportation, an area of research the Catalyst program was focused on this year.

Read the Amazon Catalyst press release here, and learn more information about Project Sidewalk here. Check out past coverage of Saha and the Project Sidewalk team’s work by UW News, KIRO7, Crosscut, and Seattle Met.

Congratulations, Manaswi!

Read more →

Researchers create smart speaker that uses white noise to monitor sleeping infants

BreathJunior
UW researchers have developed a new smart speaker skill that lets a device use white noise to both soothe sleeping babies and monitor their breathing and movement. Credit: Dennis Wise/University of Washington

Doctors, parenting magazines and parents themselves recommend using white noise to help babies fall and stay asleep. Continuous, monotonous sounds like ocean waves, raindrops on a rooftop or the rumble of an airplane can lull a newborn to sleep and help them rest longer. The sound also signals to little ones that it’s time to sleep.

White noise—a mixture of different pitches and sounds—can soothe fussing and boost sleep in babies. And now it can be used to monitor their motion and respiratory patterns.

Researchers at the University of Washington have developed a new smart speaker, similar to the Amazon Echo or Google Home, that uses white noise to monitor infant breathing and movements. Doing so is vital because children under the age of one are susceptible to rare and devastating sleep anomalies such as Sudden Infant Death Syndrome (SIDS), according to Allen School professor Shyam Gollakota, his Ph.D. student Anran Wang, and Dr. Jacob Sunshine of the UW School of Medicine. Respiratory failure is believed to be the main cause of SIDS.

“One of the biggest challenges new parents face is making sure their babies get enough sleep. They also want to monitor their children while they’re sleeping. With this in mind, we sought to develop a system that combines soothing white noise with the ability to unobtrusively measure an infant’s motion and breathing,” said Sunshine, who is also an adjunct professor in the Allen School.

In their paper, “Contactless Infant Monitoring using White Noise,” which they will present on Oct. 22 at the MobiCom 2019 conference in Los Cabos, Mexico, the team discusses how and why they created BreathJunior, a smart speaker that plays white noise and records how the noise is reflected back to detect the breathing motions of infants’ chests.

“Smart speakers are becoming more and more prevalent, and these devices already have the ability to play white noise,” said Gollakota, who is also the director of the Networks & Mobile Systems Lab. “If we could use this white noise feature as a contactless way to monitor infants’ hand and leg movements, breathing and crying, then the smart speaker becomes a device that can do it all, which is really exciting.”

The team generated novel algorithms that could help them distill the tiny motion of an infant breathing from the white noise emitted from the speakers.

BreathJunior
With this smart speaker skill, the device plays white noise and records how the noise is reflected back to detect breathing motions of infants’ tiny chests. It can track both small motions — such as the chest movement involved in breathing — and large motions — such as babies moving around in their cribs. It can also pick up the sound of a baby crying. Credit: Dennis Wise/University of Washington

“We start out by transmitting a random white noise signal. But we are generating this random signal, so we know exactly what the randomness is,” said Wang.  “That signal goes out and reflects off the baby. Then the smart speaker’s microphones get a random signal back. Because we know the original signal, we can cancel out any randomness from that and then we’re left with only information about the motion from the baby.”
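BreathJunior’s actual signal processing is far more sophisticated (and tracks subtle sub-sample phase changes, not whole-sample delays), but the core idea Wang describes — correlating the received audio against the known transmitted noise to isolate the echo — can be sketched as below. All signal lengths, delays, and noise levels here are illustrative.

```python
import random

random.seed(7)

# The speaker transmits a known pseudorandom "white noise" signal.
N = 256
tx = [random.gauss(0.0, 1.0) for _ in range(N)]

# Simulated echo: the same signal reflected off the baby, arriving
# with a 5-sample delay plus a little measurement noise.
TRUE_DELAY = 5
rx = [0.0] * TRUE_DELAY + tx[:N - TRUE_DELAY]
rx = [s + random.gauss(0.0, 0.05) for s in rx]

def estimate_delay(tx, rx, max_delay=20):
    """Correlate the received signal against the known transmitted noise.

    Because the transmitted randomness is known exactly, correlating it
    out leaves the echo's time-of-flight. Tracking tiny changes in that
    delay over time is what reveals chest motion.
    """
    best_d, best_score = 0, float("-inf")
    for d in range(max_delay + 1):
        score = sum(tx[i] * rx[i + d] for i in range(len(tx) - d))
        if score > best_score:
            best_d, best_score = d, score
    return best_d

print(estimate_delay(tx, rx))  # prints 5, the true echo delay
```

The correlation peaks only at the true delay because white noise is essentially uncorrelated with shifted copies of itself, which is exactly the property that makes it usable as a sensing signal.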

Because the breathing movement in babies is so minute, the motion of the baby’s chest is hard to detect, so Wang said they also scan the room to pinpoint where the baby is and maximize changes in the white noise signal.

“Our algorithm takes advantage of the fact that smart speakers have an array of microphones that can be used to focus in the direction of the infant’s chest,” he said. “It starts listening for changes in a bunch of potential directions, and then continues the search toward the direction that gives the clearest signal.”
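The real system uses a full microphone array and the coarse-to-fine search Wang describes; as a minimal stand-in, here is a two-microphone delay-and-sum sketch in which candidate steering delays play the role of candidate directions, and the delay producing the most coherent (highest-energy) sum wins. All values are made up for illustration.

```python
import random

random.seed(1)

# A broadband source (like white noise) reaching mic2 three samples
# after it reaches mic1.
N, TRUE_DELAY = 400, 3
src = [random.gauss(0.0, 1.0) for _ in range(N + TRUE_DELAY)]
mic1 = src[TRUE_DELAY:]   # arrives first at mic1
mic2 = src[:N]            # delayed copy at mic2

def delay_and_sum_energy(mic1, mic2, steer_delay):
    """Energy of the two-microphone sum after shifting mic2 by steer_delay.

    When the steering delay matches the source's true inter-microphone
    delay, the signals add coherently and the energy peaks; mismatched
    delays sum incoherently and stay near the noise floor.
    """
    n = len(mic1) - steer_delay
    return sum((mic1[i] + mic2[i + steer_delay]) ** 2 for i in range(n))

# Scan candidate steering delays (stand-ins for candidate directions)
# and keep the most coherent one.
best = max(range(8), key=lambda d: delay_and_sum_energy(mic1, mic2, d))
print(best)  # prints 3, the true inter-microphone delay
```

In the real device, the winning direction is then refined and the beamformed signal in that direction is what the breathing estimator operates on.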

The group first tested a prototype of BreathJunior on an infant simulator that could be set to different breathing rates. After succeeding with the simulator, they tested it on five babies in a local hospital’s neonatal intensive care unit, where the babies were also connected to hospital-grade respiratory monitors; the respiratory rates BreathJunior detected closely matched the rates reported by those standard vital signs monitors.

Sunshine explained that infants in the NICU are more likely to have either very high or very slow breathing rates, which is why the NICU monitors their breathing so closely. BreathJunior was able to accurately identify these rates as well.

“BreathJunior holds potential for parents who want to use white noise to help their child sleep and who also want a way to monitor their child’s breathing and motion,” said Sunshine. “It also has appeal as a tool for monitoring breathing in the subset of infants in whom home respiratory monitoring is clinically indicated, as well as in hospital environments where doctors want to use unwired respiratory monitoring.”

Sunshine emphasized that the American Academy of Pediatrics recommends against using any monitor that markets itself as reducing the risk of SIDS. The research, he said, makes no such claim: it uses white noise to track breathing and monitor motion, and can also let parents know if the baby is crying.

The research was funded by the National Science Foundation. Learn more about the researchers’ work by visiting their website, Sound Life Sciences, Inc. Read more about the speaker system at UW News, the Daily Mail, GeekWire, MIT Technology Review and Digital Trends.


Read more →

Allen School celebrates diversity and inclusion

Grace Hopper attendees
Allen School representatives at the Grace Hopper Celebration of Women in Computing.

As a community committed to diversity and inclusion, the Allen School celebrates and values differences in its members. Yesterday (Oct. 10), the School held its annual diversity in computing reception, a favorite event highlighting the School’s broadening participation in organizations that honor diversity in computing.

Students, faculty and staff who attended the Grace Hopper Celebration of Women in Computing earlier in October and the ACM Richard Tapia Celebration of Diversity in Computing in late September were recognized.

The Grace Hopper Celebration, held this year in Orlando, Florida, is the world’s largest gathering of women technologists and focuses on helping women grow, learn and develop to their highest potential.

Tapia conference attendees
Allen School attendees at the Richard Tapia Celebration of Diversity in Computing.

“I loved attending Grace Hopper this year. So many of the talks were so inspiring and gave me hands-on tools to approach challenges I face as a woman in tech. It was a big confidence booster, and I had the chance to meet so many amazing women in my field,” said Amanda Baughan, a graduate student in the Allen School. “It’s inspired me to tackle more difficult problems and reach for goals I may have second-guessed my own abilities in achieving previously.”

Aishwarya Mandyam
Jodi Tims, Chair of ACM-W; Allen School student Aishwarya Mandyam; Vidya Srinivasan and Sheila Tejada, co-chairs of the Grace Hopper Celebration.

Allen School student Aishwarya Mandyam was honored at the Hopper Celebration for her work, winning second place internationally in the ACM Student Research Competition.

The Tapia Celebration, held in San Diego this year, brings together people of all backgrounds, abilities and genders to recognize, celebrate and promote diversity in computing.

“I got to learn about thriving opportunities for a diverse workforce in tech and discovered it as a great platform for me to completely embrace distinct identities of myself–a woman of color, a first-gen college student, an immigrant with all transitioning struggles bolstered–and was able to find my own ground in this highly challenging field,” said Radia Karim, a junior in the Allen School. “I was really moved by the conference’s agenda and with the extremely bold and diverse Tapia attendees who have been consistently defining their own footprints in tech rising above all odds and making the future more welcoming.”

Recognizing those in attendance at these conferences, Ed Lazowska, the Bill & Melinda Gates Chair in the Allen School, said that the two conferences highlight the School’s core values and its commitment to diversity and inclusion.

“The Allen School has been widely recognized as a leader in promoting gender diversity in computing. In addition to our strides in our student body, I want to note that over the past 9 years, our faculty has grown by 29, and 15 of these are women – an amazing record for which Hank Levy deserves a great deal of credit,” he said. “In the past few years we’ve dramatically increased the attention we devote to underrepresented minority students and students from low-income backgrounds, at both the undergraduate and graduate levels.”

The Allen School has partnered with the College of Engineering’s STARS program and the state’s AccessCSforAll; students from both were also recognized.

Kimberly Ruth
Lisa Simonyi Prize recipient Kim Ruth and incoming Allen School Director Magda Balazinska.

During the reception, Kimberly Ruth, an Allen School senior, was awarded the Lisa Simonyi Prize. The prize was established by Lisa and Charles Simonyi for students who exemplify the commitment to excellence, leadership, and diversity to which the School aspires. Ruth is an exceptionally talented and dedicated student. She is a member of UW’s Interdisciplinary Honors Program and a dual major in computer engineering and mathematics. Not only is she engaged in research with Allen School professors Franziska Roesner and Tadayoshi Kohno in the Security & Privacy Lab, but she has also been awarded the 2018 Goldwater Scholarship and the 2017 Washington Research Foundation Fellowship. She has served for four years as a tutor in a program that teaches math and Python programming to middle and high school students, and founded Go Figure, an initiative to get middle school students excited about math. Last year, she was named to the “Husky 100,” an annual program that honors UW students who are positively impacting the University community.

As professor and next director of the Allen School Magdalena Balazinska noted when she presented the award to Ruth, “In a program full of remarkable students, Kim stands out.”

Thanks to the Simonyis for supporting diversity and excellence, and thanks to everyone who came out to celebrate the people who are making our school and our field a more welcoming destination for all. And congratulations to Kim!

For more about our efforts to advance diversity in computing, check out the Allen School’s inclusiveness statement here.
Read more →

Allen School researchers find racial bias built into hate-speech detection


Top left to right: Sap, Gabriel, Smith; bottom left to right: Card, Choi

The volume of content posted on Facebook, YouTube, Twitter and other social media platforms every moment of the day, from all over the world, is monumental. Unfortunately, some of it is biased, hate-filled language that targets members of minority groups and often prompts violent action against them. Because it is impossible for human moderators to keep up with the volume of content generated in real time, platforms are turning to artificial intelligence and machine learning to catch toxic language and stop it quickly. Regrettably, these toxic-language detection tools have been found to suppress already marginalized voices.

“Despite the benevolent intentions of most of these efforts, there’s actually a really big racial bias problem in hate speech detection right now,” said Maarten Sap, a Ph.D. student in the Allen School. “I’m not talking about the kind of bias you find in racist tweets or other forms of hate speech against minorities, instead the kind of bias I’m talking about is the kind that leads harmless tweets to be flagged as toxic when written by a minority population.”

In their paper, “The Risk of Racial Bias in Hate Speech Detection,” presented at the recent Association for Computational Linguistics (ACL) meeting, Sap, fellow Ph.D. student Saadia Gabriel, professors Yejin Choi and Noah Smith of the Allen School and the Allen Institute for Artificial Intelligence, and Dallas Card of Carnegie Mellon University studied two datasets totaling 124,779 tweets that had been flagged for toxic language by a machine learning tool used by Twitter. They found widespread evidence of racial bias in how the tool characterized content. In one dataset, the tool mistakenly flagged 46% of non-offensive tweets written in African American English (AAE) — commonly spoken by Black people in the U.S. — as offensive, versus 9% of those in general American English. In the other dataset, it flagged 26% of non-offensive AAE tweets as offensive, versus 5% in general American English.
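The 46%/9% and 26%/5% figures are false positive rates: the share of genuinely non-offensive tweets that the tool flags as offensive, computed separately per dialect. A minimal sketch of that measurement follows, using an invented toy record format rather than the paper’s actual data.

```python
def false_positive_rate(records, dialect):
    """Share of truly non-offensive tweets of `dialect` flagged as offensive.

    Each record is a (dialect, gold_label, predicted_label) tuple; the
    gold label is the human-annotated ground truth and the predicted
    label is the classifier's output.
    """
    non_offensive = [r for r in records
                     if r[0] == dialect and r[1] == "not_offensive"]
    flagged = [r for r in non_offensive if r[2] == "offensive"]
    return len(flagged) / len(non_offensive)

# Toy predictions: (dialect, gold label, classifier label).
records = [
    ("aae", "not_offensive", "offensive"),
    ("aae", "not_offensive", "not_offensive"),
    ("general", "not_offensive", "not_offensive"),
    ("general", "not_offensive", "not_offensive"),
    ("general", "not_offensive", "offensive"),
    ("aae", "offensive", "offensive"),
]
print(false_positive_rate(records, "aae"))      # 0.5
print(false_positive_rate(records, "general"))  # ~0.333
```

A gap between the two rates, as in this toy data, is precisely the kind of disparity the researchers measured at scale.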

“I wasn’t aware of the exact level of bias in Perspective API — the tool used to detect online hate speech — when searching for toxic language, but I expected to see some level of bias from previous work that examined how easily algorithms like AI chatbots learn negative cultural stereotypes and associations,” said Gabriel. “Still, it’s always surprising and a little alarming to see how well these algorithms pick up on toxic patterns pertaining to race and gender when presented with large corpora of unfiltered data from the web.”

This matters because ignoring the social context of the language, Sap said, harms minority populations by suppressing inoffensive speech. To address the biases displayed by the tool, the group changed the annotation scheme, that is, the instructions annotators follow when labeling hate speech. As an experiment, the researchers took 350 AAE tweets and enlisted workers on Amazon Mechanical Turk to label them.

Gabriel explained that on Amazon Mechanical Turk, researchers can set up tasks for workers to help with something like a research project or marketing effort. There are usually instructions and a set of criteria for the workers to consider, then a number of questions. 

“Here, you can tell workers specifically if there are particular things you want them to consider when thinking about the questions, for instance the tweet source,” she said. “Once the task goes up, anyone who is registered as a worker on Amazon Mechanical Turk can answer these questions. However, you can add qualifications to restrict the workers. We specified that all workers had to originate from the US since we’re considering US cultural norms and stereotypes.”

When given the tweets without background information, the workers reported 55% of the tweets as offensive. When also given the dialect and race of the tweeters, they reported 44% as offensive. When asked whether they found the tweets personally offensive, they reported only 33% as such. This showed the researchers that priming annotators with the source’s race and dialect influenced the labels, and it revealed that the annotations are subjective rather than objective.
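Aggregating an experiment like this amounts to comparing the rate of “offensive” labels across annotation conditions. A minimal sketch, assuming a simple list of (condition, judgment) records rather than the researchers’ actual data format:

```python
from collections import defaultdict

# Hypothetical sketch: each annotation is (condition, judged_offensive), where
# "control" annotators saw only the tweet text and "primed" annotators were
# also told the tweet's dialect and the author's likely race.
def offense_rate_by_condition(annotations):
    counts = defaultdict(lambda: [0, 0])  # condition -> [offensive, total]
    for condition, offensive in annotations:
        counts[condition][0] += int(offensive)
        counts[condition][1] += 1
    return {cond: off / total for cond, (off, total) in counts.items()}
```

A drop in the primed rate relative to the control rate, as in the 55% versus 44% result above, indicates that the extra context shifted annotators’ judgments.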

“Our work serves as a reminder that hate speech and toxic language is highly subjective and contextual,” said Sap. “We have to think about dialect, slang and in-group versus out-group, and we have to consider that slurs spoken by the out-group might actually be reclaimed language when spoken by the in-group.”

While the study’s findings are concerning, Gabriel believes language processing systems can be taught to consider the source of a post, preventing the racial biases that cause content to be mischaracterized as hate speech and already marginalized voices to be deplatformed.

“It’s not that these language processing machines are inventing biases, they’re learning them from the particular beliefs and norms we spread online. I think that in the same way that being more informed and having a more empathic view about differences between peoples can help us better understand our own biases and prevent them from having negative effects on those around us, injecting these kind of deeper insights into machine learning algorithms can make a significant difference in preventing racial bias,” she said. “For this, it is important to include more nuanced perspectives and greater context when doing natural language processing tasks like toxic language detection. We need to account for in-group norms and the deep complexities of our culture and history.”

To learn more, read the research paper here and watch a video of Sap’s ACL presentation here. Also see previous coverage of the project by Vox, Forbes, TechCrunch, New Scientist, Fortune and MIT Technology Review.


Allen School’s 2019-2020 Distinguished Lecture Series will explore leading-edge innovation and real-world impact

Top left to right: Dean, Patterson, Spelke; bottom left to right: Howard, McKeown, Pereira

Mark your calendars! Another exciting season of the Allen School’s Distinguished Lecture Series kicks off on Oct. 10. During the 2019-2020 season, we will explore deep learning, domain-specific architectures, recent advances in artificial intelligence and robotics, and so much more. All lectures take place at 3:30 p.m. in the Amazon Auditorium on the ground floor of the Bill & Melinda Gates Center on the University of Washington’s Seattle campus. In addition, each lecture will be live streamed on the Allen School’s YouTube channel. 

Oct. 10: Jeff Dean, Google Senior Fellow and Senior Vice President for Google AI

Allen School alumnus Jeff Dean (Ph.D., ‘96) returns to his alma mater on Thursday, Oct. 10 to deliver a talk on “Deep Learning to Solve Challenging Problems.” Dean’s presentation will highlight recent accomplishments by Google research teams, such as the open-source TensorFlow system for rapidly training, evaluating and deploying machine learning models, and how they relate to the National Academy of Engineering’s Grand Challenges for Engineering in the 21st Century. He will also explore how machine learning is transforming many aspects of today’s computing hardware and software systems.

Dean, who joined Google in 1999, currently leads teams working on systems for speech recognition, computer vision, language understanding and various other machine learning tasks. During his two decades with the company, he co-designed and implemented many of Google’s most important and visible features, including multiple generations of its crawling, indexing and query serving systems as well as pieces of Google’s initial advertising and AdSense for content systems. He also helped create Google’s distributed computing infrastructure, including MapReduce, BigTable and Spanner. 

Oct. 29: David Patterson, Professor Emeritus, University of California, Berkeley; Distinguished Engineer, Google; and Vice Chair, RISC-V Foundation

David Patterson will deliver a talk on Oct. 29 examining “Domain Specific Architectures (DSA) for Deep Neural Networks: Three Generations of Tensor Processing Units (TPUs).” His presentation will explore how the recent success of deep neural networks has inspired a resurgence in domain-specific architectures to run them, driven in part by the deceleration of microprocessor performance improvement as Moore’s Law ends. His talk will review Google’s first-generation Tensor Processing Unit (TPUv1) and how the company built the first production DSA supercomputer for the much harder problem of training, which was deployed in 2017.

Patterson’s work on the RISC, Redundant Array of Inexpensive Disks (RAID), and Network of Workstations (NOW) projects helped lead to multibillion-dollar industries. In 2017, he and RISC collaborator John Hennessy shared the Association for Computing Machinery’s A.M. Turing Award — the “Nobel Prize of computing” — for pioneering a systematic, quantitative approach to the design and evaluation of computer architectures with enduring impact on the microprocessor industry.

Nov. 14: Elizabeth Spelke, Marshall L. Berkman Professor of Psychology at Harvard University and Investigator, NSF-MIT Center for Brains, Minds and Machines

Elizabeth Spelke will deliver a lecture on Nov. 14 titled, “From Core Concepts to New Systems Knowledge.” Her lecture will center on cognitive systems in young children and the ability of the human species to gain knowledge not only through gradual learning but also through a fast and flexible learning process that appears to be unique to humans and emerges with the onset of language. Although this phase of life isn’t fully understood, Spelke is using research in psychology, neuroscience and artificial intelligence to better understand human cognitive function.

Spelke explores the sources of uniquely human cognitive capacities, including the capacity for formal mathematics, for constructing and using symbolic representations, and for developing comprehensive taxonomies of objects. Conducting behavioral research on infants and preschool children, Spelke studies the origins and development of human cognition by examining how humans develop their understanding of objects, actions, people, places, numbers and geometry. She also works with computational cognitive scientists to test computational models of infants’ cognitive capacities, and to extend her research into the field with the ultimate goal to enhance young children’s learning.

Dec. 5: Ayanna Howard, Linda J. and Mark C. Smith Professor and Chair, School of Interactive Computing at the Georgia Institute of Technology

Ayanna Howard will deliver a presentation on Dec. 5 titled “Roving for a Better World.” Her talk will focus on the role of computer scientists as responsible global citizens. She will delve into the implications of recent advances in robotics and artificial intelligence, and explain the critical importance of ensuring diversity and inclusion at all stages to reduce the risk of unconscious bias and ensuring robots are designed to be accessible to all.

Prior to joining Georgia Tech, Howard was a senior robotics researcher and deputy manager in the Office of the Chief Scientist at NASA’s Jet Propulsion Laboratory. She was first hired by NASA at the age of 27 to lead a team designing a robot for future Mars exploration missions that could “think like a human and adapt to change.” While at Georgia Tech, she has served as associate director of research for the Institute for Robotics and Intelligent Machines and chair of the robotics Ph.D. program. Business Insider named her one of the most powerful women engineers in the world in 2015 and in 2018 she was named in Forbes’ Top 50 Women in Tech. 

Jan. 16: Kathleen McKeown, Henry and Gertrude Rothschild Professor of Computer Science and Founding Director, Data Science Institute, Columbia University

Kathleen McKeown will deliver a talk on Jan. 16; more details about her presentation will be posted soon on our Distinguished Lecture Series page.

McKeown’s research spans natural language processing, summarization, natural language generation and the analysis of social media. Her work focuses on summarizing text and generating updates on disasters from live, streaming information; generating messages about electricity usage and using reinforcement learning over usage logs to determine which kinds of messages can change behavior; and analyzing social media to detect messages about aggression and loss. While at Columbia, McKeown has served as the director of the Data Science Institute, as department chair from 1998 to 2003, and as vice dean for research for the School of Engineering and Applied Science for two years. She is also active internationally, having served as president, vice president and secretary-treasurer of the Association for Computational Linguistics as well as a board member and secretary of the board for the Computing Research Association.

Feb. 27: Fernando Pereira, Vice President and Engineering Fellow, Google

Fernando Pereira will deliver the Allen School’s 2020 Taskar Memorial Lecture on Feb. 27. More details about his presentation will be posted soon on our Distinguished Lecture Series page.

Pereira leads research and development at Google in natural language understanding and machine learning. Previously, he was chair of the computer and information science department at the University of Pennsylvania, head of the machine learning and information retrieval department at AT&T Labs, and held research and management positions at SRI International. Pereira has produced more than 120 research publications on computational linguistics, machine learning, bioinformatics, speech recognition and logic programming. He holds several patents and is widely recognized for his contributions to sequence modeling, finite-state methods, and dependency and deductive parsing.

Be sure to check our Distinguished Lecture Series page for updates throughout the season, and please plan to join us!


Peak performance!

Dan and Galen Weld at the summit of Buck Mountain, with Glacier Peak in the background

On Saturday, Allen School Ph.D. student Galen Weld, his twin brother Adam, his father (and Allen School professor) Dan, and his mom Margaret Rosenfeld reached the 8,528-foot summit of Buck Mountain in the Glacier Peak Wilderness. With that achievement, Galen became the youngest person to summit each of the 100 highest peaks in Washington, and Dan and Galen became the first father-son team to achieve this milestone. (Dan completed his summit of Washington’s “Top 100” in 2016.)

Galen pops the cork on Champagne that Dan brought along to celebrate the achievement


Hao Peng wins 2019 Google Ph.D. Fellowship

Hao Peng

Hao Peng, a Ph.D. student working with Allen School professor Noah Smith, has been named a 2019 Google Ph.D. Fellow for his research in natural language processing (NLP). His research focuses on a wide variety of problems in NLP, including representation learning and structured prediction.

Peng, who is one of 54 students throughout the world selected for a Fellowship, aims to analyze and understand the workings and decisions of deep learning models and to incorporate inductive bias into the models’ design, facilitating better learning algorithms. His research provides a better understanding of many state-of-the-art models in NLP and offers more principled ways to insert inductive bias into representation learning algorithms, making them both more computationally efficient and less data-hungry.

“Hao has been contributing groundbreaking work in natural language processing that combines the strengths of so-called ‘deep’ representation learning with structured prediction,” said Smith. “He is creative, sets a very high bar for his work, and is a delightful collaborator.”

One of Peng’s notable contributions is his research on semi-parametric models for natural language generation and text summarization. He focused on the project last summer during an internship with the Language team at Google AI. Based on this work, Peng was asked to continue his internship part-time until the team published its findings at the 2019 conference of the North American Chapter of the Association for Computational Linguistics (NAACL).

Another important piece of research that Peng published was “Backpropagating through Structured Argmax using a SPIGOT,” which earned a Best Paper Honorable Mention at the annual meeting of the Association for Computational Linguistics in 2018. The paper proposes the structured projection of intermediate gradients optimization technique (SPIGOT), which enables end-to-end training of neural models that use structured prediction as intermediate layers. Experiments show that SPIGOT outperforms pipelined and non-structured baselines, providing evidence that structured bias can help learn better NLP models. Due to its flexibility, SPIGOT is applicable to multitask learning with partial or full supervision of the intermediate tasks, or to inducing latent intermediate structures, according to Peng. His collaborators on the project include Smith and Sam Thomson, formerly a Ph.D. student at Carnegie Mellon University and now at Semantic Machines.
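At its core, SPIGOT treats the discrete argmax output as a point that can receive a gradient step, projects the stepped point back onto the feasible set, and uses the difference as a surrogate gradient. The sketch below illustrates the idea for the simplest possible “structure,” a one-hot argmax over the probability simplex; the paper handles richer structured decoders, and the step size and toy setup here are illustrative assumptions:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (sort-based algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def spigot_backward(scores, downstream_grad, lr=1.0):
    """Forward: z = one_hot(argmax(scores)), the discrete 'structure'.
    Backward: take a gradient step from z against the downstream gradient,
    project back onto the simplex, and use (z - projection) as the
    surrogate gradient with respect to the scores."""
    z = np.zeros_like(scores)
    z[np.argmax(scores)] = 1.0
    p = project_simplex(z - lr * downstream_grad)
    return z, z - p
```

Because the projection respects the geometry of the output space, the surrogate gradient stays informative even though argmax itself has zero gradient almost everywhere.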

Peng has co-authored a total of eight papers in the last three years while pursuing his Ph.D. at the Allen School. The Google Fellowship will help him extend his results, providing fresh theoretical justifications and a better understanding of many state-of-the-art NLP models, which could yield more principled techniques for baking inductive bias into representation learning algorithms.

Peng has worked as a research intern at Google New York and Google Seattle during his studies at the UW. Prior to that he worked as an intern at Microsoft Research Asia in Beijing, and at the University of Edinburgh.

Since 2009, the Google Ph.D. Fellowship program has recognized and supported exceptional graduate students working in core and emerging areas of computer science. Previous Allen School recipients include Joseph Redmon (2018); Tianqi Chen and Arvind Satyanarayan (2016); Aaron Parks and Kyle Rector (2015); and Robert Gens and Vincent Liu (2014). Learn more about the 2019 Google Fellowships here.

Congratulations, Hao!


“Geek of the Week” Justin Chan is using smartphones to democratize medical diagnostics

Allen School Ph.D. student Justin Chan is on a mission to put the power of medical diagnostics into people’s hands, inspired by the ubiquity of smartphones coupled with advancements in artificial intelligence. Working with collaborators in the Networks & Mobile Systems Lab led by professor Shyam Gollakota and in UW Medicine, Chan has developed a mobile system for detecting ear infections in children and a contactless AI system for detecting cardiac arrest using smart devices. He co-founded Edus Health, a University of Washington spin-out that is pursuing clearance from the U.S. Food & Drug Administration to move his research out of the lab and into people’s hands and homes. His efforts have earned coverage by Scientific American, NPR, MIT Tech Review, and more — and, most recently, a feature as GeekWire’s Geek of the Week.

The motivation for Chan’s research stems from his recognition that while smart devices can combine multiple sophisticated sensors in a battery-powered device small enough to fit in a pocket, the medical industry often still relies on expensive — and large — specialized devices for diagnosing patients. “I believe that everyone should be able to own their medical data. To that end, my goal is to make medical diagnostics frugal and accessible enough that anyone with a few spare parts and DIY-know-how would be able to obtain clinical-grade accuracies in the comfort of their homes,” Chan told GeekWire. “While the reality is that many diagnostic tools in healthcare often require expensive tools and specialist expertise, I am hoping we will be able to change that.”

Chan further explored the potential for smartphones and AI to transform health care in a recent article he co-authored with Drs. Sharat Raju of UW Medicine and Eric Topol of Scripps Research that appeared in The Lancet. In that article, the authors highlighted multiple examples of research aimed at using these technologies to diagnose a range of pediatric conditions in a variety of settings.

Read the full GeekWire profile here, and check out The Lancet article here.

Way to go, Justin!

