Automated Data Augmentation for Deflating Data Bias

Overview

Machine learning (ML) has been widely adopted in many areas, and data bias is a major concern: certain elements of a dataset are more heavily weighted or represented than others. A biased dataset does not accurately represent a model’s use case, resulting in skewed outcomes, low accuracy, and analytical errors. With high bias, a model leans heavily on unnecessary inherited features while omitting the key features, and it generalizes very poorly. While data augmentation is a promising solution to data bias, it faces several practical challenges: (i) most existing efforts craft the synthesized dataset manually, which is time-consuming; (ii) the criteria for data augmentation rely heavily on human expert knowledge; and (iii) prior solutions assume that users already know that bias exists and can therefore remedy it, which is usually not the case in real-world applications. This proposal will develop effective data augmentation techniques to address these challenges.



The figure above shows an overview of our proposed data augmentation framework, which consists of four tasks: (i) GAN-based data augmentation, (ii) diffusion-model-based data augmentation, (iii) an ensemble of GAN and diffusion models, and (iv) acceleration of the framework with boosting algorithms.
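
To make the notion of over- and under-represented data concrete, the following is a minimal sketch and not part of the proposed framework. It measures how unevenly classes are represented in a toy labeled dataset and rebalances it by random oversampling, which stands in here for the GAN- or diffusion-based generators named in the overview. The function names `imbalance_ratio` and `oversample` are hypothetical, and the sketch assumes the bias is already visible in the class labels, which the overview notes is usually not the case in practice.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest.

    Assumption: random duplication is only a placeholder for a learned
    generator (e.g., a GAN or diffusion model) that would synthesize new
    samples instead of copying existing ones.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Candidate samples to copy for this class.
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy dataset: class "a" is four times as frequent as class "b".
samples = list(range(10))
labels = ["a"] * 8 + ["b"] * 2

aug_samples, aug_labels = oversample(samples, labels)
print(imbalance_ratio(labels), imbalance_ratio(aug_labels))
```

In this toy case the ratio of class counts drops from 4.0 to 1.0 after augmentation. Copying samples adds no new information, which is one reason generative approaches are attractive for this problem.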

Members


   Faculty (PI): Prof. Prabhat Mishra
   Graduate Students: Emma Andrews

Downloads

Stay tuned ...

Publications

Stay tuned ...

Research Sponsors

This project is funded by the Semiconductor Research Corporation (SRC). The views expressed on this site are those of the members of this project and do not necessarily represent those of the Semiconductor Research Corporation.