A major risk in building data-driven applications, including those that use machine learning methods, is susceptibility to biases in the data. Even the best models can fail if the distribution of out-of-sample data differs significantly from the distribution of the data used to train or validate the model. Moreover, machine learning algorithms trained on biased data can sometimes amplify those biases. To address this problem, we are developing a set of scalable software products and companion training materials to help users of federal data systematically identify sources of bias in training data.
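
To illustrate the distribution-shift failure mode described above, the following sketch compares a single training feature against out-of-sample data using a two-sample Kolmogorov–Smirnov test. The feature, data, and significance threshold are hypothetical, and this check is only an illustration, not part of the software products mentioned here.

```python
# Illustrative sketch (hypothetical feature, data, and threshold): flag a
# feature whose out-of-sample distribution has drifted from training data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values: training data vs. data seen after deployment.
train_income = rng.normal(loc=52_000, scale=12_000, size=5_000)
live_income = rng.normal(loc=61_000, scale=15_000, size=5_000)  # shifted

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the two
# samples were not drawn from the same distribution.
stat, p_value = ks_2samp(train_income, live_income)

ALPHA = 0.01  # hypothetical significance threshold
if p_value < ALPHA:
    print(f"Possible distribution shift: KS statistic={stat:.3f}, p={p_value:.2e}")
else:
    print("No significant shift detected for this feature.")
```

A check of this kind only flags that the incoming data differ from the training data for one feature; it does not by itself identify the source of the bias, which is the focus of the work described here.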