Ask HN: How do you test your ML datasets?

To all data scientists and machine learning researchers & practitioners:

- What do you do first when you get a new dataset for machine learning?

- How do you analyze your data to find relevant features?

- How do you identify data quality problems?

- Which statistical tests do you perform on the dataset?

- Which visualization techniques do you use to investigate the data?

I'm working on a library (not yet ready to share or publish) that helps people find potential problems in datasets and ML models, so I'd love to hear what you consider best practices for preparing and validating datasets for ML.