Chariot Datasets Overview
With Striveworks' Chariot, users can easily create and catalog custom and sliceable datasets, speeding up data preparation for rapid model development.
Supercharged Dataset Prep in 2 Minutes with Chariot
Transcript
Creating and managing quality data sets for machine learning, model training, and evaluation can be a real pain.
So, here we'll take a look at a computer vision example. In this case, Chariot stores and versions not just the underlying images but also the annotations, and it allows you to adjust annotations and create new ones on the platform, as necessary.
Another important thing with the dataset service is it gives you a nice history of exactly what went into that dataset, when new annotations were added, and so forth.
So, besides just the percentages you want in train, test, and val, you can also filter by, say, task type or the specific labels you want to include in that split, or maybe you just want to restrict to data that was captured in a certain time period. Chariot gives you all those options to really nail down the exact data you want to use for model development.
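To make the idea of filtering before splitting concrete, here is a minimal sketch in plain Python. It is not Chariot's API: the `Record` fields, the `build_view` function, and its parameters are all hypothetical names chosen for illustration. It only shows the general pattern of narrowing by task type, labels, and capture time, and then dividing what remains by percentage.

```python
from dataclasses import dataclass
from datetime import datetime
import random


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape; a real dataset service will differ.
    id: str
    task_type: str
    label: str
    captured_at: datetime


def build_view(records, task_type, labels, start, end, fractions, seed=0):
    """Filter records, then split them into train/test/val by fraction."""
    selected = [
        r for r in records
        if r.task_type == task_type
        and r.label in labels
        and start <= r.captured_at < end
    ]
    # Sort before shuffling so the result depends only on the seed,
    # not on the order the records arrived in.
    selected.sort(key=lambda r: r.id)
    random.Random(seed).shuffle(selected)

    n_train = int(len(selected) * fractions["train"])
    n_test = int(len(selected) * fractions["test"])
    return {
        "train": selected[:n_train],
        "test": selected[n_train:n_train + n_test],
        # Whatever is left over goes to val.
        "val": selected[n_train + n_test:],
    }
```

Calling `build_view(records, "object_detection", {"car", "truck"}, start, end, {"train": 0.8, "test": 0.1})` would return three disjoint lists drawn only from records that pass every filter.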
After creating a view, you create a snapshot, which fixes a point in time with the data. This is very important because, in the real world, datasets are not static. You're constantly getting new data, and as you get new data, you need to add some of it to the train split, some to the val, and so forth. And you want to do that in a way that makes sure you have no data leakage.
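One common way to add new data without leakage is to assign each item to a split from a stable hash of its ID, so an item can never move between splits as the dataset grows. The sketch below illustrates that technique in plain Python. It is an assumption for illustration, not a description of how Chariot implements snapshots, and `assign_split` and `snapshot` are hypothetical names.

```python
import hashlib


def assign_split(item_id, train=0.8, test=0.1):
    """Map an item ID to a split using a stable hash of the ID."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    # Turn the first 8 bytes into a value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < train:
        return "train"
    if bucket < train + test:
        return "test"
    return "val"


def snapshot(item_ids):
    """Freeze current split membership as immutable sets of IDs."""
    splits = {"train": set(), "test": set(), "val": set()}
    for item_id in item_ids:
        splits[assign_split(item_id)].add(item_id)
    return {name: frozenset(ids) for name, ids in splits.items()}
```

Because the assignment depends only on the ID, a snapshot taken after new data arrives contains every earlier snapshot's train items in train, test items in test, and val items in val. Nothing that a model was evaluated on can later drift into its training data.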
Video Summary
Struggling with messy dataset prep for machine learning? In this quick 2-minute walkthrough, Eric Korman, Chief Science Officer at Striveworks, shows how Chariot simplifies the process from end to end—from custom dataset creation to train/test/val splits—all while keeping your data versioned and annotated.
Whether you're building computer vision models or managing complex ML pipelines, Chariot gives data scientists the tools to move faster with confidence.
Out-of-the-box versioning lets you experiment quickly while keeping a clear audit trail for compliance.
Easily find and reuse datasets by searching and filtering the metadata.