rhythmvertigo's comments

rhythmvertigo · on Nov 2, 2020

Iterative (DVC) | Full Time | Technical Community Manager | Remote

We are the company behind very popular open-source tools for ML workflow- DVC.org and CML.dev. We are solving the complexity of managing datasets, ML infrastructure, ML models lifecycle management. We are Hashicorp for ML.

We’re seeking a Technical Community Manager to help us sustain and grow our active, worldwide community!

Requirements:

experience in data science or open source software

AND

passion for building a community of developers, data scientists and engineers

Listing: https://weworkremotely.com/listings/iterative-technical-comm...

rhythmvertigo · on July 7, 2020

Yes, we're aiming to have another at the end of July! If you have a particular use case you're interested in, we'd love to know. We might be able to develop some materials around it.

rhythmvertigo · on July 7, 2020

Really glad to hear that, rkaplan. Let us know if we can be of any help!

rhythmvertigo · on July 7, 2020

Definitely- automatic model deployment would only work for a subset of applications. We think a lot more about having models available as candidates for deployment, which can then be inspected by domain experts on a team.

One of our motivations for building visual reports that appear like comments in a pull request is giving teams metrics & info to discuss when deciding if merge is right. That way, the automated part is training and testing, but the decision making is human (i.e., data scientists whose skills are better used interpreting models & data than running repetitive training scripts).

rhythmvertigo · on July 7, 2020

Out of curiosity, what kind of research? I'm very interested in getting DVC and Git more broadly into academic research circles, but finding a lot of barriers in my home field.

rhythmvertigo · on July 7, 2020

Thanks for the kind words! DevOps has had profound results and we think culturally and technically, it's the right moment to push for those practices in ML.

rhythmvertigo · on July 7, 2020

Thanks! Hashicorp is our inspiration :D

rhythmvertigo · on July 7, 2020

Thanks, and good questions!

1. Yes! Let me link some reports and example repos:

- A basic classification problem with scikit learn: https://github.com/iterative/cml_base_case/pull/2

- CML with DVC & Vega-Lite graphs: https://github.com/iterative/cml_dvc_case/pull/4

- Neural style transfer with EC2 GPU: https://github.com/iterative/cml_cloud_case/pull/2

2. If training fails, you'll be notified that your run failed in the GitHub Action dashboard (or GitLab CI/CD dashboard). See here for some real life examples of failure ;) : https://github.com/iterative/cml_cloud_case/actions

3. CML reports are markdown documents, so you can write any kind of text to them. If your metrics are output in a file `metrics.txt`, you can have your runner execute `cat metrics.txt >> report.md` and then have CML pass on the report to GitHub/GitLab. Likewise, any graphing library is supported because you can add standard image files (.png, .jpg) to the report. So custom metrics and custom graphs. We like DVC for managing and plotting metrics, but we're biased because we also maintain it.

4. Yep, GitHub Actions is pretty powerful and flexible. Works with whatever external services you can connect to your Action!

5. It's not strictly a Docker technology. We use Docker images preinstalled with the CML library in our examples, but you can just install the library with npm in your own image. https://github.com/iterative/cml#using-your-own-docker-image

Let me know if there's anything else I can tell you about

rhythmvertigo · on July 7, 2020

Hi, I'm one of the project creators. Continuous Machine Learning (CML) is an open source project to help ML projects use CI/CD with Github Actions and Gitlab CI (https://github.com/iterative/cml).

CML automatically generates human-readable reports with metrics and data viz in every pull/merge request, and helps you use storage and GPU/CPU resources from cloud services. CML addresses three hurdles for making ML compatible with CI:

1. In ML, pass/fail tests aren’t enough. Understanding model performance might require data visualizations and detailed metric reports. CML automatically generates custom reports after every CI run with visual elements like tables and graphs. You can even get a Tensorboard.dev link as part of your report.

2. Dataset changes need to trigger feedback just like source code. CML works with DVC so dataset changes trigger automatic training and testing.

3.Hardware for ML is an ecosystem in itself. We’ve developed use cases with CML and Docker Machine to automatically provision and deploy cloud compute instances (CPU & GPU) for model training.

Our philosophy is that ML projects- and MLOps practices- should be built on top of traditional software tools and CI systems, and not as a separate platform. Our goal is to extend DevOps’ wins from software development to ML. Check out our project site (https://cml.dev) and repo, and please let us know what you think!

doppenhe · on July 7, 2020

Deployment, inference and management can participate in this as well!

Here is the missing part for a total e2e solution: https://github.com/marketplace/actions/algorithmia-ci-cd

{disclaimer, we built this Github action}

davidortega · on July 14, 2020

Hi doppenhe, we have that part already implemented using cml-send-github-check and dvc metrics diff. You can compare the metric that you prefer with dvc and then just set the status of the github check uploading your full report. Of course, you can also fail the workflow as your Github action does, but I think is more useful to see it as a report in the check.

disclaimer: I'm work with CML

rhythmvertigo · on July 7, 2020

coooool! going to try this out :)

calebkaiser · on July 7, 2020

This is really cool. We've been recommending DVC to our users for a long time, and this looks like a natural step forward for the Iterative ecosystem.

rhythmvertigo · on July 7, 2020

Yeah, we were seeing a lot of users organically asking about CI/CD. DVC had to come first to address some of the data management issues, which seemed like the biggest hurdle to CI. But we're excited to add this now.

gravypod · on July 7, 2020

This looks really amazing. Do you have any plans to support on-prem GPUs? If we had Gitlab runners with GPUs in them would this project pick it up and use those runners for training/analysis?

davidortega · on July 14, 2020

You can also deploy runners with GPU on premise using CML docker image with GPU already supported having to install only the ndivia drivers and nvidia-docker in your machine.

docker run --name myrunner -d -e RUNNER_IDLE_TIMEOUT=1800 -e RUNNER_LABELS=cml -e RUNNER_REPO=$my_repo_url -e repo_token=$my_repo_token dvcorg/cml-gpu-py3-cloud-runner

It works for Gitlab and Github. Just only point your url and repo token

rhythmvertigo · on July 7, 2020

Yep, it will work with your GPUs! You just set up self-hosted runners as usual https://docs.gitlab.com/runner/

GitHub & GitLab have both made it quite easy to use your own resources as runners. I recently met someone who was doing Actions with a Jetson Nano on their dresser :)

gravypod · on July 7, 2020

That's really cool. I might have to play around with this. Do you have any docs on what you do to deploy a model? Something I've been doing at work is dealing with the output of some ML code we have. We end up with ~150GB of data that needs to be synced to a file share in prod. I'm assuming DVC can be used for this.

After the run, output files, upload as a data set in DVC or something?

Documenting this full workflow would save a lot of confused devopsy people (like myself) survive in the world of ML. Thanks for this hard work you've all put into this!

calebkaiser · on July 7, 2020

Not to be too self-promotional here, but I'm a maintainer of Cortex, a model deployment platform that sounds like it might be useful: https://github.com/cortexlabs/cortex

With DVC/Cortex, you can set things up so that all you have to do is run `dvc push` to update your model and `cortex deploy` to deploy it.

dmpetrov · on July 7, 2020

150Gb ML model file? That's cool!

Yes, DVC can help with that. Where the data lives? S3/GCS or just a server with SSH?

Disclaimer: I'm a creator of DVC.

gravypod · on July 7, 2020

They live in an Azure Storage Account (Azure's not-S3). They're mounted as a network storage into pods in our kube cluster. All we need to do to deploy is copy the data into `/<thing>/<timestamp>` and the code finds the newest version and loads it up. So really I'm just looking for a way to abstract azure blob storage from my ML people and to allow them to do the equivalent of `docker tag...` to choose what we roll out.

shcheklein · on July 7, 2020

I think DVC (+CML) is a good solution for this. It "wraps" artifacts that you store into Git. And Git repo abstracts access to the cloud. In your case of the mounted storage it will look like `git pull` + `dvc checkout` after model is "merged" in the production/master branch.

CML can automate and make the process of preparing the model to be merged into that branch reliable, visible, robust, etc.

I'm happy to help with this flow, ping me on Twitter - @shcheklein in DM or ivan on DVC Discord.

rhythmvertigo · on June 22, 2020

We have a lot in store here- we are unrolling a new tool for CI/CD soon that works with GitHub Actions & GitLab CI. Adding run-cache to DVC 1.0 is just one way of preparing for more CI/CD uses of DVC. (FYI I am part of DVC)