Release Notes

Coming up next

August, 2021

  • Model Registry

  • Experiment tracking dashboard enhancement

  • Quota management

  • Team dashboard (resource usage tracking)

  • Fine-grained permissions

  • Managed Airflow / Airflow integration

  • TPU Support

  • Fractional GPU

  • Dataset versioning - dvc integration

  • Dataset, output directory NFS mount

  • Save/load files with savvihub SDK

If you would like to request any features or support, please contact us. ([email protected])

July 27, 2021 (v0.6.0)

Sweep (Hyperparameter Optimization)

You can find the best hyperparameters with SavviHub's new tuning techniques. Currently, we support grid search, random search, and Bayesian optimization. Just choose the hyperparameters you want to tune, and let us optimize them. (docs)
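As a conceptual illustration only (not the SavviHub API), random search draws candidate hyperparameter combinations at random and keeps the best-scoring one. The objective function below is a hypothetical stand-in for a real training run:

```python
import random

def objective(lr, batch_size):
    # Hypothetical validation score; a real sweep would train a model here.
    # This stand-in peaks near lr=0.01, batch_size=64.
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 64) / 256

# The search space: each hyperparameter and its candidate values.
search_space = {
    "lr": [0.001, 0.005, 0.01, 0.05, 0.1],
    "batch_size": [16, 32, 64, 128],
}

random.seed(0)
best_score, best_params = float("-inf"), None
for _ in range(10):  # run 10 random trials
    params = {k: random.choice(v) for k, v in search_space.items()}
    score = objective(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

Grid search would instead enumerate every combination in `search_space`, and Bayesian optimization would use past trial scores to pick the next candidate.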

Minor Fixes

  • T4 GPU support

  • Attach local volume to workspace

July 1, 2021 (v0.5.0)


Workspace

Now you can freely start your own development environment. Start a workspace and connect to it via JupyterLab / SSH.

Note that only files stored in the home directory persist across stop/resume.

Service (deleted)

Service has been redesigned and replaced by Workspace.

Minor Fixes

  • Dataset versioning is now available via dvc integration.

June 8, 2021 (v0.4.15)

Local dataset support

You can now add a local dataset. Local datasets are mounted into experiments via the NFS protocol.

A local dataset is only accessible within your cluster. Since SavviHub has no permission to access your local files, the dataset file list is not shown on the dataset page.

Download logs / metrics

You can download your logs and metrics from the experiment detail page.

Minor Fixes

  • Workspace has been renamed to Organization.

May 25, 2021 (v0.4.10)

CLI-driven project

Now your source code does not have to be on GitHub to run an experiment on SavviHub. Create a CLI-driven project and run savvihub experiment run from your local terminal, without a git push.

$ sv experiment run
[?] Select project: cli-driven-example
  > cli-driven-example
[?] Experiment message:
[?] Start command: python
[?] Please choose a cluster: [1] aws-apne2-prod1 (SavviHub)
  > [1] aws-apne2-prod1 (SavviHub)
    [2] on-premise-cluster (Custom)
[?] Please choose a resource: [11] (GPU(V100) x 1 / CPU 8 Cores / Memory 52GB)
    [1] (CPU 2 Cores / Memory 6GB)
    [2] v1.cpu-2.mem-6 (CPU 2 Cores / Memory 6GB)
    [3] v1.cpu-4.mem-13 (CPU 4 Cores / Memory 13GB)
    [4] v1.k80-1.mem-52 (GPU(K80) x 1 / CPU 4 Cores / Memory 52GB)
    [5] v1.k80-8.mem-480 (GPU(K80) x 8 / CPU 32 Cores / Memory 480GB)
    [6] v1.v100-1.mem-52 (GPU(V100) x 1 / CPU 8 Cores / Memory 52GB)
    [7] v1.v100-4.mem-232 (GPU(V100) x 4 / CPU 32 Cores / Memory 232GB)
    [8] v1.cpu-0.mem-1 (CPU shared / Memory 1GB)
    [9] (GPU(K80) x 1 / CPU 4 Cores / Memory 52GB)
    [10] (CPU 4 Cores / Memory 13GB)
  > [11] (GPU(V100) x 1 / CPU 8 Cores / Memory 52GB)
    [12] v1.v100-8.mem-480 (GPU(V100) x 8 / CPU 96 Cores / Memory 480GB)
    [13] v1.k80-16.mem-724 (GPU(K80) x 16 / CPU 64 Cores / Memory 724GB)
[?] Please choose a kernel image: [2] savvihub/kernels:py37.full-cpu (Python 3.7 (All Packages))
    [1] savvihub/kernels:py36.full-cpu (Python 3.6 (All Packages))
  > [2] savvihub/kernels:py37.full-cpu (Python 3.7 (All Packages))
    [3] savvihub/kernels:py36.full-cpu.jupyter (Python 3.6 (JupyterLab))
    [4] savvihub/kernels:py37.full-cpu.jupyter (Python 3.7 (JupyterLab))
    [5] tensorflow/tensorflow:1.14.0-py3 (Tensorflow 1.14.0)
    [6] tensorflow/tensorflow:1.15.5-py3 (Tensorflow 1.15.5)
    [7] tensorflow/tensorflow:2.0.4-py3 (Tensorflow 2.0.4)
    [8] tensorflow/tensorflow:2.2.1-py3 (Tensorflow 2.2.1)
    [9] tensorflow/tensorflow:2.3.2 (Tensorflow 2.3.2)
    [10] tensorflow/tensorflow:2.4.1 (Tensorflow 2.4.1)
    [11] tensorflow/tensorflow:2.3.0 (TensorFlow 2.3.0 (Tensorboard))
Upload the zipped local project
Experiment 1 is running. Check the experiment status at below link

Service edit & reproduce

Reproduce and Edit have been added to service actions. A new service can easily be created using an existing service's configuration via Reproduce.

You can modify a stopped service's name, computing resource, start command, exposed ports, environment variables, and SSH key. The name and ports can be updated even while the service is running.

Minor Fixes

May 9, 2021 (v0.4.0)


Billing

You can check your payments, based on on-demand instance usage, on the settings/billing page. (docs)

Billing page examples

Continue training job after spot instance termination

If your experiment runs on a spot instance, it automatically continues after a spot interruption. All you need to do is make your experiment resilient to termination: save a checkpoint every epoch, and start the experiment from the last saved checkpoint. You can check the details here.
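The save-every-epoch, resume-from-checkpoint pattern can be sketched with only the standard library; the checkpoint path, state layout, and "training" step below are illustrative, not SavviHub APIs:

```python
import json
import os

# Illustrative path; in practice, write to a persistent output volume.
CKPT_PATH = "checkpoint.json"

def load_checkpoint():
    """Resume from the last saved epoch, or start fresh if no checkpoint exists."""
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return json.load(f)
    return {"epoch": 0, "weights": [0.0, 0.0]}

def save_checkpoint(state):
    """Write atomically so a mid-write interruption cannot corrupt the checkpoint."""
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CKPT_PATH)

state = load_checkpoint()
for epoch in range(state["epoch"], 5):
    state["weights"] = [w + 0.1 for w in state["weights"]]  # stand-in for one training epoch
    state["epoch"] = epoch + 1
    save_checkpoint(state)  # persist after every epoch
```

If a spot interruption kills the process mid-run, rerunning the same script picks up at `state["epoch"]` instead of epoch 0.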

A100 / V100 spot instances in US West 2

You can now use A100 spot instances in the us-west2 (Oregon) region. Select us-west2 as the default region when creating a workspace. For now, only one region can be selected per workspace.

Minor Fixes

  • Added a save & load checkpoint example in savvihub/examples.

  • Fixed a broken web terminal link on custom clusters.

Apr 27, 2021 (v0.3.0)

Support private docker image

You can now use your own private Docker registry on Docker Hub & AWS ECR. Go to Workspace > Settings > Integrations and register your credentials. (Docs)

SSH connection / terminal on web

Now you can connect to your experiment / service with a web terminal or a native SSH connection. (Docs)

Experiment termination protection

You might want to access the experiment container after it finishes running. Termination protection allows you to do that: if you check the checkbox, the experiment goes into idle status after it finishes instead of terminating. (Docs)

Enable termination protection with checkbox

Minor Fixes

  • (Fix) Log collection on various Docker/Kubernetes runtime configurations (e.g. RKE)

    • You can configure the container log path with kubernetes.logContainerPath.

    • For RKE, install the Helm chart with --set kubernetes.logContainerPath=/var/log/containers. (Ref)

  • (Fix) Removed the Prometheus dependency

    • SavviHub no longer installs Prometheus during savvihub agent installation.