.. _getting-started-datasets:

Getting started
===============

Datasets can be accessed directly from Python, which lets you copy files
between datasets and your workspace. At the start of a notebook, import the
Faculty datasets library::

    >>> import faculty.datasets as datasets

List files
----------

You can list all the files in your project’s datasets with::

    >>> datasets.ls()
    ['/', '/input/', '/input/client-data.csv', '/input/extra/', '/input/extra/file1.txt', '/input/extra/file2.txt', '/output/']

To list only a subset of files, provide a path prefix::

    >>> datasets.ls('/input/extra')
    ['/input/extra/', '/input/extra/file1.txt', '/input/extra/file2.txt']

Get files
---------

Copy particular files from datasets into your workspace with the ``get``
function::

    >>> datasets.get('/input/client-data.csv', 'client-data.csv')
    >>> with open('client-data.csv') as f:
    ...     print(f.read())
    name,email,age
    "Jane Smith",jane.smith@example.com,32
    "John White",john.white@example.com,28

You can also get whole directories::

    >>> datasets.get('/input/extra', 'extra')
    >>> import os
    >>> os.listdir('extra')
    ['file1.txt', 'file2.txt']

Put files
---------

To go in the other direction, copy a file from the workspace into datasets
with the ``put`` function::

    >>> datasets.put('results.csv', '/output/results.csv')
    >>> datasets.ls()
    ['/', '/input/', '/input/client-data.csv', '/input/extra/', '/input/extra/file1.txt', '/input/extra/file2.txt', '/output/', '/output/results.csv']

Again, this works with whole directories::

    >>> datasets.put('figures', '/output/figures')
    >>> datasets.ls()
    ['/', '/input/', '/input/client-data.csv', '/input/extra/', '/input/extra/file1.txt', '/input/extra/file2.txt', '/output/', '/output/figures/', '/output/figures/plot.png', '/output/figures/regression.png', '/output/results.csv']

.. note::

    Copying and moving large files (> 1 GB) is currently not well supported.
    Instead of using the ``cp`` and ``mv`` commands, consider downloading the
    file to your workspace with ``get`` and re-uploading it to the new location
    within datasets with ``put``. Then, remove the original file if needed.
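
The large-file workaround described in the note can be sketched as a small
helper. This is a minimal, unofficial sketch: ``copy_via_workspace`` is a
hypothetical name, it assumes ``datasets.get`` and ``datasets.put`` behave as in
the examples on this page, and it assumes the library also provides
``datasets.rm`` for deleting the original file, so check the API reference
before relying on that call. The optional ``client`` argument exists only so the
helper can be tried with a stand-in object outside a Faculty workspace.

```python
import os
import tempfile


def copy_via_workspace(source, destination, remove_original=False, client=None):
    """Copy a single dataset file by round-tripping it through the workspace.

    Hypothetical helper, not part of faculty.datasets. Downloads ``source``
    with ``get``, re-uploads it to ``destination`` with ``put`` and, if
    ``remove_original`` is true, deletes ``source`` afterwards.
    """
    if client is None:
        # Default to the real library; a stand-in object can be passed instead
        import faculty.datasets as client

    # The local copy lives in a temporary directory that is deleted on exit,
    # so the workspace needs enough free disk space to hold the file briefly
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(source))
        client.get(source, local_path)
        client.put(local_path, destination)

    if remove_original:
        # Assumption: the library exposes ``rm`` for deleting a dataset file
        client.rm(source)
```

For example, ``copy_via_workspace('/input/client-data.csv',
'/output/client-data.csv', remove_original=True)`` would move the file from
``/input`` to ``/output``, while leaving ``remove_original`` at its default
keeps both copies.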