API

ls([prefix, project_id, show_hidden, …]) List contents of project datasets.
get(project_path, local_path[, project_id, …]) Copy from a project’s datasets to the local filesystem.
put(local_path, project_path[, project_id, …]) Copy from the local filesystem to a project’s datasets.
open(project_path[, mode, temp_dir]) Open a file from a project’s datasets for reading.
mv(source_path, destination_path[, …]) Move a file or directory within a project’s datasets.
cp(source_path, destination_path[, …]) Copy a file or directory within a project’s datasets.
rm(project_path[, project_id, recursive, …]) Remove a file or directory from a project’s datasets.
etag(project_path[, project_id, object_client]) Get a unique identifier for the current version of a file.
faculty.datasets.ls(prefix='/', project_id=None, show_hidden=False, object_client=None)

List contents of project datasets.

Parameters:
prefix : str, optional

List only files in the datasets matching this prefix. Default behaviour is to list all files.

project_id : str, optional

The project to list files from. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

show_hidden : bool, optional

Include hidden files in the output. Defaults to False.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.

Returns:
list

The list of files from the project datasets.
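
As a rough illustration, the sketch below filters a listing by file extension. It assumes that `ls` returns a list of path strings, which the description above does not spell out. `list_with_suffix` is a hypothetical helper, not part of the library, and its `datasets` argument stands for the `faculty.datasets` module, passed in so the helper can also be exercised with a stand-in object.

```python
def list_with_suffix(datasets, prefix="/", suffix=".csv"):
    """Return the dataset paths under `prefix` that end with `suffix`, sorted."""
    # Assumption: ls returns an iterable of path strings.
    paths = datasets.ls(prefix=prefix, show_hidden=False)
    return sorted(p for p in paths if p.endswith(suffix))
```

In a project environment this might be called as `list_with_suffix(faculty.datasets, prefix="/input/")`, where `/input/` is only an example prefix.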

faculty.datasets.get(project_path, local_path, project_id=None, object_client=None)

Copy from a project’s datasets to the local filesystem.

Parameters:
project_path : str

The source path in the project datasets to copy.

local_path : str or os.PathLike

The destination path in the local filesystem.

project_id : str, optional

The project to get files from. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.
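
A common pattern is to download a file only when no local copy exists yet. The sketch below shows one way to do that. `fetch_if_missing` is a hypothetical helper, and the `datasets` argument stands for the `faculty.datasets` module. Creating the parent directory first is a precaution; whether `get` would create it on its own is not stated here.

```python
import os


def fetch_if_missing(datasets, project_path, local_path):
    """Download `project_path` to `local_path` unless the local file exists.

    Returns True if a download was performed, False otherwise.
    """
    if os.path.exists(local_path):
        return False
    # Make sure the destination directory exists before copying into it.
    parent = os.path.dirname(os.path.abspath(local_path))
    os.makedirs(parent, exist_ok=True)
    datasets.get(project_path, local_path)
    return True
```

Note that this only checks for the presence of the local file, not whether it is up to date.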

faculty.datasets.put(local_path, project_path, project_id=None, object_client=None)

Copy from the local filesystem to a project’s datasets.

Parameters:
local_path : str or os.PathLike

The source path in the local filesystem to copy.

project_path : str

The destination path in the project datasets.

project_id : str, optional

The project to put files in. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.
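
The following sketch uploads a file and then checks that it appears in the listing. It assumes that `ls` returns full path strings in the same form as the `project_path` given to `put`. `put_and_confirm` is a hypothetical helper, and `datasets` stands for the `faculty.datasets` module.

```python
def put_and_confirm(datasets, local_path, project_path):
    """Upload a local file, then report whether it shows up in the listing."""
    datasets.put(local_path, project_path)
    # Assumption: ls returns path strings written the same way as project_path.
    return project_path in datasets.ls(prefix=project_path)
```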

faculty.datasets.open(project_path, mode='r', temp_dir=None, **kwargs)

Open a file from a project’s datasets for reading.

This downloads the file into a temporary directory before opening it, so if your files are very large, this function can take a long time.

Parameters:
project_path : str

The path of the file in the project’s datasets to open.

mode : str, optional

The opening mode, either 'r' or 'rb'. This is passed down to the standard Python open function. Writing is currently not supported. Defaults to 'r'.

temp_dir : str, optional

A directory on the local filesystem in which to save the file temporarily. Note that on SherlockML servers, the default temporary directory can break with large files, so if your file is larger than 2 GB, it is recommended to specify temp_dir='/project'.
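
As an illustration, the sketch below reads a CSV file from the datasets into a list of rows. It assumes that the object returned by `open` can be used in a `with` block, as with the built-in `open`. `read_csv_rows` is a hypothetical helper, and `datasets` stands for the `faculty.datasets` module.

```python
import csv


def read_csv_rows(datasets, project_path, temp_dir=None):
    """Read a CSV file from the project datasets into a list of rows."""
    # Assumption: the returned file object supports the context manager protocol.
    with datasets.open(project_path, mode="r", temp_dir=temp_dir) as handle:
        return list(csv.reader(handle))
```

For large files, passing `temp_dir` explicitly follows the recommendation given for that parameter.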

faculty.datasets.mv(source_path, destination_path, project_id=None, object_client=None)

Move a file or directory within a project’s datasets.

Parameters:
source_path : str

The source path in the project datasets to move.

destination_path : str

The destination path in the project datasets.

project_id : str, optional

The project in which to move files. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.
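
The sketch below combines `ls` and `mv` to move every file with a given suffix from one prefix to another. It assumes that `ls` returns path strings using `/` as the separator. `move_matching` is a hypothetical helper, and `datasets` stands for the `faculty.datasets` module.

```python
import posixpath


def move_matching(datasets, source_prefix, destination_prefix, suffix):
    """Move files under `source_prefix` ending in `suffix` to `destination_prefix`.

    Returns a list of (old_path, new_path) pairs for the files that were moved.
    """
    moved = []
    for path in datasets.ls(prefix=source_prefix):
        if not path.endswith(suffix):
            continue
        # Keep the file name and swap the directory part.
        target = posixpath.join(destination_prefix, posixpath.basename(path))
        datasets.mv(path, target)
        moved.append((path, target))
    return moved
```

Files that share a name but sit in different subdirectories would collide at the destination, so this is only suitable for flat layouts.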

faculty.datasets.cp(source_path, destination_path, project_id=None, recursive=False, object_client=None)

Copy a file or directory within a project’s datasets.

Parameters:
source_path : str

The source path in the project datasets to copy.

destination_path : str

The destination path in the project datasets.

project_id : str, optional

The project in which to copy files. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

recursive : bool, optional

If True, allows copying directories like a recursive copy in a filesystem. By default the action is not recursive.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.
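
One possible use of `cp` is to keep a backup of a file before overwriting it. The sketch below does this for a single file, so `recursive` is left at its default. It assumes that `ls` returns path strings in the same form as `project_path`. `put_with_backup` and the `.bak` suffix are hypothetical, and `datasets` stands for the `faculty.datasets` module.

```python
def put_with_backup(datasets, local_path, project_path, backup_suffix=".bak"):
    """Upload `local_path`, first copying any existing file to a backup path.

    Returns the backup path, or None if there was nothing to back up.
    """
    backup_path = None
    # Assumption: ls returns path strings written the same way as project_path.
    if project_path in datasets.ls(prefix=project_path):
        backup_path = project_path + backup_suffix
        datasets.cp(project_path, backup_path)
    datasets.put(local_path, project_path)
    return backup_path
```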

faculty.datasets.rm(project_path, project_id=None, recursive=False, object_client=None)

Remove a file or directory from a project’s datasets.

Parameters:
project_path : str

The path in the project datasets to remove.

project_id : str, optional

The project to remove files from. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

recursive : bool, optional

If True, allows deleting directories like a recursive delete in a filesystem. By default the action is not recursive.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.
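
Because a recursive removal can delete many files at once, it may help to preview what would be affected first. The sketch below lists the paths under a prefix and only calls `rm` when `dry_run` is switched off. `remove_tree` and its `dry_run` flag are hypothetical, and `datasets` stands for the `faculty.datasets` module.

```python
def remove_tree(datasets, project_path, dry_run=True):
    """List everything under `project_path` and optionally remove it.

    Returns the paths that were (or, in a dry run, would be) removed.
    """
    # Include hidden files so the preview reflects the whole directory.
    targets = list(datasets.ls(prefix=project_path, show_hidden=True))
    if targets and not dry_run:
        datasets.rm(project_path, recursive=True)
    return targets
```

Defaulting `dry_run` to True means the destructive call has to be requested explicitly.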

faculty.datasets.etag(project_path, project_id=None, object_client=None)

Get a unique identifier for the current version of a file.

Parameters:
project_path : str

The path in the project datasets.

project_id : str, optional

The project that contains the file. You need to have access to this project for it to work. Defaults to the project set by FACULTY_PROJECT_ID in your environment.

object_client : faculty.clients.object.ObjectClient, optional

Advanced - can be used to benefit from caching in chained interactions with datasets.

Returns:
str

A unique identifier for the current version of the file.
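
Since the identifier changes with the file version, it can be used to skip downloads of files that have not changed. The sketch below keeps the last seen identifier per path in a plain dictionary. `refresh_if_changed` and `known_etags` are hypothetical names, and `datasets` stands for the `faculty.datasets` module.

```python
def refresh_if_changed(datasets, project_path, local_path, known_etags):
    """Download `project_path` only if its etag differs from the recorded one.

    `known_etags` maps project paths to the etag seen at the last download
    and is updated in place. Returns True if a download was performed.
    """
    current = datasets.etag(project_path)
    if known_etags.get(project_path) == current:
        return False
    datasets.get(project_path, local_path)
    known_etags[project_path] = current
    return True
```

The dictionary only lives as long as the process, so persisting it between runs would need separate handling.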