Python Integration
The Oceanum Python library includes a storage module that follows the fsspec specification, providing seamless integration with Oceanum Storage in your Python scripts and notebooks.
Installation
Install the Oceanum library:

    pip install oceanum

Authentication
The storage module uses your Datamesh token for authentication. You can either:
- Set the DATAMESH_TOKEN environment variable:

    export DATAMESH_TOKEN="your-datamesh-token"

- Or pass the token directly to functions:

    from oceanum import storage
    storage.ls("/", token="your-datamesh-token")

To obtain your Datamesh token, see the Token documentation.
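Scripts that accept an optional token can combine the two options above in a small helper. The following is a minimal sketch, not part of the Oceanum library: `resolve_token` is a hypothetical name, and the precedence shown (an explicit argument wins over `DATAMESH_TOKEN`) is an assumption about how your own code might behave.

```python
import os


def resolve_token(token=None):
    """Return an explicit token if given, else fall back to DATAMESH_TOKEN.

    Hypothetical helper for illustration; not part of the oceanum package.
    """
    if token:
        return token
    env_token = os.environ.get("DATAMESH_TOKEN")
    if not env_token:
        # Fail early with a clear message instead of a later authentication error
        raise RuntimeError(
            "No Datamesh token found: pass token=... or set DATAMESH_TOKEN"
        )
    return env_token
```

The returned value can then be passed on to the storage functions, for example as the `token` argument of `storage.ls`.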
Simple Functions
The storage module provides simple functions for common operations:
List Files
    from oceanum import storage

    # List root directory
    files = storage.ls("/")
    for f in files:
        print(f)

    # List with details
    files = storage.ls("/my-folder", detail=True)
    for f in files:
        print(f"{f['name']} - {f['size']} bytes")

    # Recursive listing
    files = storage.ls("/my-folder", recursive=True)

Upload Files
    from oceanum import storage

    # Upload a single file
    storage.put("local_file.nc", "/remote/path/file.nc")

    # Upload a directory recursively
    storage.put("./local_folder", "/remote/folder", recursive=True)

Download Files
    from oceanum import storage

    # Download a single file
    storage.get("/remote/path/file.nc", "local_file.nc")

    # Download a directory recursively
    storage.get("/remote/folder", "./local_folder", recursive=True)

Delete Files
    from oceanum import storage

    # Delete a file
    storage.rm("/remote/path/old_file.nc")

    # Delete a directory recursively
    storage.rm("/remote/folder", recursive=True)

Check Files
    from oceanum import storage

    # Check if path exists
    if storage.exists("/remote/path/file.nc"):
        print("File exists")

    # Check if path is a file
    if storage.isfile("/remote/path/file.nc"):
        print("It's a file")

    # Check if path is a directory
    if storage.isdir("/remote/folder"):
        print("It's a directory")

FileSystem Class
For more control, use the FileSystem class directly:
    from oceanum.storage import FileSystem

    # Initialize with token
    fs = FileSystem(token="your-datamesh-token")

    # List files
    files = fs.ls("/my-folder")

    # Get file info
    info = fs.info("/my-folder/file.nc")
    print(f"Size: {info['size']}, Modified: {info['mtime']}")

    # Read file content
    content = fs.cat("/my-folder/file.txt")

    # Write content
    fs.pipe("/my-folder/new_file.txt", b"Hello, World!")

    # Create directory
    fs.mkdir("/my-folder/new-dir")

    # Copy files
    fs.cp("/source/file.nc", "/dest/file.nc")

    # Move files
    fs.mv("/old/path/file.nc", "/new/path/file.nc")

    # Generate a signed URL that expires after 3600 seconds (the default is 100 seconds)
    url = fs.sign("/my-folder/file.nc", expiration=3600)
    print(url)

Using with fsspec
The storage filesystem integrates with fsspec, allowing use with the oceanum:// protocol:
    import fsspec

    # Open a file using fsspec
    with fsspec.open("oceanum://my-folder/file.txt", "r", token="your-token") as f:
        content = f.read()

    # Write a file
    with fsspec.open("oceanum://my-folder/output.txt", "w", token="your-token") as f:
        f.write("Hello, World!")

Working with xarray
Use fsspec integration to work with NetCDF and Zarr datasets:
    import xarray as xr

    # Open a NetCDF file from storage
    ds = xr.open_dataset(
        "oceanum://data/ocean_temps.nc",
        engine="h5netcdf",
        storage_options={"token": "your-token"}
    )

    # Open a Zarr store from storage
    ds = xr.open_zarr(
        "oceanum://data/large_dataset.zarr",
        storage_options={"token": "your-token"}
    )

    # Save to storage
    ds.to_zarr(
        "oceanum://data/output.zarr",
        storage_options={"token": "your-token"}
    )

Working with Dask
The FileSystem class works with Dask for distributed computing:
    import dask.dataframe as dd

    # Read CSV files with Dask
    df = dd.read_csv(
        "oceanum://data/*.csv",
        storage_options={"token": "your-token"}
    )

    # Read Parquet files
    df = dd.read_parquet(
        "oceanum://data/dataset.parquet",
        storage_options={"token": "your-token"}
    )

Using with Datamesh
Storage paths can be referenced in Datamesh using the oceanum:// protocol:
    from oceanum.datamesh import Connector

    # Connect to Datamesh
    connector = Connector(token="your-token")

    # Reference storage files in datasource connections
    # The oceanum:// protocol is recognized by Datamesh

Environment Variables
| Variable | Description |
|---|---|
| DATAMESH_TOKEN | Your Datamesh authentication token |
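
In notebooks or tests it can be convenient to set the variable from Python for a limited scope, for instance when switching between tokens. The sketch below uses only the standard library; `datamesh_token` is a hypothetical helper, and it assumes the library reads `DATAMESH_TOKEN` when a storage call is made rather than once at import time, which is not confirmed here.

```python
import os
from contextlib import contextmanager


@contextmanager
def datamesh_token(token):
    """Temporarily set DATAMESH_TOKEN, restoring the previous value on exit.

    Hypothetical helper for illustration; not part of the oceanum package.
    """
    previous = os.environ.get("DATAMESH_TOKEN")
    os.environ["DATAMESH_TOKEN"] = token
    try:
        yield
    finally:
        # Restore the earlier state, including the case where it was unset
        if previous is None:
            os.environ.pop("DATAMESH_TOKEN", None)
        else:
            os.environ["DATAMESH_TOKEN"] = previous
```

Storage calls placed inside a `with datamesh_token("other-token"):` block would then see the temporary value, under the assumption stated above.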