3 changes: 3 additions & 0 deletions .gitignore
@@ -3,3 +3,6 @@ src/bin/dvtest.rs
.idea/*
.DS_Store
*/**/.DS_Store

dv/
solr/
236 changes: 174 additions & 62 deletions Readme.md
@@ -6,114 +6,226 @@

![Build Status](https://github.com/JR-1991/rust-dataverse/actions/workflows/tests.yml/badge.svg)

A comprehensive Rust library and command-line interface for interacting with the [Dataverse API](https://guides.dataverse.org/en/latest/api/). Build robust data repository workflows with type-safe, asynchronous operations.

> **Note:** This project is under active development. While core functionality is stable, the API may evolve before the 1.0 release.

## Why Dataverse Rust?

- **🚀 High Performance** - Built with async/await using Tokio and Reqwest for efficient concurrent operations
- **🔒 Type Safety** - Leverage Rust's type system to catch errors at compile time
- **⚡ Direct Upload** - Parallel batch uploads for fast file transfers to S3-compatible storage
- **🎯 Dual Interface** - Use as a library in your Rust projects or as a standalone CLI tool
- **🔐 Secure Authentication** - Multiple auth methods including system keyring integration for credential storage
- **📦 Flexible Configuration** - JSON and YAML support for all configuration files

## Features

- **📚 Collections** - Create, publish, and manage Dataverse collections with hierarchical organization support
- **📊 Datasets** - Full dataset lifecycle management including creation, metadata editing, versioning, publishing, linking, and deletion. Support for dataset locks and review workflows
- **📁 Files** - Upload files via standard or direct upload (with parallel batch support), replace existing files, download files and complete datasets, and manage file metadata
- **🔍 Search** - Query datasets and files across your Dataverse instance with flexible search parameters
- **🛠️ Administration** - Manage storage drivers, configure external tools, and perform administrative operations
- **ℹ️ Instance Information** - Retrieve version information and available metadata exporters from your Dataverse instance

## Installation

### CLI Installation

Install the command-line tool directly from the repository:

```bash
cargo install --git https://github.com/JR-1991/rust-dataverse.git
```

### Library Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
dataverse = { git = "https://github.com/JR-1991/rust-dataverse" }
```

> **Note:** Not yet published on crates.io. Pre-1.0 releases will be available soon.

## Usage

### Library Usage

The library provides an async API built on `tokio` and `reqwest`. Import the prelude for common types:

```rust
use dataverse::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize client
    let client = BaseClient::new(
        "https://demo.dataverse.org",
        Some("your-api-token"),
    )?;

    // Get instance version
    let version = info::get_version(&client).await?;
    println!("Dataverse version: {}", version.data.unwrap());

    // Create a dataset
    let dataset_body = dataset::create::DatasetCreateBody {
        // ... configure metadata
        ..Default::default()
    };
    let _response = dataset::create_dataset(&client, "root", dataset_body).await?;

    // Upload a file
    let file = UploadFile::from("path/to/file.csv");
    let identifier = Identifier::PersistentId("doi:10.5072/FK2/ABCDEF".to_string());
    dataset::upload_file_to_dataset(&client, identifier, file, None, None).await?;

    Ok(())
}
```

**Key Library Modules:**

- `dataverse::client::BaseClient` - HTTP client for API interactions
- `dataverse::native_api::collection` - Collection operations
- `dataverse::native_api::dataset` - Dataset operations
- `dataverse::native_api::file` - File operations
- `dataverse::native_api::admin` - Administrative operations
- `dataverse::search_api` - Search functionality
- `dataverse::direct_upload` - Direct upload with parallel batch support
- `dataverse::data_access` - File and dataset downloads

### CLI Usage

The CLI provides three flexible authentication methods:

#### 1. Profile-Based (Recommended)

Store credentials securely in your system keyring:

```bash
# Create a profile
dvcli auth set --name production --url https://dataverse.org --token your-api-token

# Use the profile
dvcli --profile production info version
```

#### 2. Environment Variables

Set environment variables for automatic authentication:

```bash
export DVCLI_URL="https://demo.dataverse.org"
export DVCLI_TOKEN="your-api-token"

dvcli dataset meta doi:10.5072/FK2/ABC123
```

#### 3. Interactive Mode

If neither profile nor environment variables are set, the CLI will prompt for credentials:

```bash
dvcli info version
# Prompts for URL and token
```

**Common CLI Operations:**

> **Note:** Configuration files can be provided in both JSON and YAML formats.

```bash
# Get help
dvcli --help
dvcli dataset --help

# Collections
dvcli collection create --parent root --body collection.json
dvcli collection publish my-collection

# Datasets
dvcli dataset create --collection root --body dataset.json # or dataset.yaml
dvcli dataset upload --id doi:10.5072/FK2/ABC123 data.csv
dvcli dataset publish doi:10.5072/FK2/ABC123

# Direct upload (faster for large files)
dvcli dataset direct-upload --id doi:10.5072/FK2/ABC123 --parallel 5 file1.csv file2.csv

# Files
dvcli file replace --id 12345 --path new-file.csv
dvcli file download file-pid.txt --path ./downloads/

# Search
dvcli search -q "climate change" -t dataset -t file

# Admin
dvcli admin storage-drivers
dvcli admin add-external-tool tool-manifest.json
```
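The `collection.json` and `dataset.json` bodies above follow the Dataverse native API metadata format. As a hedged sketch (the field layout below comes from the general Dataverse API guides, not from this repository, and the values are placeholders), a minimal dataset body might look like:

```json
{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          { "typeName": "title", "multiple": false, "typeClass": "primitive", "value": "Example Dataset" },
          { "typeName": "subject", "multiple": true, "typeClass": "controlledVocabulary", "value": ["Other"] },
          {
            "typeName": "author",
            "multiple": true,
            "typeClass": "compound",
            "value": [
              {
                "authorName": { "typeName": "authorName", "multiple": false, "typeClass": "primitive", "value": "Doe, Jane" }
              }
            ]
          }
        ]
      }
    }
  }
}
```

A complete body typically also requires `datasetContact` and `dsDescription` fields per the Dataverse guides, and the same structure can be written in YAML instead of JSON.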

## Examples

Complete workflow examples are available in the [`examples/`](examples/) directory:

- **[create-upload-publish](examples/create-upload-publish)** - End-to-end workflow demonstrating collection and dataset creation, file upload, and publishing using shell scripts and the CLI.

Besides these examples, you can also find some recipes in the [Dataverse Recipes](https://github.com/gdcc/dataverse-recipes/tree/main/dvcli) repository, which cover most of the functionality of the CLI.

## Development

### Running Tests

Tests require a running Dataverse instance. We provide a convenient test script that handles infrastructure setup:

```bash
# Run all tests (starts Docker containers automatically)
./run-tests.sh

# Run a specific test
./run-tests.sh test_create_dataset
```

The script automatically:

- Starts Dataverse with PostgreSQL and Solr via Docker Compose
- Waits for services to be ready
- Configures environment variables
- Executes the test suite

Docker containers remain running after tests complete for faster subsequent runs. View logs with `docker logs dataverse` if you encounter issues.

### Manual Test Setup

For granular control during development:

```bash
# Start infrastructure
docker compose -f ./docker/docker-compose-base.yml --env-file local-test.env up -d

# Configure environment
export BASE_URL=http://localhost:8080
export DV_VERSION=6.2
export $(grep "API_TOKEN" "dv/bootstrap.exposed.env")
export API_TOKEN_SUPERUSER=$API_TOKEN

# Run tests
cargo test
cargo test -- --nocapture # with output
cargo test test_name # specific test
cargo test collection:: # module tests
```

## Contributing

Contributions are welcome! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated. Please feel free to open issues or submit pull requests on [GitHub](https://github.com/JR-1991/rust-dataverse).

## Community

Join the conversation on the [Dataverse Zulip Channel](https://dataverse.zulipchat.com)! Connect with other developers, get help, share ideas, and discuss the future of Rust clients for Dataverse.

## License

This project is licensed under the MIT License - see the [License.md](License.md) file for details.
7 changes: 7 additions & 0 deletions conf/localstack/init-s3.sh
@@ -0,0 +1,7 @@
#!/bin/bash

# Create the mybucket bucket for Dataverse S3 storage
awslocal s3 mb s3://mybucket

echo "S3 bucket 'mybucket' created successfully"
