Reranker inference service intended for use with the Digital Assistant. It hosts a reranker model using HuggingFace transformers and exposes a prediction endpoint.
To build, use
make build
To run in the project, use
make run
When running in production, use
docker volume create hf_cache # If not exists
docker run -it -p 5000:5000 -v hf_cache:/app/hf_cache --gpus all -e API_KEY=<token> ghcr.io/aidotse/reranker-inference:latest
To push the image, use
make push
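As a minimal sketch, the production `docker run` command above could be wrapped in a small shell function, so that the token comes from the environment instead of being typed inline. The function name `run_reranker`, the `API_TOKEN` variable, and the `DRY_RUN` switch are hypothetical names chosen for this example and are not part of the project. The docker arguments are copied unchanged from the command above.

```shell
#!/bin/sh
# Sketch only: run_reranker, API_TOKEN and DRY_RUN are hypothetical names
# introduced for this example; they are not defined by the project.
run_reranker() {
  # Refuse to start without a token, so the container never gets an empty API_KEY.
  if [ -z "${API_TOKEN:-}" ]; then
    echo "API_TOKEN is not set" >&2
    return 1
  fi

  # Same arguments as the production command in this README.
  set -- docker run -it -p 5000:5000 -v hf_cache:/app/hf_cache --gpus all \
    -e "API_KEY=${API_TOKEN}" ghcr.io/aidotse/reranker-inference:latest

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=1` the function only prints the command it would run, which is a cheap way to check the arguments before starting a GPU container. Note that the dry run prints the token in clear text.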