Applied AI Developer - LoRA/QLoRA Fine-Tuning
 
Ad title: Applied AI Developer - LoRA/QLoRA Fine-Tuning
Provided By: Makyo
Published: 13 October / Deadline: 13 November
Makyo is pleased to announce an opening for the position of Applied AI Developer (LoRA/QLoRA Fine-Tuning).

Location: Remote (anywhere)
Type: Contract/Full-Time
Hardware: 1× A100 GPU provided (40 GB or 80 GB)

Role Overview

We are looking for an Applied AI Developer experienced in parameter-efficient fine-tuning (LoRA/QLoRA) of LLMs. You will design, run, and optimize training pipelines on a single NVIDIA A100, with an emphasis on VRAM efficiency, reproducibility, and production-ready artifacts.

Your work will directly improve the adaptability of large models to domain-specific data while keeping costs and hardware requirements manageable.

Responsibilities:

** Fine-tune 7B-13B class models (LLaMA, Mistral, Gemma, etc.) with LoRA/QLoRA (see the configuration sketch after this list).
** Configure quantization-aware training (nf4/int4) and paged optimizers to minimize VRAM use.
** Apply gradient checkpointing, sequence packing, and FlashAttention to maximize throughput.
** Design reproducible training pipelines (PyTorch, Hugging Face Transformers, PEFT, bitsandbytes).
** Run experiments and ablations (different ranks, α values, sequence lengths) and document trade-offs.
** Export fine-tuned checkpoints for inference with vLLM/ExLlamaV2.
** Build lightweight FastAPI/Flask endpoints to serve models in production.
** Provide evaluation reports on domain performance (loss curves, F1, ROUGE, EM, etc.).
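
The sketch below illustrates the kind of setup these responsibilities describe: a QLoRA configuration with Transformers, PEFT, and bitsandbytes. It is a minimal, hedged example; the model name, rank, α, and optimizer settings are placeholder assumptions, not project specifics.

    import torch
    from transformers import AutoModelForCausalLM, TrainingArguments, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # placeholder: any 7B-13B causal LM

    # 4-bit NF4 quantization keeps the frozen base weights small in VRAM.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        quantization_config=bnb_config,
        device_map="auto",
    )
    # Prepares the quantized model for training; enables gradient checkpointing by default.
    model = prepare_model_for_kbit_training(model)

    # LoRA adapters: only these small low-rank matrices are trained.
    lora_config = LoraConfig(
        r=16,                      # rank -- one of the ablation axes
        lora_alpha=32,             # scaling factor α
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

    # Paged 8-bit AdamW limits optimizer-state spikes on a single A100.
    training_args = TrainingArguments(
        output_dir="qlora-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        optim="paged_adamw_8bit",
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    )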

Requirements:

** Strong experience with PyTorch + Hugging Face (Transformers, PEFT, Accelerate).
** Hands-on with LoRA/QLoRA (bitsandbytes, nf4/int4 quantization).
** Deep understanding of GPU memory optimization: optimizer offload, gradient accumulation, ZeRO/FSDP basics.
** Practical knowledge of attention efficiency (FlashAttention, xFormers).
** Ability to explain and control the VRAM budget during training (see the back-of-the-envelope estimate after this list).
** Comfort with Dockerized pipelines and Linux CLI (CUDA, NCCL, drivers).
** Familiarity with model serving (vLLM, TGI, ExLlamaV2) for low-latency inference.
** Solid grasp of evaluation methods for instruction-tuned models.
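
As a hedged illustration of the VRAM reasoning mentioned above, the estimate below shows why a 7B QLoRA run fits comfortably on a 40 GB A100. Every constant is an assumption for illustration; real numbers should come from profiling (e.g. torch.cuda.max_memory_allocated()).

    # Rough VRAM estimate for QLoRA on a 7B model -- illustrative assumptions only.
    params = 7e9
    weights_gb = params * 0.5 / 1e9      # NF4 ~ 0.5 bytes/param -> ~3.5 GB
    lora_params = 20e6                   # assumed adapter size (r=16, attention projections)
    lora_gb = lora_params * 16 / 1e9     # weight + grad + optimizer states, conservative ~16 B/param
    activations_gb = 6.0                 # depends on batch size, sequence length, checkpointing
    overhead_gb = 2.0                    # CUDA context, fragmentation, temporary buffers

    total_gb = weights_gb + lora_gb + activations_gb + overhead_gb
    print(f"~{total_gb:.1f} GB of a 40 GB A100")   # roughly 12 GB, leaving headroom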

Nice to Have:

** Experience with RAG systems (LangChain, LlamaIndex).
** Knowledge of RLHF/DPO/ORPO pipelines.
** MLOps: CI/CD, Weights & Biases, MLflow tracking.
** Prior work in domain-specific instruction tuning (health, finance, legal, etc.).

Trial Project (Paid)

As part of the hiring process, candidates will:

** Fine-tune a 7B-9B model with QLoRA on a provided dataset.
** Deliver a LoRA checkpoint, an inference API (see the minimal serving sketch after this list), and a VRAM usage report.
** Show reproducibility with documented steps and logs.
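
For the inference-API deliverable, a minimal serving sketch is shown below, assuming a Transformers + PEFT stack with placeholder paths; a production deployment would more likely use vLLM or TGI, as noted under Requirements.

    import torch
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    BASE_MODEL = "mistralai/Mistral-7B-v0.1"   # placeholder base model
    ADAPTER_DIR = "qlora-out"                  # placeholder LoRA checkpoint directory

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)   # attach the fine-tuned adapter
    model.eval()

    app = FastAPI()

    class Prompt(BaseModel):
        text: str
        max_new_tokens: int = 256

    @app.post("/generate")
    def generate(req: Prompt):
        inputs = tokenizer(req.text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
        return {"completion": tokenizer.decode(output[0], skip_special_tokens=True)}

Run it with uvicorn (e.g. "uvicorn serve:app", assuming the file is named serve.py). Alternatively, the adapter can be merged into the base weights (PeftModel.merge_and_unload) before exporting for vLLM/ExLlamaV2.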

Compensation:

** Competitive contract rate based on experience.
** Opportunity for ongoing projects if successful.

How to Apply:

Send your resume and cover letter to hr@makyo.co.
 