AQUA Models

Purpose-built AI for aquaculture. Open source. Apache 2.0.

The Models

Two open-source LLMs purpose-built for aquaculture — from cloud to edge.

AQUA-7B

Flagship Model

Parameters: 7 Billion
Base Model: Mistral-7B-Instruct-v0.3
Training Data: 3M+ expert-verified QA pairs
Format: Safetensors (BF16)
Released: July 4, 2025

AQUA-1B

Edge / Mobile Model

Parameters: 1 Billion
Base Model: TBD
Training Data: Aquaculture-focused dataset
Format: Safetensors
Released: July 29, 2025
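
As a rough illustration of why a 1B-parameter model targets edge and mobile hardware, weight memory scales with parameter count times bytes per parameter. The figures below are back-of-envelope estimates, not measured footprints from the model card:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in gigabytes."""
    return num_params * bytes_per_param / 1e9

# BF16/FP16 store 2 bytes per parameter; 4-bit quantization stores ~0.5.
print(weight_memory_gb(1e9, 2))    # AQUA-1B in BF16   -> 2.0 GB
print(weight_memory_gb(1e9, 0.5))  # AQUA-1B at 4-bit  -> 0.5 GB
print(weight_memory_gb(7e9, 2))    # AQUA-7B in BF16   -> 14.0 GB
```

These estimates cover weights only; activations and the KV cache add further memory at inference time.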

Key Features

Domain expertise spanning the full aquaculture value chain — from hatchery to harvest.

Production Systems & Species

Covers ponds, cages, RAS (recirculating aquaculture systems), aquaponics, mariculture, and longlines, with practices for raising tilapia, catfish, carp, salmon, shrimp, crabs, oysters, sea bass, and more.

Genetics, Hatchery & Early Life Stage

Guides advanced breeding, gene editing, hatchery design, spawning, larval care, nursery systems, live feed, transport, egg incubation, and biosecurity.

Nutrition, Feeding & Growth

Actionable protocols for feed formulation, FCR optimization, species-specific nutritional guidance, and growth stage management.

Water Quality, Health & Disease

Protocols for temperature, oxygen, pH, ammonia, nitrite, salinity — plus structured disease management: identification, vaccination, and outbreak response.

Sustainable Aquaculture & Innovation

Eco-friendly practices in waste management, environmental impact, biodiversity, climate adaptation, and adoption of AI, automation, sensors, and drones.

Economics, Regulation & Post-Harvest

Market trends, business planning, regulation, certification, traceability, harvesting, processing, cold chain, grading, HACCP, and food safety.

Training Configuration

Fine-tuned with LoRA on Mistral-7B using 8× NVIDIA H200 GPUs.

Base Model: Mistral-7B-Instruct-v0.3
Fine-tuning Method: LoRA
Optimizer: AdamW (lr = 1e-4)
Scheduler: Cosine, 5,000 warmup steps
Epochs: 2
Batch Size: 2 (effective: 16)
Max Sequence Length: 2,048 tokens
Precision: bf16
Hardware: 8× NVIDIA H200 GPUs
Training Time: ~32 hours
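
The schedule above (linear warmup to lr = 1e-4 over 5,000 steps, then cosine decay) can be sketched in plain Python. The total-step count used here is a placeholder for illustration, not a value from the training run:

```python
import math

BASE_LR = 1e-4       # peak learning rate from the table
WARMUP_STEPS = 5_000  # warmup steps from the table

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0, 100_000))        # 0.0  (start of warmup)
print(lr_at(5_000, 100_000))    # 1e-4 (peak, end of warmup)
print(lr_at(100_000, 100_000))  # ~0.0 (fully decayed)
```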

Training Data

Approximately 3 million real and synthetic Q&A pairs, totaling around 1 billion tokens of high-quality, domain-specific data.

Extension worker–farmer dialogues and field advisory logs
FAO, ICAR, NOAA, and peer-reviewed aquaculture research
Synthetic Q&A from 5,000+ aquaculture-focused topics
Climate-resilient practices, hatchery SOPs, and water quality datasets
Carefully curated to support species-specific culture methods

Disclaimer & Limitations

Domain Bias: The model may reflect inherent biases present in the aquaculture data sources and industry practices on which it was trained.

Temporal Data Limitation: Climate and environmental recommendations are based on information available up to 2024. Users should cross-check climate-related advice against the latest advisories.

Potential Hallucinations: Like all large language models, AQUA-7B may occasionally generate inaccurate or misleading responses. Always validate critical, regulatory, or high-impact decisions with a qualified aquaculture professional.

Quickstart

Get started with AQUA-7B in a few lines of Python.

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KurmaAI/AQUA-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on available GPUs/CPU automatically
    torch_dtype=torch.float16,  # load in half precision to reduce memory use
)

prompt = "What are the most common diseases in shrimp farming?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
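
Because the base model is Mistral-7B-Instruct, prompts generally perform better when wrapped in Mistral's instruction format; in practice, `tokenizer.apply_chat_template` produces this for you. The sketch below approximates the resulting layout with a plain string. It is illustrative only; the tokenizer's own chat template is the authoritative source for the exact special tokens:

```python
# Approximate single-turn Mistral instruct layout (illustrative, not
# guaranteed to match the tokenizer's template token-for-token).
def format_mistral_prompt(user_message: str) -> str:
    return f"<s>[INST] {user_message} [/INST]"

prompt = format_mistral_prompt("How do I manage ammonia spikes in a tilapia pond?")
print(prompt)
```

In real use, prefer `tokenizer.apply_chat_template([{"role": "user", "content": question}], tokenize=False)` so the formatting always matches the model's tokenizer.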

Start building with AQUA

Download the models from HuggingFace or try AquaChat on your phone.