AQUA Models

Purpose-built AI for aquaculture. Open source. Apache 2.0.

The Models

Two open-source LLMs purpose-built for aquaculture — from cloud to edge.

AQUA-7B

Flagship Model

Parameters: 7 Billion
Base Model: Mistral-7B-Instruct-v0.3
Training Data: 3M+ expert-verified QA pairs
Format: Safetensors (BF16)
Released: July 4, 2025

AQUA-1B

Edge / Mobile Model

Parameters: 1 Billion
Base Model: TBD
Training Data: Aquaculture-focused dataset
Format: Safetensors
Released: July 29, 2025
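
As a rough illustration of why a 1B-parameter model targets edge and mobile hardware, weight memory scales with parameter count times bytes per parameter. The figures below are back-of-envelope estimates, not measured footprints from the model card:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in gigabytes."""
    return num_params * bytes_per_param / 1e9

# BF16/FP16 store 2 bytes per parameter; 4-bit quantization stores ~0.5.
print(weight_memory_gb(1e9, 2))    # AQUA-1B in BF16   -> 2.0 GB
print(weight_memory_gb(1e9, 0.5))  # AQUA-1B at 4-bit  -> 0.5 GB
print(weight_memory_gb(7e9, 2))    # AQUA-7B in BF16   -> 14.0 GB
```

These estimates cover weights only; activations and the KV cache add further memory at inference time.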

Key Features

Domain expertise spanning the full aquaculture value chain — from hatchery to harvest.

Production Systems & Species

Covers ponds, cages, RAS (recirculating aquaculture systems), aquaponics, mariculture, and longlines, with practices for raising tilapia, catfish, carp, salmon, shrimp, crabs, oysters, sea bass, and more.

Genetics, Hatchery & Early Life Stage

Guides advanced breeding, gene editing, hatchery design, spawning, larval care, nursery systems, live feed, transport, egg incubation, and biosecurity.

Nutrition, Feeding & Growth

Actionable protocols for feed formulation, FCR optimization, species-specific nutritional guidance, and growth stage management.

Water Quality, Health & Disease

Protocols for temperature, oxygen, pH, ammonia, nitrite, salinity — plus structured disease management: identification, vaccination, and outbreak response.

Sustainable Aquaculture & Innovation

Eco-friendly practices in waste management, environmental impact, biodiversity, climate adaptation, and adoption of AI, automation, sensors, and drones.

Economics, Regulation & Post-Harvest

Market trends, business planning, regulation, certification, traceability, harvesting, processing, cold chain, grading, HACCP, and food safety.

Training Configuration

Fine-tuned with LoRA on Mistral-7B using 8× NVIDIA H200 GPUs.

Base Model: Mistral-7B-Instruct-v0.3
Fine-tuning Method: LoRA
Optimizer: AdamW (lr = 1e-4)
Scheduler: Cosine, 5,000 warmup steps
Epochs: 2
Batch Size: 2 (effective: 16)
Max Sequence Length: 2,048 tokens
Precision: bf16
Hardware: 8× NVIDIA H200 GPUs
Training Time: ~32 hours
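
The schedule above (linear warmup to lr = 1e-4 over 5,000 steps, then cosine decay) can be sketched in plain Python. The total-step count used here is a placeholder for illustration, not a value from the training run:

```python
import math

BASE_LR = 1e-4       # peak learning rate from the table
WARMUP_STEPS = 5_000  # warmup steps from the table

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0, 100_000))        # 0.0  (start of warmup)
print(lr_at(5_000, 100_000))    # 1e-4 (peak, end of warmup)
print(lr_at(100_000, 100_000))  # ~0.0 (fully decayed)
```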

Training Data

Approximately 3 million real and synthetic Q&A pairs, totaling around 1 billion tokens of high-quality, domain-specific data.

Extension worker–farmer dialogues and field advisory logs
FAO, ICAR, NOAA, and peer-reviewed aquaculture research
Synthetic Q&A from 5,000+ aquaculture-focused topics
Climate-resilient practices, hatchery SOPs, and water quality datasets
Carefully curated to support species-specific culture methods

Disclaimer & Limitations

Domain Bias: The model may reflect inherent biases present in the aquaculture data sources and industry practices on which it was trained.

Temporal Data Limitation: Climate and environmental recommendations are based on information available up to 2024. Users should cross-check climate-related advice against the latest advisories.

Potential Hallucinations: Like all large language models, AQUA-7B may occasionally generate inaccurate or misleading responses. Always validate critical, regulatory, or high-impact decisions with a qualified aquaculture professional.

Quickstart

Get started with AQUA-7B in a few lines of Python.

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KurmaAI/AQUA-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on available GPUs/CPU automatically
    torch_dtype=torch.float16,  # load in half precision to reduce memory use
)

prompt = "What are the most common diseases in shrimp farming?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
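
Because the base model is Mistral-7B-Instruct, prompts generally perform better when wrapped in Mistral's instruction format; in practice, `tokenizer.apply_chat_template` produces this for you. The sketch below approximates the resulting layout with a plain string. It is illustrative only; the tokenizer's own chat template is the authoritative source for the exact special tokens:

```python
# Approximate single-turn Mistral instruct layout (illustrative, not
# guaranteed to match the tokenizer's template token-for-token).
def format_mistral_prompt(user_message: str) -> str:
    return f"<s>[INST] {user_message} [/INST]"

prompt = format_mistral_prompt("How do I manage ammonia spikes in a tilapia pond?")
print(prompt)
```

In real use, prefer `tokenizer.apply_chat_template([{"role": "user", "content": question}], tokenize=False)` so the formatting always matches the model's tokenizer.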

Start building with AQUA

Download the models from HuggingFace or try AquaChat on your phone.