AQUA Models
Purpose-built AI for aquaculture. Open source. Apache 2.0.
The Models
Two open-source LLMs purpose-built for aquaculture — from cloud to edge.
AQUA-7B
Flagship Model
AQUA-1B
Edge / Mobile Model
Key Features
Domain expertise spanning the full aquaculture value chain — from hatchery to harvest.
Production Systems & Species
Covers ponds, cages, recirculating aquaculture systems (RAS), aquaponics, mariculture, and longlines. Practices for raising tilapia, catfish, carp, salmon, shrimp, crabs, oysters, sea bass, and more.
Genetics, Hatchery & Early Life Stage
Guidance on advanced breeding, gene editing, hatchery design, spawning, larval care, nursery systems, live feed, transport, egg incubation, and biosecurity.
Nutrition, Feeding & Growth
Actionable protocols for feed formulation, feed conversion ratio (FCR) optimization, species-specific nutritional guidance, and growth-stage management.
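FCR itself is a simple mass ratio, which a short sketch makes concrete (the numbers below are illustrative placeholders, not model output or species guidance):

```python
def feed_conversion_ratio(feed_kg: float, weight_gain_kg: float) -> float:
    """FCR = total feed delivered / biomass gained; lower is better."""
    if weight_gain_kg <= 0:
        raise ValueError("weight gain must be positive")
    return feed_kg / weight_gain_kg

# Illustrative example: 1,500 kg of feed producing 1,000 kg of biomass
print(feed_conversion_ratio(1500, 1000))  # 1.5
```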
Water Quality, Health & Disease
Protocols for temperature, oxygen, pH, ammonia, nitrite, salinity — plus structured disease management: identification, vaccination, and outbreak response.
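A minimal sketch of the kind of threshold-based water-quality check such protocols describe. The ranges here are illustrative placeholders only; consult species-specific guidance for real limits:

```python
# Illustrative alert thresholds (placeholders, not recommendations)
SAFE_RANGES = {
    "temperature_c": (26.0, 30.0),
    "dissolved_oxygen_mg_l": (5.0, 12.0),
    "ph": (6.5, 8.5),
    "ammonia_mg_l": (0.0, 0.02),
}

def water_quality_alerts(readings: dict) -> list:
    """Return the parameters whose readings fall outside the safe range."""
    alerts = []
    for param, value in readings.items():
        low, high = SAFE_RANGES[param]
        if not (low <= value <= high):
            alerts.append(param)
    return alerts

print(water_quality_alerts({"temperature_c": 31.2, "ph": 7.1}))  # ['temperature_c']
```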
Sustainable Aquaculture & Innovation
Eco-friendly practices in waste management, environmental impact, biodiversity, climate adaptation, and adoption of AI, automation, sensors, and drones.
Economics, Regulation & Post-Harvest
Market trends, business planning, regulation, certification, traceability, harvesting, processing, cold chain, grading, HACCP, and food safety.
Training Configuration
Fine-tuned with LoRA on Mistral-7B using 8× NVIDIA H200 GPUs.
| Hyperparameter | Setting |
|---|---|
| Base Model | Mistral-7B-Instruct-v0.3 |
| Fine-tuning Method | LoRA |
| Optimizer | AdamW (lr = 1e-4) |
| Scheduler | Cosine, 5,000 warmup steps |
| Epochs | 2 |
| Batch Size | 2 per GPU (effective: 16) |
| Max Sequence Length | 2,048 tokens |
| Precision | bf16 |
| Hardware | 8× NVIDIA H200 GPUs |
| Training Time | ~32 hours |
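The table above can be sketched as a Hugging Face `peft`/`transformers` configuration. The LoRA rank, alpha, dropout, and target modules below are assumptions for illustration (they are not published settings); only the optimizer, scheduler, batch size, epochs, and precision come from the table:

```python
# Sketch of a LoRA fine-tuning setup matching the table above.
# r, lora_alpha, lora_dropout, and target_modules are assumed values.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                  # assumption
    lora_alpha=32,                         # assumption
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],   # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="aqua-7b-lora",
    learning_rate=1e-4,                    # AdamW is the default optimizer
    lr_scheduler_type="cosine",
    warmup_steps=5000,
    num_train_epochs=2,
    per_device_train_batch_size=2,         # 2 per GPU x 8 GPUs = effective 16
    bf16=True,
)
```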
Training Data
Approximately 3 million real and synthetic Q&A pairs, totaling around 1 billion tokens of high-quality, domain-specific data.
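A quick back-of-the-envelope on those figures: ~1 billion tokens spread over ~3 million Q&A pairs works out to roughly 330 tokens per pair on average.

```python
pairs = 3_000_000
tokens = 1_000_000_000
print(round(tokens / pairs))  # average tokens per Q&A pair: 333
```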
Disclaimer & Limitations
Domain Bias — The model may reflect inherent biases present in the aquaculture data sources and industry practices on which it was trained.
Temporal Data Limitation — Climate and environmental recommendations are based on information available up to 2024. Users should cross-check climate-related advice against the latest advisories.
Potential Hallucinations — Like all large language models, AQUA-7B may occasionally generate inaccurate or misleading responses. Always validate critical, regulatory, or high-impact decisions with a qualified aquaculture professional.
Quickstart
Get started with AQUA-7B in a few lines of Python.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KurmaAI/AQUA-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "What are the most common diseases in shrimp farming?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Start building with AQUA
Download the models from Hugging Face or try AquaChat on your phone.