Today we're releasing AQUA-7B and AQUA-1B on HuggingFace — the world's first language models purpose-built for aquaculture and fisheries.
The Models
AQUA-7B — Flagship Model
- Parameters: 7 billion
- Base Model: Mistral-7B-Instruct-v0.3
- Training Data: 3M+ expert-verified QA pairs
- Training Tokens: ~1 billion
- License: Apache 2.0
- Format: Safetensors (BF16)
- Download: huggingface.co/KurmaAI/AQUA-7B
AQUA-1B — Edge / Mobile Model
- Parameters: 1 billion
- License: Apache 2.0
- Format: Safetensors
- Download: huggingface.co/KurmaAI/AQUA-1B
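For edge and mobile deployments of AQUA-1B, one common option is a 4-bit quantized load via bitsandbytes. This is a sketch under assumptions, not a vendor-provided recipe — the released checkpoint is standard Safetensors, and the quantization settings here (NF4, fp16 compute) are illustrative defaults:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "KurmaAI/AQUA-1B"

# 4-bit NF4 quantization config (requires the bitsandbytes package;
# chosen here as a plausible low-memory setup, not an official recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on available GPU/CPU automatically
)
```

On CPU-only devices without bitsandbytes, loading the model in full precision with `device_map="cpu"` is the fallback.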
Why Domain-Specific Models?
General-purpose LLMs like GPT-4, Llama, and Qwen are impressive, but they fall short on specialized aquaculture knowledge. Our benchmarks show that AQUA-7B outperforms Llama 3.1 8B and Qwen 2.5 7B by 30-40% on aquaculture domain tasks.
This matters because aquaculture is a $300B+ global industry that feeds billions of people. Accurate, reliable AI guidance can help farmers prevent disease outbreaks, optimize feeding, manage water quality, and increase yields.
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KurmaAI/AQUA-7B"

# Load the tokenizer and model in half precision, spreading layers
# across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Tokenize a prompt, generate up to 256 new tokens, and decode the result
prompt = "What are the most common diseases in shrimp farming?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Open Source
Both models are released under the Apache 2.0 license — free to use, modify, and deploy. We believe open-source AI is essential for advancing aquaculture technology worldwide.
The AQUA Test Dataset is also available on HuggingFace for benchmarking and evaluation: huggingface.co/datasets/KurmaAI/AQUA-Test-Dataset.
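For a quick sanity check against the test set, a minimal exact-match scoring sketch follows. The QA pairs shown are made up for illustration, not drawn from the AQUA Test Dataset; consult the dataset card for the actual schema and recommended evaluation protocol:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer
    after trimming whitespace and lowercasing."""
    if not references:
        return 0.0

    def normalize(s):
        return s.strip().lower()

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Toy example (hypothetical answers, not real benchmark data):
refs = ["white spot syndrome virus", "vibriosis"]
preds = ["White Spot Syndrome Virus", "columnaris"]
print(exact_match_accuracy(preds, refs))  # 0.5
```

Exact match is a deliberately strict metric; for free-form model answers, a fuzzier measure (token overlap, or an LLM judge) will usually track quality better.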
