Today we're releasing AQUA-7B and AQUA-1B on HuggingFace — the world's first language models purpose-built for aquaculture and fisheries.
The Models
AQUA-7B — Flagship Model
- Parameters: 7 billion
- Base Model: Mistral-7B-Instruct-v0.3
- Training Data: 3M+ expert-verified QA pairs
- Training Tokens: ~1 billion
- License: Apache 2.0
- Format: Safetensors (BF16)
- Download: huggingface.co/KurmaAI/AQUA-7B
AQUA-1B — Edge / Mobile Model
- Parameters: 1 billion
- License: Apache 2.0
- Format: Safetensors
- Download: huggingface.co/KurmaAI/AQUA-1B
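For edge and mobile deployments of AQUA-1B, one common option is a 4-bit quantized load via bitsandbytes. This is a sketch under assumptions, not a vendor-provided recipe — the released checkpoint is standard Safetensors, and the quantization settings here (NF4, fp16 compute) are illustrative defaults:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "KurmaAI/AQUA-1B"

# 4-bit NF4 quantization config (requires the bitsandbytes package;
# chosen here as a plausible low-memory setup, not an official recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on available GPU/CPU automatically
)
```

On CPU-only devices without bitsandbytes, loading the model in full precision with `device_map="cpu"` is the fallback.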
Why Domain-Specific Models?
General-purpose LLMs like GPT-4, Llama, and Qwen are impressive, but they fall short on specialized aquaculture knowledge. Our benchmarks show that AQUA-7B outperforms Llama 3.1 8B and Qwen 2.5 7B by 30-40% on aquaculture domain tasks.
This matters because aquaculture is a $300B+ global industry that feeds billions of people. Accurate, reliable AI guidance can help farmers prevent disease outbreaks, optimize feeding, manage water quality, and increase yields.
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KurmaAI/AQUA-7B"

# Load the tokenizer and model in half precision, spreading layers
# across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Tokenize a prompt, generate up to 256 new tokens, and decode the result
prompt = "What are the most common diseases in shrimp farming?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Open Source
Both models are released under the Apache 2.0 license — free to use, modify, and deploy. We believe open-source AI is essential for advancing aquaculture technology worldwide.
The AQUA Test Dataset is also available on HuggingFace for benchmarking and evaluation: huggingface.co/datasets/KurmaAI/AQUA-Test-Dataset.
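For a quick sanity check against the test set, a minimal exact-match scoring sketch follows. The QA pairs shown are made up for illustration, not drawn from the AQUA Test Dataset; consult the dataset card for the actual schema and recommended evaluation protocol:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer
    after trimming whitespace and lowercasing."""
    if not references:
        return 0.0

    def normalize(s):
        return s.strip().lower()

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Toy example (hypothetical answers, not real benchmark data):
refs = ["white spot syndrome virus", "vibriosis"]
preds = ["White Spot Syndrome Virus", "columnaris"]
print(exact_match_accuracy(preds, refs))  # 0.5
```

Exact match is a deliberately strict metric; for free-form model answers, a fuzzier measure (token overlap, or an LLM judge) will usually track quality better.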
