Announcing: AQUA Research Paper Published on arXiv

Our research paper "AQUA: A Large Language Model for Aquaculture & Fisheries" is now available on arXiv (arXiv:2507.20520).

Paper Overview

The paper presents AQUA — the first large language model specifically designed for the aquaculture and fisheries domain. Key contributions include:

AQUADAPT Framework

We introduce AQUADAPT (Agentic Framework for generating and refining high-quality synthetic data), a novel approach to creating domain-specific training data using agentic AI pipelines. This framework enables:

Automated generation of high-quality QA pairs from domain literature
Multi-stage verification and refinement of training data
Scalable data creation across 11 aquaculture sub-domains

Training Data

55,000+ curated documents from web sources and open-access publications
3 million+ expert-verified QA pairs covering the full aquaculture taxonomy
~1 billion training tokens processed during fine-tuning

Benchmark Results

AQUA-7B demonstrates significant improvements over general-purpose models:

30-40% higher accuracy on domain-specific tasks compared to LLaMA 3.1 8B and Qwen 2.5 7B
95% average accuracy on aquaculture domain benchmarks
BLEU-4: 49.19 — Strong multiword phrase fidelity
ROUGE-1: 51.45 — High coverage of key domain terms

Model Architecture

Base Model: Mistral-7B-Instruct-v0.3
Fine-tuning: LoRA with AdamW optimizer
Hardware: 8× NVIDIA H200 GPUs
Training Time: ~32 hours

Read the Paper

arXiv: arxiv.org/abs/2507.20520
PDF: arxiv.org/pdf/2507.20520

Citation

If you use AQUA in your research, please cite our paper. The full BibTeX citation is available on our Research page.