AI-Powered UI Test Automation Agent

The AI‑powered UI Test Automation Agent brings a fresh approach to GUI testing by combining natural‑language test definitions, Generative AI, Retrieval‑Augmented Generation (RAG) and computer vision into a single Java‑based agent.

It eliminates brittle, selector‑based scripts by learning UI elements through semantic embeddings and screenshots, then reproduces human‑like interactions in both attended (training) and unattended (CI/CD) modes.

🔑 Key Features

1. AI Model Integration

Leverages LangChain4j to interface with instruction and vision AI models, splitting responsibilities so that one model decides on test actions and another handles visual verification.
Supports Google AI Studio/Vertex AI and Azure OpenAI, configurable via config.properties or AgentConfig.java (model names, API keys, endpoints, temperature, max tokens, retries).
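
As a rough illustration of property-based configuration, the sketch below reads a few model settings with java.util.Properties. The class name ModelSettings, the key names (instruction.model.name, model.temperature, model.max.retries) and the default values are all hypothetical; the project's real keys in config.properties and AgentConfig.java may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for model settings; not the project's actual AgentConfig.
class ModelSettings {
    final String modelName;
    final double temperature;
    final int maxRetries;

    ModelSettings(String modelName, double temperature, int maxRetries) {
        this.modelName = modelName;
        this.temperature = temperature;
        this.maxRetries = maxRetries;
    }

    // Parses properties-format text, falling back to illustrative defaults
    // for any key that is missing. Key names are assumptions.
    static ModelSettings fromProperties(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String name = props.getProperty("instruction.model.name", "unset");
        double temperature = Double.parseDouble(props.getProperty("model.temperature", "0.0"));
        int retries = Integer.parseInt(props.getProperty("model.max.retries", "3"));
        return new ModelSettings(name, temperature, retries);
    }
}
```

In practice the text would come from a file reader rather than a string; the string form simply keeps the sketch self-contained.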

2. Retrieval‑Augmented Generation (RAG)

Stores UI element metadata (name, description, surrounding “anchor” elements, screenshot) in a Chroma vector database.
Retrieves top‑N semantically similar elements based on test step descriptions, filtering by minimum similarity thresholds before conducting visual matching.
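
The retrieval step can be pictured with a small in-memory sketch: score each stored element against the query embedding, drop anything below a minimum similarity, and keep the best N. This is a simplified stand-in and does not use the Chroma API; the class ElementRetriever, the StoredElement type, and the use of cosine similarity are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory stand-in for a vector-store query; names here are hypothetical.
class ElementRetriever {
    // A stored UI element with its (assumed precomputed) embedding.
    static final class StoredElement {
        final String name;
        final double[] embedding;

        StoredElement(String name, double[] embedding) {
            this.name = name;
            this.embedding = embedding;
        }
    }

    // Cosine similarity of two equal-length vectors; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the names of up to topN elements scoring at least minScore, best first.
    static List<String> retrieve(double[] query, List<StoredElement> store,
                                 int topN, double minScore) {
        List<StoredElement> hits = new ArrayList<>();
        for (StoredElement e : store) {
            if (cosine(query, e.embedding) >= minScore) hits.add(e);
        }
        hits.sort(Comparator.comparingDouble((StoredElement e) -> -cosine(query, e.embedding)));
        List<String> names = new ArrayList<>();
        for (StoredElement e : hits.subList(0, Math.min(topN, hits.size()))) {
            names.add(e.name);
        }
        return names;
    }
}
```

Filtering before ranking means a weak match is never returned just to fill the top-N quota, so the later visual-matching stage only sees plausible candidates.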

3. Computer Vision Element Location

Uses OpenCV template‑matching to find candidate UI elements on screen from stored screenshots.
Employs a vision AI model to disambiguate multiple matches or confirm a single match against contextual descriptions and anchors.
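
To show the idea behind template matching without pulling in OpenCV, here is a naive pure-Java version that slides a small grayscale template over a larger image and picks the position with the lowest sum of squared differences. It is a teaching sketch, not the agent's code: the real agent uses OpenCV, and the class and method names below are hypothetical.

```java
// Naive grayscale template matching by sum of squared differences (SSD).
// A stand-in for OpenCV's template matching, written for illustration only.
class TemplateMatchSketch {
    // Returns {row, col} of the top-left corner where the template fits best,
    // or {-1, -1} if the template is larger than the screen.
    static int[] bestMatch(int[][] screen, int[][] template) {
        int th = template.length, tw = template[0].length;
        long best = Long.MAX_VALUE;
        int[] pos = {-1, -1};
        for (int r = 0; r + th <= screen.length; r++) {
            for (int c = 0; c + tw <= screen[0].length; c++) {
                long ssd = 0;
                for (int i = 0; i < th; i++) {
                    for (int j = 0; j < tw; j++) {
                        long d = screen[r + i][c + j] - template[i][j];
                        ssd += d * d;
                    }
                }
                // Strict comparison keeps the first position on ties.
                if (ssd < best) {
                    best = ssd;
                    pos = new int[]{r, c};
                }
            }
        }
        return pos;
    }
}
```

When several screen regions score almost equally well, a single best position is not trustworthy on its own, which is where the vision model's disambiguation step described above comes in.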

4. GUI Interaction Tools

Provides Java Robot‑based MouseTools (click, hover, drag) and KeyboardTools (typing, key presses) for human‑like interactions.
Includes CommonTools for waits and retries, enabling robust handling of dynamic UI latencies.
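
A minimal sketch of what such tools might look like, assuming a click built on java.awt.Robot and a generic retry helper. The class InteractionSketch and both method names are hypothetical and are not taken from the project's MouseTools or CommonTools.

```java
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.util.function.BooleanSupplier;

// Hypothetical helpers in the spirit of MouseTools and CommonTools.
class InteractionSketch {
    // Moves the pointer to (x, y) and performs a left click.
    // Requires a display; the caller supplies the Robot instance.
    static void click(Robot robot, int x, int y) {
        robot.mouseMove(x, y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    // Calls the check up to maxAttempts times, sleeping delayMillis between
    // failed attempts. Returns true as soon as the check passes.
    static boolean retryUntilTrue(BooleanSupplier check, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) return true;
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller could wrap an "is the element visible yet?" check in retryUntilTrue before clicking, so a slow-loading dialog does not fail the step on the first attempt.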

5. Two Execution Modes

Attended (Trainee) Mode: Agent prompts a human “mentor” when it encounters unknown or changed UI elements, building its RAG knowledge base interactively.
Unattended Mode: Fully autonomous execution that relies solely on the learned RAG database and AI models and fails fast on errors, which makes it a good fit for CI/CD pipelines.
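
The difference between the two modes comes down to what happens when an element is not in the knowledge base. The sketch below models that decision with a plain map standing in for the RAG database and a function standing in for the human mentor; all names (ModeSketch, Mode, resolve) are hypothetical and the real agent's control flow may differ.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the attended/unattended decision on an unknown element.
class ModeSketch {
    enum Mode { ATTENDED, UNATTENDED }

    // Looks up an element by description. On a miss, attended mode asks the
    // mentor and stores the answer; unattended mode fails fast.
    static String resolve(String description, Map<String, String> knowledgeBase,
                          Mode mode, Function<String, String> mentor) {
        String known = knowledgeBase.get(description);
        if (known != null) return known;
        if (mode == Mode.UNATTENDED) {
            throw new IllegalStateException("Unknown UI element: " + description);
        }
        String learned = mentor.apply(description);
        knowledgeBase.put(description, learned);
        return learned;
    }
}
```

Anything learned during an attended run is available to later unattended runs, which is the point of training before moving a test case into a pipeline.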

6. Flexible Execution Interfaces

CLI Mode: Run locally by invoking the Agent class with a JSON test‑case file path.
Server Mode: Launches a Javalin web server (Server class) exposing a /testcase HTTP POST endpoint for distributed “swarm” execution.
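
From the client side, submitting a test case to a running server could look roughly like the following, using the JDK's built-in java.net.http API. The base URL, the Content-Type header, and the shape of the JSON body are assumptions; only the /testcase path and the POST method come from the description above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper; the server's expected JSON schema is not shown here.
class TestCaseRequestSketch {
    // Builds (but does not send) a POST request to the /testcase endpoint.
    static HttpRequest build(String baseUrl, String testCaseJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/testcase"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(testCaseJson))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and a dispatcher could fan the same call out to several agent instances for swarm-style runs.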

By uniting natural‑language instructions, RAG‑powered element identification and AI‑backed visual verification, this agent turns GUI testing from brittle script maintenance into a self‑learning, resilient process that fits modern DevOps and CI/CD workflows. For full details, examples and setup instructions, explore the GitHub repo by visiting the main URL.

Combining Generative AI, RAG, and computer vision for UI test automation.
