◈ FULL USER GUIDE ◈

🤖

Gemini Live Agent

Everything you need to know to use the app — from zero to analysis in under a minute.

✅ Works Instantly · 🤖 Gemini 1.5 Flash · 📱 Mobile + Desktop · ⚡ No Backend

QUICK START

3 Ways to Start

No setup required — the app is ready the moment it opens

▶️

OPTION A

Instant Demo

Click ▶ Run Demo and see the agent analyze a ready-made UI — no image needed, no API Key required.

Zero Setup

🖥️

OPTION B

Capture Your Screen

Click the Screen Capture area → choose your window → write your task → press Run.

Desktop Only

📱

OPTION C

Upload an Image

Take a screenshot of any app or website → upload it → write your task → press Run.

Mobile + Desktop

STEP BY STEP

Detailed Usage Steps

Follow these steps one by one

🚀

STEP 01 — LAUNCH

Open the App

Open the app — the API Key loads automatically and appears pre-filled in the gold input field.
You'll see the status show ✅ API Key Ready — Let's Analyze!
The app also auto-detects whether you're on mobile or desktop and adjusts accordingly.

ℹ️ Want to use your own key? Clear the gold field, paste your key from aistudio.google.com, and press Save.

📸

STEP 02 — CAPTURE

Prepare the Image to Analyze

You have 3 options:

1. Demo — Click ▶ Run Demo to load a ready-made UI sample

2. Screen Capture (Desktop) — Click the dashed preview area → choose a window → wait a moment → preview appears

3. Upload (Mobile or Desktop) — Take a screenshot → click "Upload Image" → select your file

💡 After selecting an image, you'll see ✅ in the Capture area

✍️

STEP 03 — TASK

Write What You Want

In the Task field, type what you want Gemini to do, in English or Arabic:

• Analyze this page and list all elements
• How many buttons are there and what does each do?
• Extract all visible text in order
• Evaluate the user experience and suggest improvements
• What type of page is this and what is its purpose?

💡 Click Preset for random ready-made task examples

⚡

STEP 04 — RUN

Press Run and Watch the Magic

Click ▶ Run — watch the Pipeline light up step by step:

📸 Capture → 👁️ Gemini → 🔍 Detect → 🧠 Engine → 📦 JSON → ✅ Done

Within seconds, your result appears with: summary, page type, element counts, extracted text, recommended actions, UX Score, and accessibility notes.

💡 Response time is tracked live in the MS counter
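
The result fields listed in this step suggest a JSON shape along the following lines. This is only a sketch: the field names (`summary`, `pageType`, `elementCounts`, `extractedText`, `recommendedActions`, `uxScore`, `accessibilityNotes`) and the 0 to 100 score range are assumptions made for illustration, not the app's confirmed schema. The type guard shows one way to check parsed JSON before relying on it.

```typescript
// Hypothetical shape of one analysis result. Field names are assumptions
// based on the fields listed in this guide, not the app's actual schema.
interface AnalysisResult {
  summary: string;
  pageType: string;
  elementCounts: Record<string, number>; // e.g. { buttons: 3, inputs: 2 }
  extractedText: string[];
  recommendedActions: string[];
  uxScore: number; // assumed range: 0 to 100
  accessibilityNotes: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// True only when every value of a plain object is a number.
function isNumberMap(value: unknown): value is Record<string, number> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const map = value as Record<string, unknown>;
  return Object.keys(map).every((key) => typeof map[key] === "number");
}

// Runtime check for parsed JSON, so malformed data is rejected up front.
function isAnalysisResult(value: unknown): value is AnalysisResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["summary"] === "string" &&
    typeof v["pageType"] === "string" &&
    isNumberMap(v["elementCounts"]) &&
    isStringArray(v["extractedText"]) &&
    isStringArray(v["recommendedActions"]) &&
    typeof v["uxScore"] === "number" &&
    isStringArray(v["accessibilityNotes"])
  );
}

// Made-up sample data, used only to exercise the guard.
const sampleResult: unknown = JSON.parse(
  '{"summary":"Login form","pageType":"login","elementCounts":{"buttons":2},' +
    '"extractedText":["Sign in"],"recommendedActions":["Add labels"],' +
    '"uxScore":72,"accessibilityNotes":["Low contrast"]}'
);
```

Calling `isAnalysisResult(sampleResult)` narrows the value to `AnalysisResult`, so later code can read the fields with full type checking.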

📤

STEP 05 — EXPORT

Do More with Your Result

After analysis, you have 4 export options in the result panel:

🔊 Listen — Hear the full result read aloud in Arabic or English
📋 Copy — Copy the full text to your clipboard
📄 PDF — Save a full report with stats and JSON: press Ctrl+P (Cmd+P on macOS) → Save as PDF
📦 JSON — Download a structured .json file with all the data

💡 The JSON file can be used in other apps, databases, or APIs

INTERFACE MAP

What's on the Screen

Every section of the UI and what it does

🔑 API Config

Gold input field for API Key

Save & Test buttons

Green/red status indicator

👁 Show/hide key toggle

🎛️ Features Bar

👁️ Vision — always active

🔊 Voice — enable audio output

🎤 Mic — voice commands input

🔄 Compare — compare two pages

📄 PDF — export report

📦 JSON — export raw data

⚡ Pipeline

7 nodes that animate during analysis

Green = currently active

Cyan = completed

Red = error occurred
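
The color rules above can be sketched as a small pure function. This is an illustration under assumptions, not the app's code: the `idle` state for nodes not yet reached, and the choice to color the failing node red, are guesses. The node labels are the ones listed in Step 04.

```typescript
type NodeColor = "idle" | "green" | "cyan" | "red";

// Node labels as listed in Step 04 of this guide.
const PIPELINE_NODES = ["Capture", "Gemini", "Detect", "Engine", "JSON", "Done"];

// Returns one color per node: completed nodes are cyan, the active node is
// green (or red if it failed), and nodes not yet reached stay idle.
function pipelineColors(
  nodeCount: number,
  activeIndex: number,
  failed: boolean
): NodeColor[] {
  const colors: NodeColor[] = [];
  for (let i = 0; i < nodeCount; i++) {
    if (i < activeIndex) colors.push("cyan");
    else if (i === activeIndex) colors.push(failed ? "red" : "green");
    else colors.push("idle");
  }
  return colors;
}

const midRun = pipelineColors(PIPELINE_NODES.length, 2, false);
// midRun is ["cyan", "cyan", "green", "idle", "idle", "idle"]
```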

📸 Screen Capture

Image preview area

Animated scan line on capture

Upload area shown on mobile

🤖 Robot Center

Status text below robot

Voice wave animation

4 counters: Tasks / Scans / ACC / MS

🎯 Control Panel

Task input textarea

Buttons: Voice, Mic, PDF, JSON, Compare, Clear, Stop, Run

Progress bar during analysis

Operation log tracking every step

FEATURES

Features Reference Table

Every feature, how to use it, and where it works

Feature	How to Activate	What It Does	Desktop	Mobile
👁️ Vision Analysis	Press `▶ Run`	Analyzes the image with Gemini — elements, text, page type, UX score	✅	✅
▶️ Demo Mode	Press `▶ Run Demo`	Loads a pre-built UI image and analyzes it — works without API Key	✅	✅
📸 Screen Capture	Click the preview area	Captures any window or tab directly from your desktop	✅	—
📁 Image Upload	Click "Upload Image"	Upload any screenshot or image for analysis	✅	✅
🔊 Voice Output	Press 🔊 button	Reads the analysis result aloud in Arabic or English	✅	✅
🎤 Voice Commands	Press 🎤 then Start	Speak your task instead of typing — auto speech-to-text	✅	✅
🔄 Compare Mode	Press 🔄 button	Compare two pages with Gemini — scores, differences, recommendation	✅	✅
📄 PDF Export	Press 📄 button	Full report with stats and JSON — press Ctrl+P to save as PDF	✅	✅
📦 JSON Export	Press 📦 button	Downloads a structured .json file with all analysis data	✅	✅
📋 Copy Result	Press "Copy" in result panel	Copies the full analysis text to clipboard	✅	✅

PRO TIPS

Tips for the Best Results

🎯

Be Specific in Your Task

Instead of "analyze the page", try "list all buttons and describe what each one does". A narrow task produces a more focused answer.

🖼️

Clear Images = Better Results

Use full-resolution screenshots. Blurry or small images reduce Gemini's detection accuracy significantly.

🔄

Use Compare for A/B Testing

Capture two versions of a UI — old vs new, or your app vs a competitor — and get a scored comparison.

📦

Save the JSON Output

The structured JSON can be fed into other tools, databases, or APIs for further processing and automation.
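
As one example of further processing, the sketch below flattens part of an exported file into CSV rows that a spreadsheet or database import could read. The `pageType` and `elementCounts` field names are hypothetical; check them against an actual exported file before relying on them.

```typescript
// Hypothetical subset of the exported JSON; the real field names may differ.
interface ExportedAnalysis {
  pageType: string;
  elementCounts: Record<string, number>;
}

// Quote a CSV field when it contains a comma, double quote, or newline.
function csvField(value: string): string {
  return /[",\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// Produces a header line plus one "pageType,element,count" line per
// element type, sorted by element name for stable output.
function elementCountsToCsv(analysis: ExportedAnalysis): string {
  const lines = ["pageType,element,count"];
  const elementNames = Object.keys(analysis.elementCounts).sort();
  for (const elementName of elementNames) {
    const count = analysis.elementCounts[elementName];
    lines.push(
      [csvField(analysis.pageType), csvField(elementName), String(count)].join(",")
    );
  }
  return lines.join("\n");
}

const csv = elementCountsToCsv({
  pageType: "login",
  elementCounts: { inputs: 2, buttons: 3 },
});
// csv is:
// pageType,element,count
// login,buttons,3
// login,inputs,2
```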

🎤

Voice Input is Faster

Enable the mic and just speak your task — it converts to text automatically. Great for quick analysis on the go.

⚡

Use Preset to Save Time

The Preset button generates random ready-made tasks — useful when you're not sure what to ask.

FAQ

Frequently Asked Questions

The app isn't working — I'm seeing an error

Make sure you have an image loaded (captured or uploaded), and that the API Key is valid. If a red toast appears, read it carefully — it tells you exactly what went wrong.

Screen Capture isn't working on my device

Screen Capture requires a modern browser (Chrome or Edge) and an HTTPS connection. It is not supported on mobile devices — use the Upload option instead.
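
A page can check these two requirements before offering Screen Capture. The sketch below is an assumption about how such a check might look, not the app's code. It relies on two real browser features, `window.isSecureContext` and `navigator.mediaDevices.getDisplayMedia`, but takes them as a parameter so the function can also run outside a browser. The `CaptureEnv` and `CaptureSupport` names are made up for this example.

```typescript
// Minimal view of the browser globals this check needs. Passing them in
// keeps the function testable outside a browser.
interface CaptureEnv {
  isSecureContext: boolean;
  mediaDevices?: { getDisplayMedia?: unknown };
}

type CaptureSupport = "ok" | "needs-https" | "unsupported";

function checkScreenCapture(env: CaptureEnv): CaptureSupport {
  // Screen capture is only exposed on secure pages (HTTPS or localhost).
  if (!env.isSecureContext) return "needs-https";
  const capture = env.mediaDevices ? env.mediaDevices.getDisplayMedia : undefined;
  return typeof capture === "function" ? "ok" : "unsupported";
}

// In a browser, the call could look like this, with Upload as the fallback
// whenever the answer is not "ok":
// checkScreenCapture({
//   isSecureContext: window.isSecureContext,
//   mediaDevices: navigator.mediaDevices,
// });
```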

The microphone isn't working

Make sure Chrome has microphone permission — go to Settings → Privacy → Microphone. Voice input only works in Chrome-based browsers over HTTPS.

How do I change the voice language?

The app automatically selects the best available voice on your device. If you want English output, write your task in English and Gemini will respond in English.
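
How the app chooses a voice is not documented here, so the following is only a plausible sketch. It assumes the language is guessed from the text itself (any character in the Arabic Unicode block means Arabic) and that the first installed voice with a matching language tag is used. `VoiceInfo` is a hypothetical stand-in for the browser's `SpeechSynthesisVoice`, which also exposes `name` and `lang`.

```typescript
// Minimal stand-in for the browser's SpeechSynthesisVoice (name + language tag).
interface VoiceInfo {
  name: string;
  lang: string; // e.g. "en-US" or "ar-SA"
}

// Treat text as Arabic if it contains any character from the Arabic block.
function detectLanguage(text: string): "ar" | "en" {
  return /[\u0600-\u06FF]/.test(text) ? "ar" : "en";
}

// Pick the first voice whose language tag matches the text; return undefined
// when none matches so the caller can fall back to the browser default.
function pickVoice(voices: VoiceInfo[], text: string): VoiceInfo | undefined {
  const wanted = detectLanguage(text);
  for (const voice of voices) {
    // Some platforms report tags such as "en_US" instead of "en-US".
    const tag = voice.lang.toLowerCase().replace("_", "-");
    if (tag === wanted || tag.indexOf(wanted + "-") === 0) return voice;
  }
  return undefined;
}
```

In a browser, the list passed as `voices` would typically come from `speechSynthesis.getVoices()`.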

Analysis is slow — why?

Speed depends on your internet connection and Gemini API server load. Typically it completes in 3–8 seconds. If it's very slow, try reducing the image size.
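
Reducing the image size usually means scaling it down while keeping the aspect ratio. The helper below is a minimal sketch of that calculation. The 1920 pixel limit in the example is an arbitrary assumption, and the image should stay large enough for on-screen text to remain readable.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale a size down so its longest side is at most maxSide, keeping the
// aspect ratio. Sizes that already fit are returned unchanged.
function fitWithin(size: Size, maxSide: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) return { width: size.width, height: size.height };
  const scale = maxSide / longest;
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}

const resized = fitWithin({ width: 3840, height: 2160 }, 1920);
// resized is { width: 1920, height: 1080 }
```

The resulting dimensions can then be used when redrawing the screenshot, for example on a canvas, before it is uploaded.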

Are my screenshots stored or shared anywhere?

Images are sent directly from your browser to the Google Gemini API, for analysis only. The app has no backend server of its own, so it stores nothing itself; it runs entirely in your browser.
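
For readers curious what a direct browser-to-Gemini call involves, the sketch below builds a request for the public Gemini REST `generateContent` endpoint. Whether the app constructs its requests this way is an assumption; the function name and parameters are illustrative. The function only builds the request and does not send it.

```typescript
interface GeminiRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a generateContent request carrying one task
// and one base64-encoded image.
function buildGeminiRequest(
  apiKey: string,
  task: string,
  imageBase64: string,
  mimeType: string,
  model: string
): GeminiRequest {
  return {
    url:
      "https://generativelanguage.googleapis.com/v1beta/models/" +
      encodeURIComponent(model) +
      ":generateContent",
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: task },
            { inline_data: { mime_type: mimeType, data: imageBase64 } },
          ],
        },
      ],
    }),
  };
}

// Sending it from a browser could then look like:
// fetch(request.url, { method: "POST", headers: request.headers, body: request.body });
```

Because the request goes straight from the page to Google, the API key is present in the browser, which fits the guide's note that a saved key is stored locally.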

Can I use my own Gemini API Key?

Yes! Clear the gold API Key field, paste your key from aistudio.google.com, and press Save. Your key is stored locally and used for all future analyses.

Ready to Try It? 🚀

Everything is set up. Open the app and press ▶ Run Demo to see it in action instantly.

▶ Launch App Now 🔑 Get Free API Key

⚡ Powered by Gemini 1.5 Flash · 🌍 Arabic + English · 💻 Works in any modern browser