◈ FULL USER GUIDE ◈
🤖

Gemini Live Agent

Everything you need to know to use the app, from zero to analysis in under a minute.

✅ Works Instantly · 🤖 Gemini 1.5 Flash · 📱 Mobile + Desktop · ⚡ No Backend
QUICK START
Start in 3 Ways
No setup required: the app is ready the moment it opens
▶️
OPTION A
Instant Demo
Click ▶ Run Demo and see the agent analyze a ready-made UI. No image needed, no API Key required.
Zero Setup
🖥️
OPTION B
Capture Your Screen
Click the Screen Capture area → choose your window → write your task → press Run.
Desktop Only
📱
OPTION C
Upload an Image
Take a screenshot of any app or website → upload it → write your task → press Run.
Mobile + Desktop
STEP BY STEP
Detailed Usage Steps
Follow these steps one by one
🚀
STEP 01: LAUNCH
Open the App
Open the app. The API Key loads automatically and appears pre-filled in the gold input field.
The status indicator then shows: ✅ API Key Ready - Let's Analyze!
The app also detects whether you're on mobile or desktop and adjusts the interface accordingly.
ℹ️ Want to use your own key? Clear the gold field and paste your key from aistudio.google.com
📸
STEP 02: CAPTURE
Prepare the Image to Analyze
You have 3 options:

1. Demo: Click ▶ Run Demo to load a ready-made UI sample

2. Screen Capture (Desktop): Click the dashed preview area → choose a window → wait a moment → the preview appears

3. Upload (Mobile or Desktop): Take a screenshot → click "Upload Image" → select your file
💡 After selecting an image, you'll see ✅ in the Capture area
✏️
STEP 03: TASK
Write What You Want
In the Task field, type in English or Arabic what you want Gemini to do:

• Analyze this page and list all elements
• How many buttons are there and what does each do?
• Extract all visible text in order
• Evaluate the user experience and suggest improvements
• What type of page is this and what is its purpose?
💡 Click Preset for random ready-made task examples
⚡
STEP 04: RUN
Press Run and Watch the Magic
Click ▶ Run and watch the Pipeline light up step by step:

📸 Capture → 👁️ Gemini → 🔍 Detect → 🧠 Engine → 📦 JSON → ✅ Done

Within seconds, your result appears with: summary, page type, element counts, extracted text, recommended actions, UX Score, and accessibility notes.
💡 Response time is tracked live in the MS counter
📤
STEP 05: EXPORT
Do More with Your Result
After analysis, you have 4 export options in the result panel:

🔊 Listen: Hear the full result read aloud in Arabic or English
📋 Copy: Copy the full text to your clipboard
📄 PDF: Save a full report with stats and JSON (press Ctrl+P → Save as PDF)
📦 JSON: Download a structured .json file with all the data
💡 The JSON file can be used in other apps, databases, or APIs
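As a rough illustration of consuming the exported file in another tool, the TypeScript sketch below parses the JSON text and checks a few fields. The field names (`summary`, `pageType`, `uxScore`, `elements`) are assumptions chosen to mirror the result items listed in Step 04. The app's actual export schema may differ, so inspect a real export before relying on them.

```typescript
// Hypothetical shape of an exported analysis. The key names are assumptions,
// not the app's documented schema.
interface AnalysisExport {
  summary: string;
  pageType: string;
  uxScore: number;
  elements: Record<string, number>; // e.g. { buttons: 2, inputs: 3 }
}

// Parse the exported JSON text and verify that the assumed fields are present.
// Individual element counts are trusted to be numbers and are not re-checked.
function parseAnalysis(jsonText: string): AnalysisExport {
  const data: unknown = JSON.parse(jsonText);
  if (typeof data !== "object" || data === null) {
    throw new Error("Export is not a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const summary = obj["summary"];
  const pageType = obj["pageType"];
  const uxScore = obj["uxScore"];
  const elements = obj["elements"];
  if (typeof summary !== "string") throw new Error("Missing summary");
  if (typeof pageType !== "string") throw new Error("Missing pageType");
  if (typeof uxScore !== "number") throw new Error("Missing uxScore");
  if (typeof elements !== "object" || elements === null) {
    throw new Error("Missing elements");
  }
  return { summary, pageType, uxScore, elements: elements as Record<string, number> };
}

// Sum the element counts across all categories.
function totalElements(result: AnalysisExport): number {
  let total = 0;
  for (const key of Object.keys(result.elements)) {
    total += result.elements[key] ?? 0;
  }
  return total;
}
```

Validating on load means a schema mismatch fails with a clear message instead of producing silent `undefined` values further down a pipeline.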
INTERFACE MAP
What's on the Screen
Every section of the UI and what it does
🔑 API Config
Gold input field for API Key
Save & Test buttons
Green/red status indicator
👁 Show/hide key toggle
🎛️ Features Bar
👁️ Vision: always active
🔊 Voice: enable audio output
🎤 Mic: voice commands input
🔄 Compare: compare two pages
📄 PDF: export report
📦 JSON: export raw data
⚡ Pipeline
7 nodes that animate during analysis
Green = currently active
Cyan = completed
Red = error occurred
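The color rules above can be modeled as a small state function. This is only a sketch of how such a pipeline display might work, not the app's actual code. It uses the six stage names shown in Step 04, and the type and function names are hypothetical.

```typescript
// Stage names as listed in Step 04 of this guide.
const STAGES = ["Capture", "Gemini", "Detect", "Engine", "JSON", "Done"];

type NodeColor = "idle" | "green" | "cyan" | "red";

// Return the color of every node, given the index of the stage now running.
// When failed is true, the running stage is shown in red instead of green.
function pipelineColors(activeIndex: number, failed: boolean): NodeColor[] {
  return STAGES.map((_stage, i): NodeColor => {
    if (i < activeIndex) return "cyan"; // already completed
    if (i === activeIndex) return failed ? "red" : "green";
    return "idle"; // not reached yet
  });
}
```

For example, `pipelineColors(2, false)` marks Capture and Gemini as cyan, Detect as green, and the remaining stages as idle.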
📸 Screen Capture
Image preview area
Animated scan line on capture
Upload area shown on mobile
🤖 Robot Center
Status text below robot
Voice wave animation
4 counters: Tasks / Scans / ACC / MS
🎯 Control Panel
Task input textarea
Buttons: Voice, Mic, PDF, JSON, Compare, Clear, Stop, Run
Progress bar during analysis
Operation log tracking every step
FEATURES
Features Reference Table
Every feature, how to use it, and where it works
| Feature | How to Activate | What It Does | Desktop | Mobile |
|---|---|---|---|---|
| 👁️ Vision Analysis | Press ▶ Run | Analyzes the image with Gemini: elements, text, page type, UX score | ✅ | ✅ |
| ▶️ Demo Mode | Press ▶ Run Demo | Loads a pre-built UI image and analyzes it; works without API Key | ✅ | ✅ |
| 📸 Screen Capture | Click the preview area | Captures any window or tab directly from your desktop | ✅ | Not supported |
| 📁 Image Upload | Click "Upload Image" | Upload any screenshot or image for analysis | ✅ | ✅ |
| 🔊 Voice Output | Press 🔊 button | Reads the analysis result aloud in Arabic or English | ✅ | ✅ |
| 🎤 Voice Commands | Press 🎤 then Start | Speak your task instead of typing; automatic speech-to-text | ✅ | ✅ |
| 🔄 Compare Mode | Press 🔄 button | Compare two pages with Gemini: scores, differences, recommendation | ✅ | ✅ |
| 📄 PDF Export | Press 📄 button | Full report with stats and JSON; press Ctrl+P to save as PDF | ✅ | ✅ |
| 📦 JSON Export | Press 📦 button | Downloads a structured .json file with all analysis data | ✅ | ✅ |
| 📋 Copy Result | Press "Copy" in result panel | Copies the full analysis text to clipboard | ✅ | ✅ |
PRO TIPS
Tips for the Best Results
🎯
Be Specific in Your Task
Instead of "analyze the page", try "list all buttons and describe what each one does" for much better results.
🖼️
Clear Images = Better Results
Use full-resolution screenshots. Blurry or small images significantly reduce Gemini's detection accuracy.
🔄
Use Compare for A/B Testing
Capture two versions of a UI (old vs. new, or your app vs. a competitor) and get a scored comparison.
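If you export JSON for each version, a simple scored comparison can also be reproduced offline. The sketch below assumes each export carries a numeric `uxScore` field, which is a hypothetical name. The app's own Compare mode runs through Gemini and may weigh other factors.

```typescript
// Minimal view of an exported result. uxScore is an assumed field name.
interface ScoredResult {
  label: string;
  uxScore: number;
}

interface Comparison {
  winner: string | null; // null when the scores are equal
  difference: number; // absolute score gap
}

// Compare two results by UX score and report which one is ahead.
function compareResults(a: ScoredResult, b: ScoredResult): Comparison {
  const difference = Math.abs(a.uxScore - b.uxScore);
  if (difference === 0) return { winner: null, difference };
  return { winner: a.uxScore > b.uxScore ? a.label : b.label, difference };
}
```

Calling `compareResults({ label: "old", uxScore: 70 }, { label: "new", uxScore: 85 })` reports "new" as the winner with a difference of 15.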
📦
Save the JSON Output
The structured JSON can be fed into other tools, databases, or APIs for further processing and automation.
🎤
Voice Input is Faster
Enable the mic and just speak your task. It converts to text automatically, which is great for quick analysis on the go.
⚡
Use Preset to Save Time
The Preset button generates random ready-made tasks, useful when you're not sure what to ask.
FAQ
Frequently Asked Questions
The app isn't working and I'm seeing an error
Make sure you have an image loaded (captured or uploaded) and that the API Key is valid. If a red toast appears, read it carefully: it tells you exactly what went wrong.
Screen Capture isn't working on my device
Screen Capture requires a modern browser (Chrome or Edge) and an HTTPS connection. It is not supported on mobile devices, so use the Upload option instead.
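A page can check these requirements before offering the capture option. The sketch below is an assumption about how such a check might look, not the app's code. It takes a plain object so it can run outside a browser; in a page you would pass `window.isSecureContext` and `navigator.mediaDevices`. `getDisplayMedia` is the standard Screen Capture API method, while the `CaptureEnv` type and the function name are hypothetical.

```typescript
// The parts of the browser environment that the check needs.
interface CaptureEnv {
  isSecureContext: boolean; // true on HTTPS pages
  mediaDevices?: { getDisplayMedia?: unknown }; // navigator.mediaDevices
}

type CaptureMode = "screen" | "upload";

// Choose screen capture when the browser supports it, otherwise fall back
// to image upload (the path this guide recommends on mobile).
function pickCaptureMode(env: CaptureEnv): CaptureMode {
  const supported =
    env.isSecureContext &&
    env.mediaDevices !== undefined &&
    typeof env.mediaDevices.getDisplayMedia === "function";
  return supported ? "screen" : "upload";
}
```

Feature detection of this kind avoids guessing from the browser name: a browser that lacks `getDisplayMedia` simply gets the upload path.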
The microphone isn't working
Make sure Chrome has microphone permission: go to Settings → Privacy → Microphone. Voice input only works in Chrome-based browsers over HTTPS.
How do I change the voice language?
The app automatically selects the best available voice on your device. If you want English output, write your task in English and Gemini will respond in English.
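One plausible way to match the voice to the response is to look for Arabic script in the text and choose a language tag accordingly. This is a guess at the approach, not the app's confirmed logic. The function name and the specific tags `ar-SA` and `en-US` are assumptions. In a browser, the returned tag could be assigned to the `lang` property of a `SpeechSynthesisUtterance`.

```typescript
// Matches any character in the basic Arabic Unicode block.
const ARABIC_CHAR = /[\u0600-\u06FF]/;

// Return a BCP 47 language tag for text-to-speech: Arabic if the text
// contains any Arabic-script character, English otherwise.
function speechLangFor(text: string): string {
  return ARABIC_CHAR.test(text) ? "ar-SA" : "en-US";
}
```

A single Arabic character is enough to select Arabic here, so mixed-language text would be read with the Arabic voice. A stricter version could compare character counts instead.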
Analysis is slow. Why?
Speed depends on your internet connection and Gemini API server load. Analysis typically completes in 3 to 8 seconds. If it's very slow, try reducing the image size.
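To see why image size matters, note that images are typically sent to an API as base64 text, which is about one third larger than the original file. The sketch below works through that arithmetic and estimates a uniform scale factor for reaching a target size. It assumes file size grows roughly in proportion to pixel count, which is only an approximation for compressed formats. The function names are illustrative.

```typescript
// Length of the base64 encoding of a file with the given byte size:
// 4 output characters for every 3 input bytes, rounded up to a full group.
function base64Length(byteSize: number): number {
  return 4 * Math.ceil(byteSize / 3);
}

// Scale factor (between 0 and 1) to apply to both width and height so the
// file shrinks to about targetBytes, assuming size is proportional to
// pixel count. Returns 1 when the image is already small enough.
function scaleToFit(currentBytes: number, targetBytes: number): number {
  if (currentBytes <= targetBytes) return 1;
  return Math.sqrt(targetBytes / currentBytes);
}
```

For example, a 3,000,000-byte screenshot becomes 4,000,000 base64 characters, and shrinking a 4,000,000-byte image to about 1,000,000 bytes means halving both width and height.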
Are my screenshots stored or shared anywhere?
Images are sent directly to the Google Gemini API for analysis only. There is no backend server and nothing is stored. The app runs entirely in your browser.
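For a sense of what leaves the browser, the sketch below builds a request body in the shape accepted by the Gemini REST `generateContent` method: one text part for the task and one `inline_data` part carrying the base64 image. Whether the app builds its request exactly this way is an assumption, and the function names are illustrative.

```typescript
interface GeminiPart {
  text?: string;
  inline_data?: { mime_type: string; data: string };
}

interface GeminiRequest {
  contents: { parts: GeminiPart[] }[];
}

// Build a generateContent body from a task and a base64-encoded image.
function buildRequest(task: string, imageBase64: string, mimeType: string): GeminiRequest {
  return {
    contents: [
      {
        parts: [
          { text: task },
          { inline_data: { mime_type: mimeType, data: imageBase64 } },
        ],
      },
    ],
  };
}

// Endpoint for the model named in this guide. The API key travels as a
// query parameter, so it is URL-encoded here.
function endpointFor(apiKey: string): string {
  return (
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    "gemini-1.5-flash:generateContent?key=" +
    encodeURIComponent(apiKey)
  );
}
```

The body would be sent as JSON in a POST request to the endpoint. Only the task text and the image appear in it.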
Can I use my own Gemini API Key?
Yes! Clear the gold API Key field, paste your key from aistudio.google.com, and press Save. Your key will be stored locally and used for all future analyses.

Ready to Try It? 🚀

Everything is set up. Open the app and press Demo to see it in action instantly.

⚡ Powered by Gemini 1.5 Flash · 🌐 Arabic + English · 💻 Works in any modern browser