chromium-browse — Lab Guides

What it is

chromium-browse is a Python script that controls any Chromium-based browser directly through the Chrome DevTools Protocol (CDP). It connects over WebSocket, sends raw CDP commands, and handles the full browsing lifecycle: launching the browser, navigating to URLs, discovering and following links, scrolling, dwelling, and cleaning up. No Puppeteer, no Playwright, no Selenium. It has two Python dependencies (websockets and aiohttp); everything else is stdlib.

It works with Chrome, Chromium, Edge, Brave, Island, or any other Chromium fork. You just point it at the browser binary.

Install

chromium-browse is distributed through the mac-security Homebrew tap.

# Add the tap (if you haven't already)
brew tap davidwhittington/mac-security

# Install
brew install chromium-browse

Setup

1. Create a URL list

Create a file called urls.txt (or any name you prefer) with one URL per line:

https://example.com
https://news.ycombinator.com
https://developer.mozilla.org
https://github.com/explore

Blank lines and lines starting with # are ignored.
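
The parsing rule above is small enough to sketch. This is an illustration only, not the script's actual code: the function name load_urls is hypothetical, and treating indented # lines as comments is an assumption.

```python
from pathlib import Path


def load_urls(path):
    """Return the URLs listed in a file, one per line.

    Blank lines and lines starting with # are skipped. Whitespace is
    stripped first, so an indented # also counts as a comment here
    (an assumption; the real script may be stricter).
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

Calling load_urls("urls.txt") on the example file above would return the four URLs in file order.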

2. Verify your browser

The script needs to know where your Chromium-based browser lives. It will auto-detect common paths, but you can override with the --browser flag if needed.

# Common browser paths on macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
/Applications/Chromium.app/Contents/MacOS/Chromium
/Applications/Microsoft\ Edge.app/Contents/MacOS/Microsoft\ Edge
/Applications/Brave\ Browser.app/Contents/MacOS/Brave\ Browser
/Applications/Island.app/Contents/MacOS/Island
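
Auto-detection could look roughly like the sketch below: walk a list of candidate paths and take the first one that exists and is executable. The function names, the candidate order, and the behavior when an override is invalid are all assumptions made for illustration; the script's real logic may differ.

```python
import os

# Candidate paths taken from the list above. The order is an assumption.
MACOS_CANDIDATES = [
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "/Applications/Chromium.app/Contents/MacOS/Chromium",
    "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
    "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
    "/Applications/Island.app/Contents/MacOS/Island",
]


def is_executable(path):
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_browser(override=None, candidates=MACOS_CANDIDATES):
    """Return a usable browser path, or None if nothing was found.

    An explicit override (the --browser value) wins over auto-detection.
    Returning None for an unusable override is a choice made for this sketch.
    """
    if override:
        return override if is_executable(override) else None
    for path in candidates:
        if is_executable(path):
            return path
    return None
```

Note that the Python strings need no backslash escapes for spaces; the escapes in the list above are only required when typing the paths in a shell.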

Usage

# Run with defaults
chromium-browse

# Specify a URL file and browser
chromium-browse --urls urls.txt --browser /Applications/Island.app/Contents/MacOS/Island

# Customize behavior
chromium-browse \
  --urls urls.txt \
  --max-depth 2 \
  --min-delay 2 \
  --max-delay 8 \
  --max-links 5 \
  --rounds 3

Options

Flag	Default	Purpose
`--urls`	`urls.txt`	File containing URLs to visit (one per line)
`--browser`	auto-detect	Path to the Chromium-based browser binary
`--max-depth`	`2`	How many levels deep to follow discovered links from each starting URL
`--min-delay`	`2`	Minimum dwell time on each page (seconds)
`--max-delay`	`8`	Maximum dwell time on each page (seconds)
`--max-links`	`5`	Maximum number of same-domain links to follow per page
`--rounds`	`1`	Number of complete passes through the URL list
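
For readers who want to see the table as code, here is a minimal argparse sketch that reproduces the flags and defaults listed above. The argument types (int for counts, float for delays) and the help strings are assumptions; the table does not specify them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="chromium-browse")
    parser.add_argument("--urls", default="urls.txt",
                        help="file containing URLs to visit, one per line")
    parser.add_argument("--browser", default=None,
                        help="browser binary path (None means auto-detect)")
    parser.add_argument("--max-depth", type=int, default=2,
                        help="levels of discovered links to follow")
    # Float delays are an assumption; whole seconds may be all the script accepts.
    parser.add_argument("--min-delay", type=float, default=2,
                        help="minimum dwell time per page, in seconds")
    parser.add_argument("--max-delay", type=float, default=8,
                        help="maximum dwell time per page, in seconds")
    parser.add_argument("--max-links", type=int, default=5,
                        help="same-domain links to follow per page")
    parser.add_argument("--rounds", type=int, default=1,
                        help="complete passes through the URL list")
    return parser
```

argparse converts the dashes in flag names to underscores, so --max-depth is read back as args.max_depth.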

What it does

For each round, the script:

1. Launches the browser in --headless=new mode with a temporary user data directory. Isolated profile, no leftover state.
2. Shuffles the URL list so the visit order varies each round.
3. Navigates to each URL, waits for the page to load, then scrolls the viewport randomly.
4. Dwells on the page for a randomized period between --min-delay and --max-delay.
5. Discovers same-domain <a href> links on the page by querying the DOM.
6. Follows a random subset of those links (up to --max-links), recursing up to --max-depth levels. Skips off-domain URLs and non-page assets (images, PDFs, fonts, etc.).
7. Cleans up the temp profile directory on exit.
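
The link filtering, random selection, and dwell steps above can be sketched in plain Python. Everything named here is hypothetical: the extension list, the de-duplication, and the helper names are assumptions, and the hrefs are taken as an already-extracted list rather than read from the DOM.

```python
import random
from urllib.parse import urldefrag, urljoin, urlparse

# Illustrative list of non-page assets; the script's actual list is not documented.
ASSET_EXTENSIONS = (
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico",
    ".pdf", ".woff", ".woff2", ".ttf", ".otf", ".zip",
)


def filter_links(page_url, hrefs):
    """Keep same-domain http(s) page links, resolved to absolute URLs."""
    page_host = urlparse(page_url).netloc.lower()
    seen = set()
    kept = []
    for href in hrefs:
        # Resolve relative links and drop the #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # mailto:, javascript:, and similar
        if parsed.netloc.lower() != page_host:
            continue  # off-domain
        if parsed.path.lower().endswith(ASSET_EXTENSIONS):
            continue  # non-page asset
        if absolute in seen:
            continue  # de-duplication is an assumption of this sketch
        seen.add(absolute)
        kept.append(absolute)
    return kept


def pick_links(links, max_links, rng=random):
    """Choose a random subset of at most max_links links."""
    return rng.sample(links, min(max_links, len(links)))


def dwell_seconds(min_delay, max_delay, rng=random):
    """Pick a dwell time between the two bounds."""
    return rng.uniform(min_delay, max_delay)
```

Passing a seeded random.Random instance as rng makes a run reproducible, which helps when debugging a crawl path.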

How it works under the hood

The script talks to the browser using the Chrome DevTools Protocol over a WebSocket connection. No browser automation framework sits in between. The flow:

1. Launch the browser with --remote-debugging-port and a disposable --user-data-dir
2. Poll http://localhost:PORT/json/version until the browser's CDP endpoint is ready
3. Connect to the webSocketDebuggerUrl from the version response
4. Send JSON-RPC messages with incrementing IDs, correlate responses by ID, handle events by method name
5. Use Target.createTarget + Target.attachToTarget to manage tabs
6. Use Page.navigate and Runtime.evaluate to drive browsing
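
The message-handling part of that flow (incrementing IDs, responses matched by ID, events dispatched by method name) can be sketched without a browser. The CDPSession class below is hypothetical and transport-agnostic: it takes any async function that sends one text frame, so it is not tied to how the script itself is structured.

```python
import asyncio
import itertools
import json


class CDPSession:
    """Match CDP responses to requests by id and route events by method."""

    def __init__(self, send_text):
        # send_text: async callable that writes one text frame to the browser.
        self._send_text = send_text
        self._ids = itertools.count(1)
        self._pending = {}   # message id -> Future awaiting the response
        self._handlers = {}  # event method name -> list of callbacks

    def on(self, method, callback):
        """Register a callback for an event such as Page.loadEventFired."""
        self._handlers.setdefault(method, []).append(callback)

    async def call(self, method, params=None):
        """Send a command and wait for the response carrying the same id."""
        msg_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        message = {"id": msg_id, "method": method, "params": params or {}}
        await self._send_text(json.dumps(message))
        return await future

    def feed(self, raw):
        """Process one incoming frame: a response (has id) or an event."""
        message = json.loads(raw)
        if "id" in message:
            future = self._pending.pop(message["id"], None)
            if future is None or future.done():
                return  # unknown or already-settled id
            if "error" in message:
                detail = message["error"].get("message", "CDP error")
                future.set_exception(RuntimeError(detail))
            else:
                future.set_result(message.get("result", {}))
        elif "method" in message:
            for callback in self._handlers.get(message["method"], []):
                callback(message.get("params", {}))
```

In a real run, send_text would typically be the send method of a websockets connection, with a reader loop passing every received frame to feed. A navigation would then be awaited as session.call("Page.navigate", {"url": "https://example.com"}).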

Want the full CDP deep dive? The Headless Chromium & CDP guide covers the protocol in detail: every domain, common commands, remote debugging, backtraces, profiling, network interception, and more. chromium-browse is a practical implementation of the techniques documented there.

Dependencies

Two Python packages. Everything else is stdlib.

Package	Purpose
`websockets`	WebSocket connection to the browser's CDP endpoint
`aiohttp`	HTTP client for polling the CDP endpoint until the browser is ready
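
To illustrate the polling role described in the table, here is a minimal readiness check. It deliberately swaps in stdlib urllib for aiohttp so the sketch has no dependencies, and it is synchronous where the script is presumably async. The function names, timeout, and interval values are assumptions.

```python
import json
import time
import urllib.request


def fetch_version(port):
    """GET /json/version from the local browser and decode the JSON body."""
    url = f"http://localhost:{port}/json/version"
    with urllib.request.urlopen(url, timeout=2) as response:
        return json.load(response)


def wait_for_cdp(port, timeout=15.0, interval=0.25,
                 fetch=fetch_version, sleep=time.sleep):
    """Poll until the endpoint answers, then return webSocketDebuggerUrl.

    Raises TimeoutError if the browser is not ready within timeout seconds.
    fetch and sleep are injectable so the loop can be exercised without a browser.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch(port)["webSocketDebuggerUrl"]
        except (OSError, ValueError, KeyError):
            # Connection refused, malformed JSON, or a response without the key.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"CDP endpoint on port {port} not ready")
            sleep(interval)
```

The returned URL is what the WebSocket client connects to next.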