This project processes URLs to fetch Lighthouse scores using the Google PageSpeed Insights API.
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/my_project.git
  cd my_project
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Add your Google API key: update `API_KEY` in `config.py` with your Google PageSpeed Insights API key.
- Place your input data: ensure that your `cwv.csv` file is in the `data` directory.
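The script reads the key from `config.py`; a minimal sketch of that file might look like the following (the placeholder value is an assumption — substitute your own key):

```python
# config.py -- minimal sketch of the expected configuration module.
# Replace the placeholder with your own Google PageSpeed Insights API key.
API_KEY = "YOUR_PAGESPEED_INSIGHTS_API_KEY"
```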
- All URLs in the `cwv.csv` file must be in the format `https://www.example.com`.
- Ensure that each URL starts with `https://` and includes `www.` to avoid any issues with API requests.
- The `cwv.csv` file must include a `platform` column, which differentiates between "Carrot" and "Non-Carrot" sites.
  - This data comes directly from the script output at carrot-serp-compare. The `TRUE` and `FALSE` values from that script must be converted to "Carrot" and "Non-Carrot", respectively.
  - This differentiation allows the comparison file produced at the end of processing to report mean CWV (Core Web Vitals) scores for Carrot vs. Non-Carrot sites.
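The URL checks and the TRUE/FALSE-to-platform conversion above can be sketched as follows. This assumes a `url` column alongside the `platform` column; the actual loading code in `main.py` may differ:

```python
import csv
import io

REQUIRED_PREFIX = "https://www."

def normalize_platform(value: str) -> str:
    """Map the TRUE/FALSE flags from carrot-serp-compare to platform labels."""
    return "Carrot" if value.strip().upper() == "TRUE" else "Non-Carrot"

def validate_rows(rows):
    """Yield (url, platform) pairs, rejecting URLs that would break API requests."""
    for row in rows:
        url = row["url"].strip()
        if not url.startswith(REQUIRED_PREFIX):
            raise ValueError(f"URL must start with {REQUIRED_PREFIX!r}: {url!r}")
        yield url, normalize_platform(row["platform"])

# In practice the reader would open data/cwv.csv; an in-memory sample keeps
# this sketch self-contained.
sample = io.StringIO("url,platform\nhttps://www.example.com,TRUE\n")
print(list(validate_rows(csv.DictReader(sample))))
# [('https://www.example.com', 'Carrot')]
```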
- Run the main script:

  ```bash
  python main.py
  ```
- The processed Lighthouse scores will be saved in `data/lighthouse_scores.csv`.
- The comparison results will be saved in `data/comparison_results.csv`.
- Errors and logs will be saved in `logs/errors.log`.
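The comparison step boils down to grouping scores by platform and averaging. A minimal sketch, assuming a `performance` score column (the real column names in `data/lighthouse_scores.csv` may differ):

```python
from collections import defaultdict
from statistics import mean

def mean_scores_by_platform(rows):
    """Average the Lighthouse performance score for Carrot vs Non-Carrot sites."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["platform"]].append(float(row["performance"]))
    return {platform: mean(scores) for platform, scores in buckets.items()}

# Hypothetical rows as they might appear in lighthouse_scores.csv:
rows = [
    {"platform": "Carrot", "performance": "90"},
    {"platform": "Carrot", "performance": "80"},
    {"platform": "Non-Carrot", "performance": "60"},
]
print(mean_scores_by_platform(rows))
# {'Carrot': 85.0, 'Non-Carrot': 60.0}
```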
- The current setup uses parallel processing to speed up the fetching of Lighthouse scores.
- URLs are processed concurrently using Python's `concurrent.futures.ThreadPoolExecutor`, which significantly reduces the total processing time. That said, the script can still take a long time to run when there are thousands of URLs. When run in the virtual environment, the console prints progress updates such as "X/Y processed".
- By default, the script uses 10 threads to handle multiple requests in parallel. This can be adjusted by modifying the `max_workers` parameter in the `main.py` script.
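The concurrency pattern described above can be sketched as follows, with a hypothetical `fetch_score` standing in for the real PageSpeed Insights request made in `main.py`:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_score(url: str) -> int:
    """Stand-in for the real PageSpeed Insights call; returns a dummy value."""
    return len(url)

urls = [f"https://www.example{i}.com" for i in range(5)]
results = {}

# max_workers mirrors the default of 10 threads described above.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = {pool.submit(fetch_score, url): url for url in urls}
    for done, future in enumerate(as_completed(futures), start=1):
        results[futures[future]] = future.result()
        print(f"{done}/{len(urls)} processed")  # the "X/Y processed" updates

print(len(results))  # 5
```

Because `as_completed` yields futures as they finish, the progress counter advances even when individual requests are slow, which is why the console updates stay responsive on large URL lists.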