Pixiv Encyclopedia Viewer History Extractor

"web_ui"

An Extraction Tool for Pixiv Encyclopedia Viewer Count History

Japanese README

日本語版はこちら README.ja.md.

Overview

Extract daily view history data from a Pixiv Encyclopedia (pixiv百科事典) article.

Pixiv Encyclopedia viewer history is a nice real-world time-series dataset.

It often shows:

Weekly seasonality (weekday vs weekend traffic)
Occasional spikes caused by events or social media buzz

You can use the extracted CSV as sample data for:

Time-series visualization and smoothing
Seasonal decomposition
Forecasting models (ARIMA, Prophet, etc.)

⚠️ Unofficial tool
This project is not affiliated with or endorsed by Pixiv.
Please follow Pixiv’s Terms of Use and robots.txt when using this script.

Features

Fetch by article title (e.g., "ブルーアーカイブ") directly from Pixiv Encyclopedia
Or read from a local HTML file
Output JSON Lines to stdout
(one {"date": "...","count": ...} per line)
Optional CSV export via --csv output.csv

Requirements

Python 3.9+
Dependencies:
- requests
- beautifulsoup4

Usage

0. Create virtual environment

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

1. Fetch by article title

python src/extract_viewer_history.py "ブルーアーカイブ"

This will:

Download https://dic.pixiv.net/a/ブルーアーカイブ
Parse the embedded JSON
Print one JSON object per line to stdout:

{"date": "2025-07-01", "count": 9454605}
{"date": "2025-07-02", "count": 9331510}
{"date": "2025-07-03", "count": 8884117}
...

You can redirect it to a file:

python src/extract_viewer_history.py "ブルーアーカイブ" > ブルーアーカイブ.jsonl

2. Export as CSV

Use the --csv option to write a CSV file while still printing JSON to stdout:

python src/extract_viewer_history.py "ブルーアーカイブ" --csv ブルーアーカイブ.csv

Example CSV content:

date,count
2025-07-01,9454605
2025-07-02,9331510
2025-07-03,8884117
...

3. Use a local HTML file

If you have already saved the article HTML:

python src/extract_viewer_history.py ブルーアーカイブ.html
python src/extract_viewer_history.py ブルーアーカイブ.html --csv ブルーアーカイブ.csv

The script will detect that ブルーアーカイブ.html is a file and will parse it instead of fetching from the web.

4. Test

pip install -r requirements.test.txt
pytest

5. Deactivate environment

deactivate

Notes / Limitations

No rate limiting is implemented; please:
- Use it responsibly
- Avoid sending many requests in a short time
This is a simple utility script, primarily intended for personal analysis or research.

License

Apache License 2.0