Pixiv Encyclopedia Viewer History Extractor

An Extraction Tool for Pixiv Encyclopedia Viewer Count History

Japanese README

A Japanese version is available in README.ja.md.


Overview

Extract daily view history data from a Pixiv Encyclopedia (pixiv百科事典) article.

The daily view counts of a Pixiv Encyclopedia article make a convenient real-world time-series dataset.

It often shows:

  • Weekly seasonality (weekday vs weekend traffic)
  • Occasional spikes caused by events or social media buzz

You can use the extracted CSV as sample data for:

  • Time-series visualization and smoothing
  • Seasonal decomposition
  • Forecasting models (ARIMA, Prophet, etc.)
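
As a starting point for the smoothing use case, here is a minimal sketch of a trailing moving average over the extracted counts. It uses only the standard library; the function name `moving_average` and the window size are illustrative choices, not part of this tool, and the sample values are taken from the example output shown in this README.

```python
def moving_average(values, window=7):
    """Return the trailing moving average of `values`.

    The first `window - 1` points are skipped because a full window
    is not yet available for them.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            result.append(running / window)
    return result


# (date, count) pairs as they would appear in the extracted data.
rows = [
    ("2025-07-01", 9454605),
    ("2025-07-02", 9331510),
    ("2025-07-03", 8884117),
]
counts = [count for _, count in rows]

# A window of 3 is used here only because the sample is tiny;
# a window of 7 would average out the weekly pattern.
smoothed = moving_average(counts, window=3)
print(smoothed)
```

With a full export, `window=7` is a natural choice because it spans exactly one weekly cycle.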

⚠️ Unofficial tool
This project is not affiliated with or endorsed by Pixiv.
Please follow Pixiv’s Terms of Use and robots.txt when using this script.

Features

  • Fetch by article title (e.g., "ブルーアーカイブ") directly from Pixiv Encyclopedia
  • Or read from a local HTML file
  • Output JSON Lines to stdout
    (one {"date": "...", "count": ...} per line)
  • Optional CSV export via --csv output.csv

Requirements

  • Python 3.9+
  • Dependencies:
    • requests
    • beautifulsoup4

Usage

0. Create virtual environment

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

1. Fetch by article title

python src/extract_viewer_history.py "ブルーアーカイブ"

This will:

  • Download https://dic.pixiv.net/a/ブルーアーカイブ
  • Parse the embedded JSON
  • Print one JSON object per line to stdout:
{"date": "2025-07-01", "count": 9454605}
{"date": "2025-07-02", "count": 9331510}
{"date": "2025-07-03", "count": 8884117}
...
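
The download step needs the article title turned into a URL. The sketch below shows one way to do that with percent-encoding; it is an illustration under assumptions, not the script's actual code. The helper name `article_url` is hypothetical, and encoding every reserved character (`safe=""`) is an assumption about how titles containing `/` should be handled.

```python
from urllib.parse import quote, unquote

BASE_URL = "https://dic.pixiv.net/a/"


def article_url(title):
    """Build the article URL for a title (hypothetical helper).

    Non-ASCII characters are percent-encoded as UTF-8.
    safe="" also encodes "/", which is an assumption here.
    """
    return BASE_URL + quote(title, safe="")


url = article_url("ブルーアーカイブ")
print(url)

# Decoding the URL gives back the human-readable form.
print(unquote(url))
```

Libraries such as requests usually accept the unencoded form as well, so explicit encoding mainly helps when logging or comparing URLs.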

You can redirect the output to a file:

python src/extract_viewer_history.py "ブルーアーカイブ" > ブルーアーカイブ.jsonl
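
The resulting JSON Lines file can be read back with the standard library. The following is a minimal sketch; the helper name `read_json_lines` is illustrative, and the sample lines are copied from the example output above rather than read from a real file.

```python
import json


def read_json_lines(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        records.append(json.loads(line))
    return records


# In practice you would pass an open file instead, for example:
#   with open("ブルーアーカイブ.jsonl", encoding="utf-8") as f:
#       records = read_json_lines(f)
sample = [
    '{"date": "2025-07-01", "count": 9454605}',
    '{"date": "2025-07-02", "count": 9331510}',
    '{"date": "2025-07-03", "count": 8884117}',
]
records = read_json_lines(sample)

# Find the day with the highest view count.
peak = max(records, key=lambda r: r["count"])
print(peak["date"], peak["count"])
```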

2. Export as CSV

Use the --csv option to write a CSV file while still printing JSON to stdout:

python src/extract_viewer_history.py "ブルーアーカイブ" --csv ブルーアーカイブ.csv

Example CSV content:

date,count
2025-07-01,9454605
2025-07-02,9331510
2025-07-03,8884117
...
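
To look at the weekly seasonality mentioned in the overview, the CSV can be grouped by day of the week. This is a rough sketch using only the standard library; `weekday_means` is an illustrative name, and the CSV text is inlined here so the example is self-contained.

```python
import csv
import io
from collections import defaultdict
from datetime import date

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_means(csv_text):
    """Return {weekday name: mean count} for a date,count CSV."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum, number of days]
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = WEEKDAY_NAMES[date.fromisoformat(row["date"]).weekday()]
        totals[name][0] += int(row["count"])
        totals[name][1] += 1
    return {name: total / days for name, (total, days) in totals.items()}


sample_csv = """date,count
2025-07-01,9454605
2025-07-02,9331510
2025-07-03,8884117
"""

for name, mean in weekday_means(sample_csv).items():
    print(name, mean)
```

To use a real export, read the file with `open(path, encoding="utf-8", newline="").read()` and pass the text to the function.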

3. Use a local HTML file

If you have already saved the article HTML:

python src/extract_viewer_history.py ブルーアーカイブ.html
python src/extract_viewer_history.py ブルーアーカイブ.html --csv ブルーアーカイブ.csv

The script will detect that ブルーアーカイブ.html is a file and will parse it instead of fetching from the web.
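
One plausible way to make this decision is to check whether the argument names an existing file. The sketch below illustrates that idea only; it is an assumption about the approach, not the script's actual logic, and `classify_source` is a hypothetical name.

```python
from pathlib import Path


def classify_source(arg):
    """Return "file" if `arg` is an existing file, else "title".

    Assumption: anything that is not an existing regular file is
    treated as an article title to fetch from the web.
    """
    return "file" if Path(arg).is_file() else "title"


print(classify_source("ブルーアーカイブ.html"))
print(classify_source("ブルーアーカイブ"))
```

A consequence of this kind of check is that a mistyped file name would be treated as an article title, so it is worth double-checking the path if the script unexpectedly goes to the network.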


4. Test

pip install -r requirements.test.txt
pytest

5. Deactivate environment

deactivate

Notes / Limitations

  • No rate limiting is implemented; please:
    • Use it responsibly
    • Avoid sending many requests in a short time
  • This is a simple utility script, primarily intended for personal analysis or research.
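
Since the script does not throttle itself, a caller that processes several titles can add a pause between runs. Here is a minimal sketch; the generator name `throttled` and the 5-second default are arbitrary illustrative choices, not recommendations from Pixiv or from this project.

```python
import time


def throttled(items, delay_seconds=5.0, sleep=time.sleep):
    """Yield items one by one, pausing between consecutive items.

    No pause happens before the first item. `sleep` is injectable so
    the behaviour can be checked without actually waiting.
    """
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)
        yield item


titles = ["ブルーアーカイブ"]
for title in throttled(titles, delay_seconds=5.0):
    # Here you would run the extractor for `title`, for example with
    # subprocess.run(["python", "src/extract_viewer_history.py", title]).
    print("would fetch:", title)
```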

License

  • Apache License 2.0