🎛️ Scraper Command Center — Super Admin
homes.kanawut.com — Data Pipeline Management
LIVE
22 Apr 2026 — 15:31 ICT
Overview
🤖
Total Scrapers
17
11 NPA + home.co.th + FB + 4 new sources
🏠
Total Records
73,133+
66,148 NPA + 6,760 home.co.th + 225 new sources
🕐
Last Run
22 Apr 2026
Most scrapers ran 02:46–02:52
💾
Database Size
~460MB
SQLite raw + Neon PostgreSQL
Scraper Status
All Scrapers — 17 total
15 active · 1 ready · 1 RSS · 0 errors
# | Scraper | Source | Records | DB Size | Last Run | Status
01 | BAM | bam.co.th | ~25,000 | 93.9 MB | 22 Apr 02:52 | ✅ Active
02 | KBank | kbank.com | ~15,000 | 49.9 MB | 22 Apr 02:52 | ✅ Active
03 | GHB | ghbank.co.th | ~5,000 | 17.7 MB | 22 Apr 02:52 | ✅ Active
04 | Krungthai | ktb.co.th | ~3,000 | 4.1 MB | 22 Apr 02:52 | ✅ Active
05 | Krungsri | krungsri.com | ~2,000 | 5.8 MB | 22 Apr 02:52 | ✅ Active
06 | GSB | gsb.or.th | ~2,500 | 6.3 MB | 22 Apr 02:52 | ✅ Active
07 | SAM | sam.or.th | ~1,500 | 3.4 MB | 22 Apr 02:52 | ✅ Active
08 | SCB | scb.co.th | ~2,000 | 4.8 MB | 22 Apr 02:46 | ✅ Active
09 | KKPFG | kkpfg.com | ~500 | 1.7 MB | 22 Apr 02:46 | ✅ Active
10 | SWPAMC | swpamc.com | ~1,500 | 3.8 MB | 22 Apr 02:52 | ✅ Active
11 | LivingInsider | livinginsider.com | ~8,000 | 108.2 MB | 22 Apr 02:51 | ✅ Active
12 | home.co.th | home.co.th | 6,760 | 167 MB | 23 Apr 15:31 | ✅ Complete
13 | Facebook | facebook.com | 0 | – | – | ⏳ Ready (script)
14 | ThinkOfLiving | thinkofliving.com | 84 | 0.1 MB | 23 Apr 17:10 | ✅ Active
15 | ReviewYourLiving | reviewyourliving.com | 76 + 9 RSS | 0.1 MB | 23 Apr 19:40 | ✅ Active
16 | TerraBKK | terrabkk.com | 44 | 0.0 MB | 23 Apr 17:48 | ✅ Active
17 | PropertyScout | propertyscout.co.th | 21 | 0.0 MB | 23 Apr 18:37 | ✅ Active
Storage Distribution
SQLite DB Sizes (~460 MB total)
home.co.th: 167 MB
LivingInsider: 108.2 MB
BAM: 93.9 MB
KBank: 49.9 MB
GHB: 17.7 MB
Others (7 scrapers): ~29.9 MB
Record Distribution
BAM: ~25,000
KBank: ~15,000
LivingInsider: ~8,000
home.co.th: 6,760
GHB: ~5,000
Others (7 scrapers): ~13,148
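The "Others" slice does not need to be typed in by hand: it is the NPA total in Neon minus the four named NPA sources. A minimal sketch of that arithmetic, using the approximate per-source counts from the scraper table (the variable and function names are illustrative):

```python
# Approximate counts taken from the dashboard.
NPA_TOTAL = 66_148   # NPA properties in Neon PostgreSQL
HOME_COTH = 6_760    # home.co.th projects (kept as JSON, not in Neon)

# The four largest NPA sources shown individually in the chart.
top_npa = {
    "BAM": 25_000,
    "KBank": 15_000,
    "LivingInsider": 8_000,
    "GHB": 5_000,
}


def others_bucket(npa_total: int, top: dict[str, int]) -> int:
    """NPA records not covered by the individually listed sources."""
    return npa_total - sum(top.values())


others = others_bucket(NPA_TOTAL, top_npa)
chart_total = sum(top_npa.values()) + HOME_COTH + others

print(others)       # 13148
print(chart_total)  # 72908, the Stage 1 record count
```

Deriving the bucket this way keeps the chart summing to the 72,908 scraped records whenever the NPA total changes.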
Data Pipeline Flow
STAGE 01
🤖
Scrapers
13 scrapers
11 NPA banks + home.co.th + Facebook
crawler_server.py :5002
STAGE 02
🗄️
Raw SQLite DBs
~460 MB total
Per-scraper .db files
backup/NPA-Property-Platform/
STAGE 03
⚙️
Clean / Normalize
Deduplicate, validate,
normalize field formats
import-npa.py
STAGE 04
🐘
Neon PostgreSQL
66,148 NPA properties
Central database
Neon — serverless PG
STAGE 05
🌐
homes.kanawut.com
Live property search
portal (production)
Next.js — live
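Stage 03 is handled by import-npa.py, whose internals are not shown on this page. The sketch below is a hypothetical illustration of the clean step only. It assumes raw rows carry `source`, `source_id`, `price`, and `province` fields and that a listing is uniquely identified by `(source, source_id)`; the dashboard confirms neither assumption.

```python
from __future__ import annotations

import re
from collections.abc import Iterable


def normalize(record: dict) -> dict | None:
    """Return a cleaned copy of one raw record, or None if it fails validation."""
    # Drop thousands separators, currency symbols and units; keep digits and '.'.
    digits = re.sub(r"[^\d.]", "", str(record.get("price", "")))
    try:
        price = float(digits)
    except ValueError:
        return None  # validation: skip listings without a usable price
    return {
        "source": str(record.get("source", "")).strip().lower(),
        "source_id": str(record.get("source_id", "")).strip(),
        "price_thb": price,
        "province": str(record.get("province", "")).strip(),
    }


def clean(records: Iterable[dict]) -> list[dict]:
    """Normalize and validate, then deduplicate on (source, source_id), keeping the last copy seen."""
    by_key: dict[tuple[str, str], dict] = {}
    for raw in records:
        rec = normalize(raw)
        if rec is not None:
            by_key[(rec["source"], rec["source_id"])] = rec
    return list(by_key.values())


sample = [
    {"source": "BAM", "source_id": "A1", "price": "1,250,000", "province": "Bangkok"},
    {"source": "bam ", "source_id": "A1", "price": "฿1,200,000", "province": "Bangkok"},
    {"source": "KBank", "source_id": "K9", "price": "N/A"},
]
cleaned = clean(sample)
print(cleaned)
```

The two BAM rows collapse into one after normalization, and the KBank row is rejected for having no price. Real listings also quote prices in millions of baht, which this simple digit filter would misread.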
Scraper Details
🏦 NPA Bank Scrapers (11)
Location: backup/NPA-Property-Platform/
Server: crawler_server.py on port 5002
Trigger: POST /api/crawl/{bank}
Config: {"update_existing": true, "stale_days": 7}
Crawlers: bam_crawler.py, kbank_crawler.py, ghb_crawler.py, ktb_crawler.py, krungsri_crawler.py, gsb_crawler.py, sam_crawler.py, scb_crawler.py, kkpfg_crawler.py, swpamc_crawler.py, livinginsider_crawler.py
Active scrapers: 11 / 11
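A crawl can be triggered from any script with a plain HTTP request. A minimal standard-library sketch, assuming crawler_server.py listens on localhost, that the `{bank}` path segment is a lowercase slug such as `bam` or `all`, and that the config object is sent as a JSON body. The response format is not documented on this page, so `trigger_crawl` returns the raw text.

```python
import json
import urllib.request

CRAWLER_BASE = "http://localhost:5002"  # assumption: crawler_server.py runs on this host


def build_crawl_request(bank: str, update_existing: bool = True,
                        stale_days: int = 7) -> urllib.request.Request:
    """Build POST /api/crawl/{bank} carrying the documented config as JSON."""
    body = json.dumps({"update_existing": update_existing,
                       "stale_days": stale_days}).encode("utf-8")
    return urllib.request.Request(
        f"{CRAWLER_BASE}/api/crawl/{bank}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: server expects JSON
        method="POST",
    )


def trigger_crawl(bank: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_crawl_request(bank), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Under the same assumptions, `trigger_crawl("all")` would correspond to the `POST /api/crawl/all` command used for the weekly run.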
🏠 home.co.th Scraper
Script: boom-tools/scripts/home-coth-url-scanner.js
Method: Puppeteer brute-force scan of topic IDs 55000–62000
Data: 6,760 projects saved as JSON
Resume: the --scrape flag continues from the last scanned position
Tracking: scan-tracking.json (prevents duplicate scans)
Last Run: 23 Apr 2026, 15:31
Scan coverage (IDs 55000–62000): 100%
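The resume behaviour can be illustrated with a small tracking file. The real scanner is a Node.js/Puppeteer script and the layout of its scan-tracking.json is not shown here, so this Python sketch assumes a hypothetical `{"scanned": [...]}` format: load the visited IDs, skip them, and save progress as the scan goes.

```python
import json
from pathlib import Path

ID_START, ID_END = 55_000, 62_000  # topic-ID range from the dashboard (inclusive)


def load_scanned(path: Path) -> set[int]:
    """Return topic IDs already visited; a missing file means a fresh scan."""
    if not path.exists():
        return set()
    data = json.loads(path.read_text(encoding="utf-8"))
    return {int(i) for i in data.get("scanned", [])}


def save_scanned(path: Path, scanned: set[int]) -> None:
    """Persist progress so an interrupted run can resume."""
    path.write_text(json.dumps({"scanned": sorted(scanned)}), encoding="utf-8")


def pending_ids(scanned: set[int]) -> list[int]:
    """IDs still to visit, ascending, so a rerun picks up where the last one stopped."""
    return [i for i in range(ID_START, ID_END + 1) if i not in scanned]


def coverage(scanned: set[int]) -> float:
    """Fraction of the ID range that has been visited."""
    total = ID_END - ID_START + 1
    return sum(1 for i in scanned if ID_START <= i <= ID_END) / total
```

Saving after each batch of IDs, rather than only at the end, is what makes an interrupted run resumable without revisiting pages.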
📘 Facebook Scraper
Script: boom-tools/scripts/fb-scraper.js
Method: Puppeteer with headless: false (requires a real browser session)
Records: 0 (not run yet)
Status: ⏳ Script ready, awaiting first run
Note: Requires a Facebook login session cookie. Run manually; headless mode is blocked by Facebook's anti-bot checks.
Readiness: Script only
URL Tracker
Tracked Sources — Crawl History
Source | URLs Tracked | Scraped | Pending | Last Crawl | Next Crawl | Schedule
bam.co.th | ~25,000 pages | 25,000 | ~500 stale | 22 Apr 02:52 | 29 Apr | Weekly
kbank.com | ~15,000 pages | 15,000 | ~300 stale | 22 Apr 02:52 | 29 Apr | Weekly
home.co.th | 7,000 topic IDs | 6,760 | 0 new | 23 Apr 15:31 | Manual | On-demand
ghbank.co.th | ~5,000 pages | 5,000 | ~100 stale | 22 Apr 02:52 | 29 Apr | Weekly
livinginsider.com | ~8,000 pages | 8,000 | ~200 stale | 22 Apr 02:51 | 29 Apr | Weekly
facebook.com | 0 | – | – | Pending | TBD | Manual
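For the weekly sources, each Next Crawl date is the last crawl plus the 7-day stale threshold. A minimal sketch of that rule and of flagging stale URLs, assuming each tracked URL stores a last-seen timestamp (the URL keys used in the checks are placeholders):

```python
from datetime import datetime, timedelta

STALE_DAYS = 7  # "stale_days" from the NPA crawler config


def next_crawl(last_crawl: datetime, stale_days: int = STALE_DAYS) -> datetime:
    """A weekly source comes due again stale_days after its last crawl."""
    return last_crawl + timedelta(days=stale_days)


def stale_urls(last_seen: dict[str, datetime], now: datetime,
               stale_days: int = STALE_DAYS) -> list[str]:
    """URLs whose last successful scrape is older than the stale threshold."""
    cutoff = now - timedelta(days=stale_days)
    return sorted(url for url, seen in last_seen.items() if seen < cutoff)


last = datetime(2026, 4, 22, 2, 52)
print(next_crawl(last))  # 2026-04-29 02:52:00
```

Counting the result of `stale_urls` per source is one way to produce a "Pending: ~N stale" column like the one above.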
AI Pipeline — Crawlab Custom
🕷️
1. Scrape
Track every URL
When scraped / What found
72,908 records
🧹
2. Clean
Remove duplicates
Normalize fields
66,148 cleaned
🤖
3. AI Enrich
Category detection
Price extraction / Geocoding
In progress
📝
4. Summarize
AI-generated descriptions
SEO content / Highlights
5 articles done
🌐
5. Present
homes.kanawut.com
kanawut.com/projects
LIVE
Pipeline Progress
Overall completion: Stage 3/5 (AI Enrichment)
Schedule Manager
NPA Banks (11)
Frequency: Weekly (every Monday 02:00)
Stale After: 7 days
Last Run: 22 Apr, 02:46–02:52
Next Run: 28 Apr, 02:00
Command: POST /api/crawl/all
home.co.th
Frequency: On-demand (manual)
Stale After: 30 days
Last Run: 23 Apr (complete)
Next Run: Manual trigger
Command: node home-coth-url-scanner.js --scrape
AI Processing
Clean + Import: After each crawl
AI Enrich: Batches of 100 records per run
SEO Generate: 5 done / 6,760 total
Priority: New listings first
Model: Claude / GPT-4o
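Selecting each AI enrichment batch (100 per run, new listings first) maps naturally onto an ordered, limited query. The sketch below uses an in-memory SQLite table so it runs anywhere; the `properties` table and its `first_seen` and `enriched_at` columns are hypothetical. The same `ORDER BY ... LIMIT` query would also work on Neon PostgreSQL, with `%s` placeholders in a driver such as psycopg.

```python
import sqlite3

BATCH_SIZE = 100  # "AI Enrich: batches of 100 records per run"


def next_enrichment_batch(conn: sqlite3.Connection,
                          batch_size: int = BATCH_SIZE) -> list[int]:
    """IDs of not-yet-enriched listings, newest first."""
    rows = conn.execute(
        """
        SELECT id FROM properties
        WHERE enriched_at IS NULL
        ORDER BY first_seen DESC, id DESC
        LIMIT ?
        """,
        (batch_size,),
    ).fetchall()
    return [row[0] for row in rows]


# Demo data: 250 listings whose first_seen increases with id; every 5th is already enriched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (id INTEGER PRIMARY KEY, first_seen TEXT, enriched_at TEXT)")
conn.executemany(
    "INSERT INTO properties (id, first_seen, enriched_at) VALUES (?, ?, ?)",
    [(i, f"2026-04-22T{i // 60:02d}:{i % 60:02d}", None if i % 5 else "2026-04-22")
     for i in range(1, 251)],
)
batch = next_enrichment_batch(conn)
print(batch[:5])  # [249, 248, 247, 246, 244]
```

After the model writes its results, setting `enriched_at` on those rows removes them from the next batch.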
🔄 Re-run All NPA Crawlers: POST /api/crawl/all
🏠 Re-scan home.co.th: node home-coth-url-scanner.js --scrape
📊 Import to Neon PostgreSQL: python import-npa.py
🤖 Run AI Enrichment: batch-process 100 records
📝 Generate SEO Articles: python generate-seo-articles.py
🔍 Check Data Quality: run the audit scripts
System
Infrastructure
DB Engine: Neon PostgreSQL (serverless)
Raw Store: SQLite per scraper (~460 MB)
Frontend: Next.js (homes.kanawut.com)
Scraper Host: crawler_server.py :5002
Data Health
NPA Records: 66,148
home.co.th: 6,760
Stale Threshold: 7 days
Dedupe: Active (import-npa.py)
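Deduplication at import time can also be enforced by the database itself, so that re-running the import after each crawl updates existing rows instead of inserting copies. This is a hypothetical sketch of that approach, not a description of import-npa.py. It assumes a unique key on `(source, source_id)` and uses an upsert, shown on SQLite for portability; PostgreSQL supports the same `ON CONFLICT ... DO UPDATE` clause.

```python
import sqlite3

UPSERT = """
INSERT INTO npa_properties (source, source_id, price_thb)
VALUES (?, ?, ?)
ON CONFLICT (source, source_id) DO UPDATE SET price_thb = excluded.price_thb
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE npa_properties ("
    " source TEXT NOT NULL, source_id TEXT NOT NULL, price_thb REAL,"
    " UNIQUE (source, source_id))"
)


def import_rows(conn: sqlite3.Connection, rows: list[tuple[str, str, float]]) -> None:
    """Insert new listings and update known ones in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


import_rows(conn, [("bam", "A1", 1_250_000.0), ("kbank", "K9", 890_000.0)])
import_rows(conn, [("bam", "A1", 1_200_000.0)])  # same listing re-crawled with a new price

count = conn.execute("SELECT COUNT(*) FROM npa_properties").fetchone()[0]
bam_price = conn.execute(
    "SELECT price_thb FROM npa_properties WHERE source = 'bam' AND source_id = 'A1'"
).fetchone()[0]
print(count, bam_price)  # 2 1200000.0
```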
Pipeline Scripts
NPA Import: import-npa.py
home.co.th: home-coth-url-scanner.js
Facebook: fb-scraper.js
Crawl API: crawler_server.py