Overview
Total Scrapers
17
11 NPA + home.co.th + FB + 4 new sources
Total Records
73,133+
66,148 NPA + 6,760 home.co.th + 225 new sources
Last Run
22 Apr 2026
Most scrapers 02:46–02:52
Database Size
~460MB
SQLite raw + Neon PostgreSQL
Scraper Status
All Scrapers — 17 total
| # | Scraper | Source | Records | DB Size | Last Run | Status |
|---|---|---|---|---|---|---|
| 01 | BAM | bam.co.th | ~25,000 | 93.9 MB | 22 Apr 02:52 | ✅ Active |
| 02 | KBank | kbank.com | ~15,000 | 49.9 MB | 22 Apr 02:52 | ✅ Active |
| 03 | GHB | ghbank.co.th | ~5,000 | 17.7 MB | 22 Apr 02:52 | ✅ Active |
| 04 | Krungthai | ktb.co.th | ~3,000 | — | 22 Apr 02:52 | ✅ Active |
| 05 | Krungsri | krungsri.com | ~2,000 | — | 22 Apr 02:52 | ✅ Active |
| 06 | GSB | gsb.or.th | ~2,500 | — | 22 Apr 02:52 | ✅ Active |
| 07 | SAM | sam.or.th | ~1,500 | — | 22 Apr 02:52 | ✅ Active |
| 08 | SCB | scb.co.th | ~2,000 | — | 22 Apr 02:46 | ✅ Active |
| 09 | KKPFG | kkpfg.com | ~500 | — | 22 Apr 02:46 | ✅ Active |
| 10 | SWPAMC | swpamc.com | ~1,500 | — | 22 Apr 02:52 | ✅ Active |
| 11 | LivingInsider | livinginsider.com | ~8,000 | 108.2 MB | 22 Apr 02:51 | ✅ Active |
| 12 | home.co.th | home.co.th | 6,760 | 167 MB | 23 Apr 15:31 | ✅ Complete |
| 13 | Facebook | facebook.com | 0 | — | — | ⏳ Ready (script) |
| 14 | ThinkOfLiving | thinkofliving.com | 84 | — | 23 Apr 17:10 | ✅ Active |
| 15 | ReviewYourLiving | reviewyourliving.com | 76 + 9 RSS | — | 23 Apr 19:40 | ✅ Active |
| 16 | TerraBKK | terrabkk.com | 44 | — | 23 Apr 17:48 | ✅ Active |
| 17 | PropertyScout | propertyscout.co.th | 21 | — | 23 Apr 18:37 | ✅ Active |
Storage Distribution
SQLite DB Sizes (460MB total)
LivingInsider: 108.2 MB
home.co.th: 167 MB
BAM: 93.9 MB
KBank: 49.9 MB
GHB: 17.7 MB
Others (7 scrapers): ~23.9 MB
Record Distribution
BAM: ~25,000
KBank: ~15,000
LivingInsider: ~8,000
home.co.th: 6,760
GHB: ~5,000
Others (7 scrapers): ~13,148
Data Pipeline Flow
| Stage | Name | Details | Component |
|---|---|---|---|
| 01 | Scrapers | 13 scrapers: 11 NPA banks + home.co.th + Facebook | crawler_server.py :5002 |
| 02 | Raw SQLite DBs | ~460 MB total, per-scraper .db files | backup/NPA-Property-Platform/ |
| 03 | Clean / Normalize | Deduplicate, validate, normalize field formats | import-npa.py |
| 04 | Neon PostgreSQL | 66,148 NPA properties, central database | Neon (serverless PG) |
| 05 | homes.kanawut.com | Live property search portal (production) | Next.js (live) |

Flow: Stage 01 → Stage 02 → Stage 03 → Stage 04 → Stage 05
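The Clean / Normalize stage (deduplicate, validate, normalize field formats) could look roughly like the sketch below. This is not the code in import-npa.py. The field names (`source`, `source_id`, `title`, `price`) and the dedupe key are assumptions made for illustration only.

```python
def normalize(record):
    """Trim whitespace and coerce price to an integer (hypothetical fields)."""
    price = str(record.get("price", "")).replace(",", "").strip()
    return {
        "source": record["source"].strip().lower(),
        "source_id": str(record["source_id"]).strip(),
        "title": " ".join(record.get("title", "").split()),
        "price": int(price) if price.isdigit() else None,
    }


def is_valid(record):
    """A record needs at least an ID and a non-empty title (assumed rule)."""
    return bool(record["source_id"]) and bool(record["title"])


def clean(records):
    """Normalize, drop invalid rows, and keep the first row per (source, id)."""
    seen = set()
    cleaned = []
    for raw in records:
        rec = normalize(raw)
        key = (rec["source"], rec["source_id"])  # assumed dedupe key
        if not is_valid(rec) or key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Keeping the first occurrence per key is one possible policy. A pipeline that runs with `update_existing` enabled might prefer to keep the most recent row instead.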
Scraper Details
🏦 NPA Bank Scrapers (11)
Location
backup/NPA-Property-Platform/
Server
crawler_server.py on port 5002
Trigger
POST /api/crawl/{bank}
Config
{"update_existing": true, "stale_days": 7}
Crawlers
bam_crawler.py
kbank_crawler.py
ghb_crawler.py
ktb_crawler.py
krungsri_crawler.py
gsb_crawler.py
sam_crawler.py
scb_crawler.py
kkpfg_crawler.py
swpamc_crawler.py
livinginsider_crawler.py
Active scrapers: 11 / 11
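A minimal sketch of triggering one crawler through the endpoint listed above. The host (`localhost`), the bank slug `"bam"`, and the idea that the config is sent as a JSON request body are all assumptions. Only the port, the path pattern, and the two config keys come from this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5002"  # assumption: crawler_server.py runs locally


def build_crawl_request(bank, update_existing=True, stale_days=7):
    """Build (but do not send) a POST /api/crawl/{bank} request."""
    payload = {"update_existing": update_existing, "stale_days": stale_days}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/crawl/{bank}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (status, body); needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

With the server running, `send(build_crawl_request("bam"))` would submit the crawl. The response format is not documented here, so the sketch returns the raw body.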
🏠 home.co.th Scraper
Script
boom-tools/scripts/home-coth-url-scanner.js
Method
Puppeteer brute-force scan of topic IDs 55000–62000
Data
6,760 projects in JSON
Resume
--scrape flag to continue from last position
Tracking
scan-tracking.json — no duplicates
Last Run
23 Apr 2026, 15:31
Scan coverage (IDs 55000–62000): 100%
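The resume and no-duplicates behaviour described above can be illustrated with a small tracking-file sketch. The real scanner is a Node.js/Puppeteer script. This Python version only shows the idea, and the layout of scan-tracking.json (a single `scanned` list) is an assumption.

```python
import json
from pathlib import Path


def load_tracking(path):
    """Load the tracking file, or start fresh if it does not exist yet."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"scanned": []}  # hypothetical file layout


def pending_ids(tracking, start=55000, end=62000):
    """Return topic IDs in [start, end] that have not been scanned yet."""
    done = set(tracking["scanned"])
    return [topic_id for topic_id in range(start, end + 1) if topic_id not in done]


def mark_scanned(path, tracking, topic_id):
    """Record a finished ID and persist immediately so a restart can resume."""
    if topic_id not in tracking["scanned"]:
        tracking["scanned"].append(topic_id)
    Path(path).write_text(json.dumps(tracking), encoding="utf-8")
```

Writing the file after every ID costs some I/O but means an interrupted run loses at most the ID in progress.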
📘 Facebook Scraper
Script
boom-tools/scripts/fb-scraper.js
Method
Puppeteer, headless=false (requires a real browser session)
Records
0 — not run yet
Status
⏳ Script ready — awaiting run
Note
Requires a Facebook login session cookie. Run manually: headless mode is blocked by Facebook's anti-bot checks.
Readiness: Script only
URL Tracker
Tracked Sources — Crawl History
| Source | URLs Tracked | Scraped | Pending | Last Crawl | Next Crawl | Schedule |
|---|---|---|---|---|---|---|
| bam.co.th | ~25,000 pages | 25,000 | ~500 stale | 22 Apr 02:52 | 29 Apr | Weekly |
| kbank.com | ~15,000 pages | 15,000 | ~300 stale | 22 Apr 02:52 | 29 Apr | Weekly |
| home.co.th | 7,000 topic IDs | 6,760 | 0 new | 23 Apr 15:31 | Manual | On-demand |
| ghbank.co.th | ~5,000 pages | 5,000 | ~100 stale | 22 Apr 02:52 | 29 Apr | Weekly |
| livinginsider.com | ~8,000 pages | 8,000 | ~200 stale | 22 Apr 02:51 | 29 Apr | Weekly |
| facebook.com | — | 0 | Pending | — | TBD | Manual |
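The "stale" counts in the Pending column depend on a staleness threshold (7 days for the NPA sources). One way such a check might be written is sketched below. Treating a URL as stale at exactly the threshold, and treating never-scraped URLs as stale, are assumptions rather than documented behaviour.

```python
from datetime import datetime, timedelta


def is_stale(last_scraped, now, stale_days=7):
    """Return True if a URL is due for a re-crawl.

    last_scraped may be None for URLs that were never scraped (assumed stale).
    """
    if last_scraped is None:
        return True
    return now - last_scraped >= timedelta(days=stale_days)
```

For example, a page last scraped on 22 Apr 2026 at 02:52 would not yet count as stale on 28 Apr at 02:00 under this rule, but would on 29 Apr at 02:52.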
AI Pipeline — Crawlab Custom
| Step | Tasks | Status |
|---|---|---|
| 🕷️ 1. Scrape | Track every URL: when scraped, what was found | 72,908 records |
| 🧹 2. Clean | Remove duplicates, normalize fields | 66,148 cleaned |
| 🤖 3. AI Enrich | Category detection, price extraction, geocoding | In progress |
| 📝 4. Summarize | AI-generated descriptions, SEO content, highlights | 5 articles done |
| 🌐 5. Present | homes.kanawut.com, kanawut.com/projects | LIVE |
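The AI Enrich step is described elsewhere on this page as a batch of 100 records per run with new listings prioritized. A rough sketch of that selection logic follows. The `enriched` and `first_seen` fields are hypothetical, and "new listings first" is interpreted here as most recently seen first.

```python
def next_batch(records, batch_size=100):
    """Pick up to batch_size unenriched records, newest first.

    Each record is a dict with hypothetical keys:
      - "enriched": bool, whether AI enrichment has already run
      - "first_seen": any sortable timestamp for when the listing appeared
    """
    pending = [r for r in records if not r["enriched"]]
    pending.sort(key=lambda r: r["first_seen"], reverse=True)
    return pending[:batch_size]
```

Calling `next_batch` once per run and marking the returned records as enriched afterwards would work through the backlog in fixed-size steps.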
Pipeline Progress
Overall completion: Stage 3/5 (AI Enrichment)
Schedule Manager
NPA Banks (11)
Frequency: Weekly (Every Monday 02:00)
Stale After: 7 days
Last Run: 22 Apr 02:46–02:52
Next Run: 28 Apr 02:00
Command
POST /api/crawl/all
home.co.th
Frequency: On-demand (manual)
Stale After: 30 days
Last Run: 23 Apr (complete)
Next Run: Manual trigger
Command
node scanner.js --scrape
AI Processing
Clean + Import: After each crawl
AI Enrich: Batch, 100 per run
SEO Generate: 5 done / 6,760 total
Priority: New listings first
Model
Claude / GPT-4o
Quick Actions
🔄 Re-run All NPA Crawlers
POST /api/crawl/all
🏠 Re-scan home.co.th
node home-coth-url-scanner.js --scrape
📊 Import to Neon PostgreSQL
python import-npa.py
🤖 Run AI Enrichment
batch process 100 records
📝 Generate SEO Articles
python generate-seo-articles.py
🔍 Check Data Quality
audit scripts
System
Infrastructure
DB Engine: Neon PostgreSQL (serverless)
Raw Store: SQLite per scraper (~460MB)
Frontend: Next.js, homes.kanawut.com
Scraper Host: crawler_server.py :5002
Data Health
NPA Records: 66,148
home.co.th: 6,760
Stale Threshold: 7 days
Dedupe: Active (import-npa.py)
Pipeline Scripts
NPA Import: import-npa.py
home.co.th: home-coth-url-scanner.js
Facebook: fb-scraper.js
Crawl API: crawler_server.py