# Troubleshooting Guide
## Common Issues and Solutions
### 1. TiDB Health Check Failing
#### Symptom
```
dependency failed to start: container tidb is unhealthy
```
The Docker health check fails even though you can connect to TiDB from your host machine.
#### Root Cause
The original health check tried to use `mysql` command inside the TiDB container:
```yaml
healthcheck:
  test: ["CMD", "mysql", "-h", "127.0.0.1", "-P", "4000", "-u", "root", "-e", "SELECT 1"]
```
The TiDB Docker image doesn't include the MySQL client binary, so this check always failed.
#### Solution ✅
Use TiDB's built-in HTTP status endpoint instead:
```yaml
healthcheck:
  test: ["CMD", "wget", "-q", "-O-", "http://127.0.0.1:10080/status"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 10s
```
**Why this works:**
- TiDB exposes a status endpoint on port 10080
- `wget` is available in the container
- The endpoint returns HTTP 200 once TiDB is ready, so `wget` exits with status 0
- `start_period` gives TiDB time to initialize before health checks begin
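If the container still reports unhealthy, Docker keeps the result of recent probe runs in the container state, which usually shows why the check failed. A minimal sketch for reading it, assuming the container is named `tidb` as elsewhere in this guide; `health_log` is a hypothetical helper name, not part of this project:

```bash
# Hypothetical helper: print the exit code and captured output of each
# recent health check probe that Docker recorded for a container.
health_log() {
  docker inspect \
    --format '{{range .State.Health.Log}}exit={{.ExitCode}} {{.Output}}{{println}}{{end}}' \
    "$1"
}

# Usage (assumes a container named "tidb"):
#   health_log tidb
```

A non-zero exit code together with a message such as "executable file not found" would point at a missing binary rather than at TiDB itself.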
### 2. Docker Compose Version Warning
#### Symptom
```
WARN[0000] version is obsolete, it will be ignored
```
#### Solution ✅
Remove the `version` field from `docker-compose.yml`. Modern Docker Compose (v2) doesn't need it.
**Before:**
```yaml
version: '3.8'
services:
...
```
**After:**
```yaml
services:
...
```
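To confirm whether a Compose file still carries the key, a quick check along these lines may help. This is only a sketch: `has_version_key` is a hypothetical helper, and it assumes the key sits at the top level with no leading whitespace.

```bash
# Hypothetical helper: succeed (exit 0) if the Compose file still has a
# top-level "version:" key, printing the matching line with its number.
has_version_key() {
  grep -nE '^version:' "$1"
}

# Usage:
#   has_version_key docker-compose.yml && echo "remove the version key"
```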
### 3. Service Dependencies Not Starting in Order
#### Symptom
Services fail because dependencies aren't ready yet.
#### Solution ✅
Use proper health checks and dependency conditions:
```yaml
dm-worker:
  depends_on:
    tidb:
      condition: service_healthy
    dm-master:
      condition: service_healthy
```
**Important:**
- Each dependency must have a working health check
- `start_period` prevents false negatives during startup
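One way to verify the first point is to ask Docker whether each dependency actually has a health check configured. The sketch below assumes the container names used in this guide; `has_healthcheck` is a hypothetical helper, and a check defined in the image itself also counts as configured.

```bash
# Hypothetical helper: print "yes" if the container's configuration
# includes a health check, "no" otherwise.
has_healthcheck() {
  docker inspect \
    --format '{{if .Config.Healthcheck}}yes{{else}}no{{end}}' "$1"
}

# Usage (container names as used in this guide):
#   for c in tidb dm-master; do
#     printf '%s: %s\n' "$c" "$(has_healthcheck "$c")"
#   done
```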
### 4. dm-init Fails to Start
#### Symptom
```
Error: dm-init exits immediately
```
#### Check:
```bash
docker logs dm-init
```
#### Common Causes:
**a) .env not configured:**
```bash
# Check if .env exists and has real values
cat .env
```
**Solution:**
```bash
# Copy template and edit
cp .env.example .env
vim .env
```
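Reading `.env` by eye makes it easy to miss an empty value. As a rough sketch, the comparison against the template can be automated, assuming both files use plain `KEY=VALUE` lines; `missing_env_keys` is a hypothetical helper name:

```bash
# Hypothetical helper: list keys that appear in the template but are
# missing or empty in the real file. Assumes simple KEY=VALUE lines.
missing_env_keys() {
  template=$1
  actual=$2
  # Take every KEY from the template, skipping comments and blank lines.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$template" | cut -d= -f1 |
  while read -r key; do
    # A key counts as set only if at least one character follows "=".
    grep -qE "^${key}=.+" "$actual" || echo "$key"
  done
}

# Usage:
#   missing_env_keys .env.example .env
```

No output means every key from the template has a value. The helper cannot tell whether a value is still a template placeholder.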
**b) Test database not reachable:**
```bash
# Check that TiDB is reachable from a temporary container on the same network
docker run --rm --network tidb-network pingcap/dm:latest \
  sh -c "wget -q -O- http://tidb:10080/status"
```
**c) Script syntax error:**
```bash
# Check init script
sh -n scripts/init-dm.sh
```
### 5. Containers Keep Restarting
#### Check Status:
```bash
docker ps -a
docker logs <container_name>
```
#### Common Issues:
**a) Port already in use:**
```
Error: bind: address already in use
```
**Solution:** Change ports in `docker-compose.yml`:
```yaml
ports:
  - "14000:4000" # Changed from 4000:4000
```
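Before picking a new host port, it can help to confirm which port is actually taken. The following is a rough sketch that relies on bash's `/dev/tcp` redirection, so it needs bash rather than a plain POSIX shell. `port_in_use` is a hypothetical helper, and it only detects listeners reachable on 127.0.0.1.

```bash
# Hypothetical helper: succeed if something accepts TCP connections on
# the given local port. The subshell closes the connection on exit.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Usage:
#   port_in_use 4000 && echo "port 4000 is taken, map a different host port"
```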
**b) Out of memory:**
```
Error: OOM killed
```
**Solution:** Increase memory limits or free up system resources.
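Docker records whether a container's last run was ended by the out-of-memory killer, so the cause can be confirmed before changing any limits. A minimal sketch, with `was_oom_killed` as a hypothetical helper name:

```bash
# Hypothetical helper: print "true" if the container's last run was
# terminated by the OOM killer, "false" otherwise.
was_oom_killed() {
  docker inspect --format '{{.State.OOMKilled}}' "$1"
}

# Usage (container name as used in this guide):
#   was_oom_killed dm-worker
```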
**c) Permission issues:**
```
Error: permission denied
```
**Solution:** Check volume permissions. If that does not help, recreate the volumes; note that `down -v` deletes all data stored in them:
```bash
docker compose down -v # Remove volumes
docker compose up -d # Recreate
```
### 6. Sync Task Not Running
#### Check Status:
```bash
./status.sh
# or
./sync-control.sh status
```
#### Common Issues:
**a) Task not created:**
```bash
# Check if source is configured
docker exec dm-master /dmctl --master-addr=dm-master:8261 operate-source show
```
**Solution:**
```bash
./sync-control.sh reinit
```
**b) Wrong credentials:**
Check logs:
```bash
docker logs dm-worker
```
Fix `.env` and reinit:
```bash
vim .env
./sync-control.sh reinit
```
**c) Table doesn't exist:**
Verify that the tables exist on the source database:
```sql
-- Run on the source (test) database
SHOW TABLES FROM your_database;
```
### 7. Connection Refused to TiDB
#### Symptom
```
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (61)
```
#### Checks:
**a) Is TiDB running?**
```bash
docker ps | grep tidb
```
**b) Is it healthy?**
```bash
docker ps
# Look for "(healthy)" status
```
**c) Is port exposed?**
```bash
docker port tidb
# Should show: 4000/tcp -> 0.0.0.0:4000
```
**d) Test from inside container:**
```bash
docker exec tidb wget -q -O- http://127.0.0.1:10080/status
```
#### Solutions:
**If container not running:**
```bash
docker compose up -d tidb
docker logs tidb
```
**If unhealthy:**
```bash
# Wait for health check
sleep 15
docker ps
# If still unhealthy, check logs
docker logs tidb
```
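A fixed `sleep` may be too short on a slow machine and needlessly long on a fast one. As an alternative sketch, the wait can poll the health status until it changes or a timeout expires. `wait_healthy` is a hypothetical helper, and the 2-second poll interval is an arbitrary choice.

```bash
# Hypothetical helper: poll a container's health status until it is
# "healthy" or the timeout (in seconds, default 60) expires.
wait_healthy() {
  name=$1
  timeout=${2:-60}
  elapsed=0
  status=unknown
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy after ${elapsed}s"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "$name is still '$status' after ${timeout}s" >&2
  return 1
}

# Usage:
#   wait_healthy tidb 60 || docker logs --tail=50 tidb
```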
**If port not exposed:**
```bash
# Recreate container
docker compose down
docker compose up -d
```
### 8. Data Not Syncing
#### Verify sync is running:
```bash
./sync-control.sh status
```
#### Check sync lag:
Look for the "syncer" section in the status output.
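When the status output is long, filtering it down to the position and lag fields makes the comparison easier. The sketch below reads dmctl's JSON output on stdin. The field names are assumptions based on recent DM releases and may differ in your version, and `sync_lag_fields` is a hypothetical helper.

```bash
# Hypothetical helper: keep only the binlog-position and lag fields from
# dmctl's JSON status output read on stdin. Field names are assumed.
sync_lag_fields() {
  grep -E '"(masterBinlog|syncerBinlog|secondsBehindMaster|synced)"'
}

# Usage (replace <task-name> with the name of your task):
#   docker exec dm-master /dmctl --master-addr=dm-master:8261 \
#     query-status <task-name> | sync_lag_fields
```

If the source and syncer positions match, the task has caught up; a growing gap between them suggests the worker is falling behind.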
#### Common Issues:
**a) Sync paused:**
```bash
./sync-control.sh resume
```
**b) Sync stopped with error:**
```bash
# Check error in status output
./sync-control.sh status
# Fix the issue, then restart
./sync-control.sh restart
```
**c) Network issues:**
```bash
# Test connectivity from dm-worker to source
docker exec dm-worker ping -c 3 your-test-db-host
```
**d) Binlog not enabled on source:**
The source database must have binary logging (binlog) enabled for incremental sync.
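The binlog settings can be read from the source with `SHOW VARIABLES`. A sketch of an automated check follows, assuming a MySQL-compatible source and the `mysql` client on the host. It expects `log_bin=ON` and `binlog_format=ROW`, which is the format DM's documentation asks for. `check_binlog_vars` is a hypothetical helper, and the host and user in the usage line are placeholders.

```bash
# Hypothetical helper: read "name<TAB>value" rows on stdin (the shape
# produced by `mysql -N -B`) and report settings that differ from the
# expected ones. Exits non-zero if any setting is off.
check_binlog_vars() {
  awk -F'\t' '
    $1 == "log_bin"       && $2 != "ON"  { print "log_bin is " $2 ", expected ON"; bad = 1 }
    $1 == "binlog_format" && $2 != "ROW" { print "binlog_format is " $2 ", expected ROW"; bad = 1 }
    END { exit bad + 0 }
  '
}

# Usage (host and user are placeholders):
#   mysql -h your-test-db-host -u root -p -N -B \
#     -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')" \
#     | check_binlog_vars
```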
### 9. Slow Sync Performance
#### Check resource usage:
```bash
docker stats
```
#### Solutions:
**a) Increase worker resources:**
Edit `docker-compose.yml`:
```yaml
dm-worker:
  deploy:
    resources:
      limits:
        cpus: '2'
        memory: 2G
```
**b) Optimize batch size:**
See [TiDB DM Documentation](https://docs.pingcap.com/tidb-data-migration/stable/tune-configuration) for advanced tuning.
### 10. Docker Compose v1 vs v2 Issues
#### Symptom
```
docker: 'compose' is not a docker command
```
#### Solution
See [DOCKER_COMPOSE_V2.md](DOCKER_COMPOSE_V2.md) for:
- Upgrading to v2
- Creating an alias
- Compatibility mode
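To find out which variant is installed on a given host, a small probe like the following may be enough. It is a sketch only: `compose_cmd` is a hypothetical helper that prefers the v2 plugin and falls back to the standalone v1 binary.

```bash
# Hypothetical helper: print the Compose command available on this host.
# Prefers the v2 plugin ("docker compose") over the v1 binary.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "no Docker Compose found" >&2
    return 1
  fi
}

# Usage:
#   $(compose_cmd) ps
```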
## Diagnostic Commands
### Check everything at once:
```bash
# Service status
docker compose ps
# Health checks
docker ps
# Logs (all services)
docker compose logs --tail=50
# Logs (specific service)
docker compose logs --tail=50 tidb
# Resource usage
docker stats --no-stream
# Network connectivity
docker network inspect tidb-network
```
### Test connectivity:
```bash
# From host to TiDB
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT 1"
# From host (HTTP)
curl http://127.0.0.1:10080/status
# From container to TiDB
docker run --rm --network tidb-network pingcap/dm:latest \
wget -q -O- http://tidb:10080/status
```
### Reset everything:
```bash
# Stop and remove everything (including data)
docker compose down -v
# Start fresh
./start.sh
```
## Getting Help
### Collect diagnostic information:
```bash
# Create a diagnostic report
echo "=== Docker Version ===" > diagnostic.txt
docker --version >> diagnostic.txt
docker compose version >> diagnostic.txt
echo -e "\n=== Container Status ===" >> diagnostic.txt
docker ps -a >> diagnostic.txt
echo -e "\n=== TiDB Logs ===" >> diagnostic.txt
docker logs tidb --tail=50 >> diagnostic.txt 2>&1
echo -e "\n=== DM Worker Logs ===" >> diagnostic.txt
docker logs dm-worker --tail=50 >> diagnostic.txt 2>&1
echo -e "\n=== DM Init Logs ===" >> diagnostic.txt
docker logs dm-init >> diagnostic.txt 2>&1
echo -e "\n=== Network Info ===" >> diagnostic.txt
docker network inspect tidb-network >> diagnostic.txt
echo "Report saved to diagnostic.txt"
```
### Useful resources:
- [TiDB Documentation](https://docs.pingcap.com/tidb/stable)
- [TiDB DM Documentation](https://docs.pingcap.com/tidb-data-migration/stable)
- Project documentation:
  - [README.md](README.md)
  - [SYNC_GUIDE.md](SYNC_GUIDE.md)
  - [DATAGRIP_SETUP.md](DATAGRIP_SETUP.md)
  - [DOCKER_COMPOSE_V2.md](DOCKER_COMPOSE_V2.md)
## Still Having Issues?
If none of these solutions work:
1. Check the logs: `docker compose logs`
2. Create a diagnostic report (see above)
3. Check whether it is a known issue in the TiDB or DM GitHub issue trackers
4. Verify that your environment meets the prerequisites (see [README.md](README.md))