# MySQL to ClickHouse (Insert)
Transfer millions of rows from MySQL to ClickHouse.
## Example Job
YAML example:
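A minimal sketch of a job definition; the key layout (`name`, `source`, `target`, `parameters`) is an assumption, while the parameter names and defaults match the table below:

```yaml
# Hypothetical job layout; only the parameter names and defaults
# are taken from this page.
name: mysql_users_to_clickhouse
source:
  type: mysql
  table: users
target:
  type: clickhouse
  table: users
parameters:
  live_schema: false
  primary_id: id
  extract_offset: 0
  extract_limit: 0
  extract_chunk_size: 1000
  truncate_table: false
  load_chunk_size: 1000
  sleep_interval: 1   # seconds between chunks
```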
Python example:
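A minimal sketch assuming a hypothetical `run_job` entry point; the module name and function signature are assumptions, and only the parameter names come from this page:

```python
# Hypothetical Python job definition. The etl_jobs module and the
# run_job() entry point are assumptions; the keyword arguments
# match the parameters documented below.
from etl_jobs import run_job  # hypothetical import

run_job(
    source="mysql",
    target="clickhouse",
    table="users",
    live_schema=False,       # use the local schema file
    primary_id="id",
    extract_chunk_size=1000,
    load_chunk_size=1000,
    sleep_interval=1,        # seconds between chunks
)
```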
## Parameters
Parameter | Default |
---|---|
live_schema | False |
primary_id | None |
extract_offset | 0 |
extract_limit | 0 |
extract_chunk_size | 1000 |
truncate_table | False |
load_chunk_size | 1000 |
sleep_interval | 1 (seconds) |
### live_schema

`True` means take the table's schema from the live database. `False` means take the schema from the local file (`databases/migrations/mysql/users.py`).

If your live database table has many columns but you want to transfer only a few of them to the warehouse, set `live_schema=False` and define only the required columns in the local file, as in the sketch below.
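For illustration, a trimmed-down local schema file might look like the following. The column-definition format here is hypothetical; only the file path appears in this page:

```python
# databases/migrations/mysql/users.py
# Hypothetical schema format: with live_schema=False, only the
# columns defined here are transferred to the warehouse.
columns = {
    "id": "UInt64",
    "email": "String",
    "created_at": "DateTime",
}
```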
### primary_id

The primary id column is used for offsetting the extract query (see `extract_offset` and `extract_chunk_size`).
### extract_offset

To extract only the rows whose primary id is greater than a given value, set `extract_offset` to that value, as shown below.
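For example, assuming `primary_id` is set to `id`, this setting skips all rows with `id <= 1000`:

```yaml
parameters:
  primary_id: id
  extract_offset: 1000  # extract only rows with id > 1000
```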
### extract_limit

To reduce the pressure on the source database, we can use `extract_limit`. If we set `extract_limit=500`, the job extracts only 500 rows and then stops. This is useful when we want to transfer a small amount of data or test the job with a few rows, as in the example below.
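For example, a quick test run over 500 rows:

```yaml
parameters:
  extract_limit: 500  # extract 500 rows, then stop the job
```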
### extract_chunk_size

`extract_chunk_size` is used to extract data in chunks, which is useful when the source table has millions of rows. It works like `extract_limit`, but instead of stopping after one batch it keeps extracting chunk by chunk.

`extract_limit` and `extract_chunk_size` cannot be used together: if `extract_limit` is set, `extract_chunk_size` is ignored. The sketch after this paragraph illustrates the chunked loop.
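The following is a conceptual sketch of primary-id-based chunking under assumed names (`conn`, `load`); it illustrates the idea, not the tool's actual implementation:

```python
import time

def extract_in_chunks(conn, load, primary_id="id", extract_offset=0,
                      extract_chunk_size=1000, sleep_interval=1):
    """Conceptual sketch of chunked extraction (not the tool's code).

    conn: a DB-API connection to MySQL.
    load: a callable that writes one batch of rows to ClickHouse.
    """
    last_id = extract_offset
    while True:
        cur = conn.cursor()
        # Page through the table by primary id so each query stays small.
        cur.execute(
            f"SELECT * FROM users WHERE {primary_id} > %s "
            f"ORDER BY {primary_id} LIMIT %s",
            (last_id, extract_chunk_size),
        )
        rows = cur.fetchall()
        if not rows:
            break
        load(rows)                  # insert the chunk into ClickHouse
        last_id = rows[-1][0]       # assumes the id is the first column
        time.sleep(sleep_interval)  # throttle between chunks
```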
### truncate_table

To truncate the target table before loading, set `truncate_table=True`.
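For example:

```yaml
parameters:
  truncate_table: true  # empty the ClickHouse table before loading
```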
### load_chunk_size

`load_chunk_size` is used to load data in chunks, which is useful when millions of rows are being transferred. It works like `extract_chunk_size`, but controls the size of the batches written to the target instead of the batches read from the source.
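For example, writing to ClickHouse in batches of 10,000 rows:

```yaml
parameters:
  load_chunk_size: 10000  # rows per batch written to ClickHouse
```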
### sleep_interval

`sleep_interval` pauses the job for the given number of seconds between chunks. This is useful when the source table has a large number of rows and we want to reduce the pressure on the source and target databases.
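For example, pausing two seconds between chunks:

```yaml
parameters:
  extract_chunk_size: 1000
  sleep_interval: 2  # seconds to pause between chunks
```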