Advanced scheduling for PostgreSQL

栏目: IT技术 · 发布时间: 4年前

内容简介:pg_timetableis an advanced job scheduler for PostgreSQL, offering many advantages over traditional schedulers such asThere are currently two options on how you can install and run pg_timetable.When using Docker, simply replace all

pg_timetable: Advanced scheduling for PostgreSQL

pg_timetableis an advanced job scheduler for PostgreSQL, offering many advantages over traditional schedulers such as cron and others. It is completely database driven and provides a couple of advanced concepts.

Table of Contents

    • 2.1. Container installation
    • 2.2. Local installation
  • 3. Features and advanced functionality
  • 4. Database logging and transactions
  • 5. Runtime information

1. Main features

  • Tasks can be arranged in chains
  • A chain can consist of SQL and executables
  • Parameters can be passed to chains
  • Missed tasks (possibly due to downtime) can be retried automatically
  • Support for configurable repetitions
  • Builtin tasks such as sending emails, etc.
  • Fully database driven configuration
  • Full support for database driven logging
  • Cron-style scheduling
  • Optional concurrency protection

2. Installation

There are currently two options on how you can install and run pg_timetable.

If you feel the need for a .deb or .rpm package, please let us know by submitting an issue, or - which we would really appreciate! - creating a pull request that does said things.

2.1 Container installation

When using Docker, simply replace all podman occurrences with docker .

  1. Get the Dockerfile:
wget -O pg_timetable.Dockerfile https://raw.githubusercontent.com/cybertec-postgresql/pg_timetable/master/Dockerfile
  1. Build the Docker image:
podman build -f pg_timetable.Dockerfile -t pg_timetable:latest
  1. Run the image:
podman run --rm pg_timetable:latest
  1. To pass additional arguments to pg_timetable, such as where your database is located, simply attach the flags to the podman run , like so:
podman run --rm pg_timetable:latest -h 10.0.0.3 -p 54321

2.2 Local Installation

  1. Downlod and install Go on your system.
  2. Clone pg_timetable using go get :
$ env GIT_TERMINAL_PROMPT=1 go get github.com/cybertec-postgresql/pg_timetable/
Username for 'https://github.com': <Github Username>
Password for 'https://cyberboy@github.com': <Github Password>
  1. Run pg_timetable :
$ cd ~/go/src/github.com/cybertec-postgresql/pg_timetable/
$ go run main.go --dbname=dbname --name=worker001 --user=scheduler --password=strongpwd

Alternatively, build a binary and run it:

$ go build
$ ./pg_timetable --dbname=dbname --name=worker001 --user=scheduler --password=strongpwd
  1. (Optional) Run tests in all sub-folders of the project:
$ cd ~/go/src/github.com/cybertec-postgresql/pg_timetable/
$ go get github.com/stretchr/testify/
$ go test ./...

Alternatively, run tests using postgres docker image:

$ RUN_DOCKER=true go test ./...

3. Features and advanced functionality

The scheduling in pg_timetable encompasses three different stages to facilitate the reuse with other parameters or additional schedules.

The first stage, base_task , defines what to do.

The second stage, task_chain , contains a list of base tasks to run sequentially.

The third stage consists of the chain_execution_config and defines if , when , and how often a chain should be executed.

Additionally, to provide the base tasks with parameters and influence their behavior, each entry in a task chain can be accompanied by an execution parameter .

3.1. Base task

In pg_timetable , the most basic building block is a base task . Currently, there are three different kinds of task:

Base task kind Task kind type Example
SQL snippet SQL Starting a cleanup, refreshing a materialized view or processing data.
External program SHELL Anything that can be called from the command line.
Internal Task BUILTIN A prebuilt functionality included in pg_timetable . These include:
  • Sleep
  • Log
  • SendMail
  • Download

A new base task can be created by inserting a new entry into timetable.base_task .

Excerpt of timetable.base_task

Column Type Definition
name text The name of the base task.
kind timetable.task_kind The type of the base task. Can be SQL (default), SHELL or BUILTIN .
script text Contains either a SQL script or a command string which will be executed.

3.2. Task chain

The next building block is a chain , which simply represents a list of tasks. An example would be:

  • Download files from a server
  • Import files
  • Run aggregations
  • Commit the transaction
  • Remove the files from disk

All tasks of the chain in pg_timetable are executed within one transaction. However, please, pay attention there is no opportunity to rollback SHELL and BUILTIN tasks.

Excerpt of timetable.task_chain

Column Type Definition
parent_id bigint The ID of the previous base task in the chain. Set this to NULL if it is the first base task in the chain.
task_id bigint The ID of the base task .
run_uid text The role as which the chain should be executed as.
database_connection integer The ID of the timetable.database_connection that should be used.
ignore_error boolean Specify if the chain should resume after encountering an error (default: true ).

3.2.1. Chain execution configuration

Once a chain has been created, it has to be scheduled. For this, pg_timetable builds upon the standard cron -string, all the while adding multiple configuration options.

Excerpt of timetable.chain_execution_config

Column Type Definition
chain_id bigint The id of the task chain .
chain_name text The name of the chain .
run_at_minute integer To achieve the cron equivalent of * , set the value to NULL.
run_at_hour integer
run_at_day integer
run_at_month integer
run_at_day_of_week integer
max_instances integer The amount of instances that this chain may have running at the same time.
live boolean Control if the chain may be executed once it reaches its schedule.
self_destruct boolean Self destruct the chain.
exclusive_execution boolean Specifies whether the chain should be executed exclusively while all other chains are paused.
excluded_execution_configs integer[] TODO
client_name text Specifies which client should execute the chain. Set this to `NULL` to allow any client.

3.2.2. Chain execution parameters

As mentioned above, base tasks are simple skeletons (e.g. send email , vacuum , etc.). In most cases, they have to be brought to live by passing parameters to the execution.

Excerpt of timetable.chain_execution_paramaters

Column Type Definition
chain_execution_config bigint The ID of the chain execution configuration.
chain_id bigint The ID of the chain.
order_id integer The order of the parameter.
value jsonb A string JSON array containing the paramaters.

3.3. Example usages

A variety of examples can be found in the /samples directory.

3.4 Examle functions

Create a Job with the timetable.job_add function. With this function you can add a new Job with a specific time ( by_minute , by_hour , by_day , by_month , by_day_of_week ) as comma separated text list to run or with a in a cron-syntax.

Parameter Type Definition Default
task_name text The name of the Task
task_function text The function wich will be executed.
task_type text Type of the function SQL , SHELL and BUILTIN SQL
by_cron text Time Schedule in Cron Syntax
by_minute text This specifies the minutes on which the job is to run ALL
by_hour text This specifies the hours on which the job is to run ALL
by_day text This specifies the days on which the job is to run. ALL
by_month text This specifies the month on which the job is to run ALL
by_day_of_week text This specifies the day of week (0,7 is sunday) on which the job is to run ALL
max_instances integer The amount of instances that this chain may have running at the same time. NULL
live boolean Control if the chain may be executed once it reaches its schedule. FALSE
self_destruct boolean Self destruct the chain. FALSE

If the parameter by_cron is used all other by_* ( by_minute , by_hour , by_day , by_month , by_day_of_week ) will be ignored.

3.4.1 Usage

3.4.1.1 With Cron-Style

Run "MyJob" at 00:05 in August. SELECT timetable.job_add('MyJob','Select public.my_func()',null,'SQL','5 0 * 8 *');

Run "MyJob" at minute 23 past every 2nd hour from 0 through 20. SELECT timetable.job_add('MyJob','Select public.my_func()',null,'SQL','23 0-20/2 * * *');

3.4.1.2 With specific time

Run "SQL" at 01:00 on first day of Month

SELECT timetable.job_add ('At minute 0 and 1st hour on first day of Month',
    'SELECT timetable.insert_dummy_log()',
    null,
    'SQL',
    null,
    '0',
    '1',
    '1',
    null,
    null,
    '1',
    TRUE,
    FALSE);

Run "SQL" at 01:00 and 02:00 on every Monday´s

SELECT timetable.job_add ('at 01:00 and 02:00 on every Monday´s',
   'SELECT timetable.insert_dummy_log()',
   null,
   'SQL',
   null,
   '0',
   null,
   '1,2',
   null,
   '1',
   '1',
   TRUE,
   FALSE);

4. Database logging and transactions

The entire activity of pg_timetable is logged in database tables ( timetable.log and timetable.execution_log ). Since there is no need to parse files when accessing log data, the representation through an UI can be easily achieved.

Furthermore, this behavior allows a remote host to access the log in a straightforward manner, simplifying large and/or distributed applications.

Note: Logs are written in a separate transaction, in case the chain fails.

5. Runtime information

In order to examine the activity of pg_timetable , the table timetable.run_status can be queried. It contains information about active jobs and their current parameters.

6. Schema diagram

Advanced scheduling for PostgreSQL

7. Contributing

If you want to contribute to pg_timetable and help make it better, feel free to open an issue or even consider submitting a pull request.

8. Support

For professional support, please contact Cybertec .

9. Authors

Pavlo Golub and Hans-Jürgen Schönig .


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

突破之道

突破之道

(美)基思 R. 麦克法兰(Keith R. McFarland) / 江南、江维 / 机械工业出版社 / 2017-4 / 49.00

《从优秀到卓越》前传。 深入调查7000多家公司、1500多名关键管理人员,总结提炼出6大突破式发展策略。 解析创业成功后的公司如何腾飞,以几何速度增长,突破10亿关口。 阐述为什么创始人自己才是企业实现突破的最大障碍。一起来看看 《突破之道》 这本书的介绍吧!

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

URL 编码/解码
URL 编码/解码

URL 编码/解码