[taler-grid5k] 108/189: update explain.py - multiline params not working
From: gnunet
Subject: [taler-grid5k] 108/189: update explain.py - multiline params not working yet - needs some style improvements
Date: Thu, 28 Apr 2022 10:47:58 +0200
This is an automated email from the git hooks/post-receive script.
marco-boss pushed a commit to branch master
in repository grid5k.
commit 7a10fb3b530b82ca2846b94e670616d06ad5c347
Author: Boss Marco <bossm8@bfh.ch>
AuthorDate: Wed Mar 23 22:22:46 2022 +0100
update explain.py - multiline params not working yet - needs some style improvements
---
additional/explain-visualizer/example.log | 25 -----
additional/explain-visualizer/explain.py | 156 ++++++++++++++++++++++++++----
experiment/scripts/database.sh | 45 ++++++---
3 files changed, 169 insertions(+), 57 deletions(-)
diff --git a/additional/explain-visualizer/example.log b/additional/explain-visualizer/example.log
deleted file mode 100644
index 653c6a8..0000000
--- a/additional/explain-visualizer/example.log
+++ /dev/null
@@ -1,25 +0,0 @@
-2022-03-22 18:43:44.104 UTC [89] LOG: statement: EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT JSON) Select * from reserves join reserves_in using (reserve_pub) where reserve_pub='x';
-2022-03-22 18:44:40.956 UTC [92] LOG: statement: select * from reserves where reserve_pub='x';
-2022-03-22 18:53:19.795 UTC [92] LOG: statement: select * from reserves where reserve_pub='x';
-2022-03-22 18:53:20.577 UTC [92] LOG: statement: select * from reserves where reserve_pub='x';
-2022-03-22 18:53:21.135 UTC [92] LOG: statement: select * from reserves where reserve_pub='x';
-2022-03-22 18:53:21.646 UTC [92] LOG: statement: select * from reserves where reserve_pub='x';
-2022-03-22 18:53:31.910 UTC [92] ERROR: syntax error at or near "where" at character 42
-2022-03-22 18:53:31.910 UTC [92] STATEMENT: select * from reserves join reserves_out where reserve_pub='x';
-2022-03-22 18:53:35.640 UTC [92] ERROR: syntax error at or near "where" at character 41
-2022-03-22 18:53:35.640 UTC [92] STATEMENT: select * from reserves join reserves_in where reserve_pub='x';
-2022-03-22 18:53:43.136 UTC [92] ERROR: syntax error at or near "reserve_pub" at character 61
-2022-03-22 18:53:43.136 UTC [92] STATEMENT: select * from reserves join reserves_in on reserve_pubwhere reserve_pub='x';
-2022-03-22 18:53:45.680 UTC [92] LOG: statement: select * from reserves join reserves_in on reserve_pub where reserve_pub='x';
-2022-03-22 18:53:45.681 UTC [92] ERROR: column reference "reserve_pub" is ambiguous at character 44
-2022-03-22 18:53:45.681 UTC [92] STATEMENT: select * from reserves join reserves_in on reserve_pub where reserve_pub='x';
-2022-03-22 18:53:51.504 UTC [92] LOG: statement: select * from reserves join reserves_in on(reserve_pub) where reserve_pub='x';
-2022-03-22 18:53:51.504 UTC [92] ERROR: column reference "reserve_pub" is ambiguous at character 44
-2022-03-22 18:53:51.504 UTC [92] STATEMENT: select * from reserves join reserves_in on(reserve_pub) where reserve_pub='x';
-2022-03-22 18:53:55.042 UTC [92] LOG: statement: select * from reserves join reserves_in on (reserve_pub) where reserve_pub='x';
-2022-03-22 18:53:55.042 UTC [92] ERROR: column reference "reserve_pub" is ambiguous at character 45
-2022-03-22 18:53:55.042 UTC [92] STATEMENT: select * from reserves join reserves_in on (reserve_pub) where reserve_pub='x';
-2022-03-22 18:54:01.883 UTC [92] LOG: statement: select * from reserves join reserves_in on (reserve_pub) where reserves.reserve_pub='x';
-2022-03-22 18:54:01.883 UTC [92] ERROR: column reference "reserve_pub" is ambiguous at character 45
-2022-03-22 18:54:01.883 UTC [92] STATEMENT: select * from reserves join reserves_in on (reserve_pub) where reserves.reserve_pub='x';
-2022-03-22 18:54:06.817 UTC [92] LOG: statement: select * from reserves join reserves_in using (reserve_pub) where reserves.reserve_pub='x';
diff --git a/additional/explain-visualizer/explain.py b/additional/explain-visualizer/explain.py
index b4a7aaa..3d91640 100755
--- a/additional/explain-visualizer/explain.py
+++ b/additional/explain-visualizer/explain.py
@@ -1,16 +1,23 @@
#!/usr/bin/env python3
from json import loads
-from re import compile
+from re import compile, sub, escape
+import stat
from subprocess import check_output, CalledProcessError
from enum import Enum
from textwrap import fill
import argparse
+try:
+ import sqlparse
+except ImportError:
+ sqlparse = None
+ print("NOTE: install sqlparse to get prettier sql statements")
+
class Type(Enum):
RELATION = 0
- OTHER = 1
+ MODIFY = 1
+ OTHER = 2
class Plan:
@@ -25,11 +32,16 @@ class Plan:
self.used_index = ""
self.condition = ""
self.type = _type
+ self.operation = ""
+ self.tuples_inserted = 0
+ self.tuples_conflicting = 0
self.indent = " " * indent
def print(self):
if self.type == Type.RELATION:
self._print_relation()
+ elif self.type == Type.MODIFY:
+ self._print_modify()
elif self.type == Type.OTHER:
self._print_other()
@@ -52,20 +64,36 @@ class Plan:
print(self.indent + f"{self.name}:")
print("")
+ def _print_modify(self):
+ print(self.indent + f"{self.name}:\n")
+ print(self.indent + f" Operation: {self.operation}")
+ print(self.indent + f" Tuples Inserted: {self.tuples_inserted}")
+ if self.tuples_conflicting != 0:
+ print(self.indent + f" Tuples Conflicting: {self.tuples_conflicting}")
+
plans = {}
most_time_consuming = Plan("default", Type.RELATION, 0)
partitioned_re = compile('.*_([0-9]+|default)')
-log_statement_re = compile('.*LOG: statement: (.*);')
+
+log_statement_re = None
+log_execute_re = None
+log_get_params_re = None
args = None
def get_explain(sql) -> dict:
print("\n" + "=" * 80 + "\n")
- print(fill(text=sql, width=80))
+ if sqlparse:
+ print(sqlparse.format(sql,
+ reindent=True,
+ keyword_case='upper',
+ wrap_after=80))
+ else:
+ print(fill(text=sql, width=80))
print("\n" + "=" * 80 + "\n")
- sql = sql.replace("\n", "").replace('"', "")
+ sql = sql.replace("\n", " ").replace('"', "")
sql = "EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT JSON) " + sql + ";"
try:
analyze_json = check_output(
@@ -82,7 +110,14 @@ def get_explain(sql) -> dict:
return loads(analyze_json)[0]
except CalledProcessError as e:
print(e.output)
- print("\nQuery returned an error, skipping analyzation\n")
+ print("\nQuery returned an error, skipping analysis\n")
+
+
+def setup_log_regex():
+ global log_statement_re, log_execute_re, log_get_params_re
+ log_statement_re = compile(f'.*{args.log_id} LOG: statement: ((?!COMMIT)(?!BEGIN)(?!SET)(?!START).*);')
+ log_execute_re = compile(f'.*\[(\d+)-(\d+)\].*\[(\d+)\] {args.log_id} LOG: execute [a-zA-Z0-9_]+: ((?!COMMIT)(?!BEGIN)(?!SET)(?!START).*)')
+ log_get_params_re = compile('(\$\d+)+ = ([^, ]+)')
def ensure_partition_added(relation, indent):
@@ -95,6 +130,17 @@ def ensure_plan_added(plan, indent):
plans[idx] = Plan(plan, Type.OTHER, indent)
+def ensure_modify_added(plan, indent):
+ idx = plan['Relation Name']
+ p = Plan(plan['Relation Name'], Type.MODIFY, indent)
+ p.operation = plan['Operation']
+ p.time_spent = plan['Actual Total Time']
+ if 'Tuples Inserted' in plan:
+ p.tuples_inserted = plan['Tuples Inserted']
+ p.tuples_conflicting = plan['Conflicting Tuples']
+ plans[idx] = p
+
+
def check_most_expensive_time(plan: Plan):
global most_time_consuming
if most_time_consuming.time_spent < plan.time_spent:
@@ -133,12 +179,20 @@ def iterate_plans():
def parse_plan(plan, indent=2):
- if 'Plans' in plan:
+ if plan['Node Type'] == 'Function Scan':
+ ensure_plan_added(plan['Node Type'], indent)
+ elif 'Scan' in plan['Node Type'] and 'CTE' not in plan['Node Type']:
+ parse_scan(plan, indent + 2)
+ elif plan['Node Type'] == 'ModifyTable':
+ if plan['Operation'] == 'Insert':
+ ensure_modify_added(plan, indent)
+ elif plan['Operation'] == 'Update':
+ for _plan in plan['Plans']:
+ parse_plan(_plan, indent+2)
+ elif 'Plans' in plan:
ensure_plan_added(plan['Node Type'], indent)
for _plan in plan['Plans']:
parse_plan(_plan, indent + 2)
- else:
- parse_scan(plan, indent + 2)
def print_summary(attr_name):
@@ -200,23 +254,86 @@ def analyze_from_query():
visualize(explain)
-def handle_log_line(line):
+def handle_log_statement(sql):
# reset global plans
global plans, most_time_consuming
plans = {}
most_time_consuming = Plan("default", Type.RELATION, 0)
- match = log_statement_re.match(line)
- if match:
- explain = get_explain(match.group(1))
- visualize(explain)
+ explain = get_explain(sql)
+ visualize(explain)
def analyze_from_logs():
- with open(args.file, 'r') as logs:
- for line in logs:
- handle_log_line(line)
+ logs = None
+
+ setup_log_regex()
+
+ try:
+ logs = open(args.file, 'r')
+ except IOError as e:
+ print(f"Error: cannot open file {args.file}")
+ print(e.strerror)
+ exit(1)
+
+ lines = logs.readlines()
+ n_lines = len(lines)
+
+ for idx, line in enumerate(lines):
+
+ statement = None
+ params = None
+
+ if match := log_statement_re.match(line.strip()):
+ statement = match.group(1)
+
+ elif match := log_execute_re.match(line.strip()):
+
+ id = match.group(1)
+ n = int(match.group(2)) + 1
+
+ pid = match.group(3)
+ statement = match.group(4)
+
+ # optimistically assume that the two-line statement's second
+ # line is within the next ten lines
+ match = None
+ if statement[-1] != ';':
+ idx_curr = idx + 1
+ multi_line_re = compile(f'.*\[{id}-{n}] (.*)')
+ while idx_curr < idx + 10:
+ if match := multi_line_re.match(lines[idx_curr]):
+ break
+ idx_curr += 1
+
+ if match is not None:
+ n += 1
+ statement = statement + " " + match.group(1)
+
+ if '$' in statement:
+ params_re = compile(f'.*\[{id}-{n}].*\[{pid}\] {args.log_id} DETAIL: parameters: (\$\d+ = .*)')
+
+ idx_curr = idx + 1
+ match = None
+ while idx_curr < n_lines:
+ if match := params_re.match(lines[idx_curr]):
+ break
+ idx_curr += 1
+
+ if match is None:
+ print(f"ERROR: Could not find params for query\n\n{statement}\n")
+ continue
+
+ params = log_get_params_re.findall(match.group(1))
+ for param in params:
+ statement = sub(escape(param[0]) + r'\b', escape(param[1]), statement)
+
+ if statement is not None:
+ handle_log_statement(statement)
+ statement = None
+
+ logs.close()
def main():
@@ -258,6 +375,11 @@ def main():
log_p = s_p.add_parser('logs',
parents=[global_args],
help='Execute queries from a log file (postgres must have set log_statement=all)')
+ log_p.add_argument('-l', '--log-identifier',
+ dest='log_id',
+ type=str,
+ default="taler@taler-exchange",
help='Identifier (user@db_name) to filter which statements from the logs to analyze (taler@taler-exchange)')
log_p.add_argument(metavar='FILE',
dest='file',
type=str,
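The parameter back-substitution that explain.py performs on `execute` log lines (replacing the `$1`, `$2`, ... placeholders of a prepared statement with the values from the matching `DETAIL: parameters:` line) can be sketched in isolation as follows. This is a minimal illustration of the technique, not the commit's exact code; `substitute_params` and the sample inputs are hypothetical:

```python
import re

def substitute_params(statement: str, detail: str) -> str:
    """Replace $1, $2, ... placeholders in a logged statement with the
    values parsed from a 'DETAIL: parameters:' log line."""
    # detail looks like: "$1 = 'x', $2 = 42"
    params = re.findall(r'(\$\d+) = ([^,]+)', detail)
    # Substitute longest placeholders first so $12 is not clobbered by $1.
    for placeholder, value in sorted(params, key=lambda p: -len(p[0])):
        statement = re.sub(re.escape(placeholder) + r'\b',
                           lambda _m, v=value.strip(): v,
                           statement)
    return statement

print(substitute_params(
    "select * from reserves where reserve_pub=$1",
    "$1 = 'x'"))
```

Passing the replacement as a callable avoids re.sub interpreting backslashes inside parameter values as group references.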
diff --git a/experiment/scripts/database.sh b/experiment/scripts/database.sh
index 6e10fb2..1e0d4da 100755
--- a/experiment/scripts/database.sh
+++ b/experiment/scripts/database.sh
@@ -30,12 +30,12 @@ function setup_disks() {
function setup_ram_storage() {
SIZE=$(($(awk '/MemTotal/ {print $2}' /proc/meminfo) / 10))
- if ! df | grep -q /var/lib/postgresql; then
- mv /var/lib/postgresql /var/lib/postgresql.bak
- mkdir /var/lib/postgresql
- chown postgres:postgres /var/lib/postgresql
- mount -t tmpfs -o size=${SIZE}k pgdata /var/lib/postgresql
- cp -rp /var/lib/postgresql.bak/* /var/lib/postgresql
+ if ! df | grep -q /tmp/postgresql; then
+ mv /tmp/postgresql /tmp/postgresql.bak
+ mkdir /tmp/postgresql
+ chown postgres:postgres /tmp/postgresql
+ mount -t tmpfs -o size=${SIZE}k pgdata /tmp/postgresql
+ cp -rp /tmp/postgresql.bak/* /tmp/postgresql
fi
}
@@ -72,11 +72,25 @@ function setup_config() {
# in the main configuration
echo "
listen_addresses='*'
+
log_destination=syslog
- log_error_verbosity=terse
syslog_ident='taler-database'
+ log_error_verbosity=terse
+
+ # for pgbadger
+ # log_checkpoints=on
+ # log_connections=on
+ # log_disconnections=on
+ # log_lock_waits=on
+ # log_temp_files=0
+ # log_autovacuum_min_duration=0
+ # log_error_verbosity=default
+ # log_duration=on
+
+ # log_statement=all
+
log_min_duration_statement=${DB_MIN_DURATION_STATEMENT}
- shared_preload_libraries='pg_stat_statements,auto_explain'
+
auto_explain.log_min_duration='${DB_MIN_DURATION_STATEMENT}ms'
auto_explain.log_verbose=true
auto_explain.log_nested_statements=true
@@ -84,8 +98,10 @@ function setup_config() {
auto_explain.log_buffers=true
auto_explain.log_wal=true
+ shared_preload_libraries='pg_stat_statements,auto_explain'
+
# Should be set locally
- join_collapse_limit=1
+ # join_collapse_limit=1
# Large tables perform bad with the default settings
# However, they could also be set on each table indiviudally
@@ -111,12 +127,11 @@ function setup_config() {
# (https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-MAX-WAL-SIZE)
min_wal_size=4GB
- max_wal_size=8GB
- wal_buffers=16MB
+ max_wal_size=100GB
+ wal_buffers=1GB
- # Hopefully reduce disk IO
- # https://www.postgresql.org/docs/12/wal-configuration.html
checkpoint_completion_target=0.9
+ checkpoint_timeout = 15min
random_page_cost=1.1
# Default 2kB
@@ -132,7 +147,7 @@ function setup_config() {
# Bad when turned off - Recovering db may not be possible
# https://www.postgresql.org/docs/13/runtime-config-wal.html#GUC-FSYNC
- # fsync = off
+ fsync = off
# Not so bad as when turning off fsync, but single transactions might get lost on crash - but
# like they would have aborted cleanly
@@ -159,7 +174,7 @@ function setup_config() {
max_parallel_workers_per_gather=10
max_connections=500
- max_parallel_maintenance_workers=4
+ max_parallel_maintenance_workers=12
# out of shared memory
max_locks_per_transaction=85
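The RAM-storage change above keeps the same pattern as before, only under /tmp/postgresql: size a tmpfs at one tenth of total RAM and skip the setup when the directory is already a mount point. A rough, dry-run sketch of that logic (paths and the echo are illustrative, not part of the commit):

```shell
#!/bin/sh
# Sketch of setup_ram_storage: tmpfs sized at 1/10 of total RAM (in kB),
# guarded by an idempotency check against df's mount listing.
PGDATA=/tmp/postgresql
SIZE=$(($(awk '/MemTotal/ {print $2}' /proc/meminfo) / 10))

if ! df | grep -q "$PGDATA"; then
    # The real script moves the old directory aside, mounts the tmpfs,
    # and copies the data back with ownership preserved:
    #   mv/mkdir/chown/mount -t tmpfs -o size=${SIZE}k/cp -rp
    echo "would mount ${SIZE}k tmpfs on $PGDATA"
fi
```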
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.