[taler-grid5k] 108/189: update explain.py - multiline params not working

gnunet-svn

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[taler-grid5k] 108/189: update explain.py - multiline params not working

From:	gnunet
Subject:	[taler-grid5k] 108/189: update explain.py - multiline params not working yet - needs some style improvements
Date:	Thu, 28 Apr 2022 10:47:58 +0200

This is an automated email from the git hooks/post-receive script.

marco-boss pushed a commit to branch master
in repository grid5k.

commit 7a10fb3b530b82ca2846b94e670616d06ad5c347
Author: Boss Marco <bossm8@bfh.ch>
AuthorDate: Wed Mar 23 22:22:46 2022 +0100

    update explain.py - multiline params not working yet - needs some style 
improvements
---
 additional/explain-visualizer/example.log |  25 -----
 additional/explain-visualizer/explain.py  | 156 ++++++++++++++++++++++++++----
 experiment/scripts/database.sh            |  45 ++++++---
 3 files changed, 169 insertions(+), 57 deletions(-)

diff --git a/additional/explain-visualizer/example.log 
b/additional/explain-visualizer/example.log
deleted file mode 100644
index 653c6a8..0000000
--- a/additional/explain-visualizer/example.log
+++ /dev/null
@@ -1,25 +0,0 @@
-2022-03-22 18:43:44.104 UTC [89] LOG:  statement: EXPLAIN (ANALYZE, VERBOSE, 
BUFFERS, FORMAT JSON) Select * from reserves join reserves_in using 
(reserve_pub) where reserve_pub='x';
-2022-03-22 18:44:40.956 UTC [92] LOG:  statement: select * from reserves where 
reserve_pub='x';
-2022-03-22 18:53:19.795 UTC [92] LOG:  statement: select * from reserves where 
reserve_pub='x';
-2022-03-22 18:53:20.577 UTC [92] LOG:  statement: select * from reserves where 
reserve_pub='x';
-2022-03-22 18:53:21.135 UTC [92] LOG:  statement: select * from reserves where 
reserve_pub='x';
-2022-03-22 18:53:21.646 UTC [92] LOG:  statement: select * from reserves where 
reserve_pub='x';
-2022-03-22 18:53:31.910 UTC [92] ERROR:  syntax error at or near "where" at 
character 42
-2022-03-22 18:53:31.910 UTC [92] STATEMENT:  select * from reserves join 
reserves_out where reserve_pub='x';
-2022-03-22 18:53:35.640 UTC [92] ERROR:  syntax error at or near "where" at 
character 41
-2022-03-22 18:53:35.640 UTC [92] STATEMENT:  select * from reserves join 
reserves_in where reserve_pub='x';
-2022-03-22 18:53:43.136 UTC [92] ERROR:  syntax error at or near "reserve_pub" 
at character 61
-2022-03-22 18:53:43.136 UTC [92] STATEMENT:  select * from reserves join 
reserves_in on reserve_pubwhere reserve_pub='x';
-2022-03-22 18:53:45.680 UTC [92] LOG:  statement: select * from reserves join 
reserves_in on reserve_pub where reserve_pub='x';
-2022-03-22 18:53:45.681 UTC [92] ERROR:  column reference "reserve_pub" is 
ambiguous at character 44
-2022-03-22 18:53:45.681 UTC [92] STATEMENT:  select * from reserves join 
reserves_in on reserve_pub where reserve_pub='x';
-2022-03-22 18:53:51.504 UTC [92] LOG:  statement: select * from reserves join 
reserves_in on(reserve_pub) where reserve_pub='x';
-2022-03-22 18:53:51.504 UTC [92] ERROR:  column reference "reserve_pub" is 
ambiguous at character 44
-2022-03-22 18:53:51.504 UTC [92] STATEMENT:  select * from reserves join 
reserves_in on(reserve_pub) where reserve_pub='x';
-2022-03-22 18:53:55.042 UTC [92] LOG:  statement: select * from reserves join 
reserves_in on (reserve_pub) where reserve_pub='x';
-2022-03-22 18:53:55.042 UTC [92] ERROR:  column reference "reserve_pub" is 
ambiguous at character 45
-2022-03-22 18:53:55.042 UTC [92] STATEMENT:  select * from reserves join 
reserves_in on (reserve_pub) where reserve_pub='x';
-2022-03-22 18:54:01.883 UTC [92] LOG:  statement: select * from reserves join 
reserves_in on (reserve_pub) where reserves.reserve_pub='x';
-2022-03-22 18:54:01.883 UTC [92] ERROR:  column reference "reserve_pub" is 
ambiguous at character 45
-2022-03-22 18:54:01.883 UTC [92] STATEMENT:  select * from reserves join 
reserves_in on (reserve_pub) where reserves.reserve_pub='x';
-2022-03-22 18:54:06.817 UTC [92] LOG:  statement: select * from reserves join 
reserves_in using (reserve_pub) where reserves.reserve_pub='x';
diff --git a/additional/explain-visualizer/explain.py 
b/additional/explain-visualizer/explain.py
index b4a7aaa..3d91640 100755
--- a/additional/explain-visualizer/explain.py
+++ b/additional/explain-visualizer/explain.py
@@ -1,16 +1,23 @@
 #!/usr/bin/env python3
 
 from json import loads
-from re import compile
+from re import compile, sub, escape
+import stat
 from subprocess import check_output, CalledProcessError
 from enum import Enum
 from textwrap import fill
 import argparse
 
+try:
+    import sqlparse
+except ImportError:
+    print("NOTE: install sqlparse to get prettier sql statements")
+
 
 class Type(Enum):
     RELATION = 0
-    OTHER = 1
+    MODIFY = 1
+    OTHER = 2
 
 
 class Plan:
@@ -25,11 +32,16 @@ class Plan:
         self.used_index = ""
         self.condition = ""
         self.type = _type
+        self.operation = ""
+        self.tuples_inserted = 0
+        self.tuples_conflicting = 0
         self.indent = " " * indent
 
     def print(self):
         if self.type == Type.RELATION:
             self._print_relation()
+        elif self.type == Type.MODIFY:
+            self._print_modify()
         elif self.type == Type.OTHER:
             self._print_other()
 
@@ -52,20 +64,36 @@ class Plan:
         print(self.indent + f"{self.name}:")
         print("")
 
+    def _print_modify(self):
+        print(self.indent + f"{self.name}:\n")
+        print(self.indent + f"  Operation:         {self.operation}")
+        print(self.indent + f"  Tuples Inserted:   {self.tuples_inserted}")
+        if self.tuples_conflicting != 0:
+          print(self.indent + f"  Tuples Confliting: 
{self.tuples_conflicting}")
+
 
 plans = {}
 most_time_consuming = Plan("default", Type.RELATION, 0)
 
 partitioned_re = compile('.*_([0-9]+|default)')
-log_statement_re = compile('.*LOG:  statement: (.*);')
+
+log_statement_re = None
+log_execute_re = None
+log_get_params_re = None
 
 args = None
 
 def get_explain(sql) -> dict:
     print("\n" + "=" * 80 + "\n")
-    print(fill(text=sql, width=80))
+    if sqlparse:
+        print(sqlparse.format(sql, 
+                              reindent=True, 
+                              keyword_case='upper', 
+                              wrap_after=80))
+    else:
+        print(fill(text=sql, width=80))
     print("\n" + "=" * 80 + "\n")
-    sql = sql.replace("\n", "").replace('"', "")
+    sql = sql.replace("\n", " ").replace('"', "")
     sql = "EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT JSON) " + sql + ";"
     try:
         analyze_json = check_output(
@@ -82,7 +110,14 @@ def get_explain(sql) -> dict:
         return loads(analyze_json)[0]
     except CalledProcessError as e:
         print(e.output)
-        print("\nQuery returned an error, skipping analyzation\n")
+        print("\nQuery returned an error, skipping analysation\n")
+
+
+def setup_log_regex():
+    global log_statement_re, log_execute_re, log_get_params_re
+    log_statement_re = compile(f'.*{args.log_id} LOG:  statement: 
((?!COMMIT)(?!BEGIN)(?!SET)(?!START).*);')
+    log_execute_re = compile(f'.*\[(\d+)-(\d+)\].*\[(\d+)\] {args.log_id} LOG: 
 execute [a-zA-Z0-9_]+: ((?!COMMIT)(?!BEGIN)(?!SET)(?!START).*)')
+    log_get_params_re = compile('(\$\d+)+ = ([^, ]+)')
 
 
 def ensure_partition_added(relation, indent):
@@ -95,6 +130,17 @@ def ensure_plan_added(plan, indent):
     plans[idx] = Plan(plan, Type.OTHER, indent)
 
 
+def ensure_modify_added(plan, indent):
+    idx = plan['Relation Name']
+    p = Plan(plan['Relation Name'], Type.MODIFY, indent)
+    p.operation = plan['Operation']
+    p.time_spent = plan['Actual Total Time']
+    if 'Tuples Inserted' in plan:
+      p.tuples_inserted = plan['Tuples Inserted']
+      p.tuples_conflicting = plan['Conflicting Tuples']
+    plans[idx] = p
+
+
 def check_most_expensive_time(plan: Plan):
     global most_time_consuming
     if most_time_consuming.time_spent < plan.time_spent:
@@ -133,12 +179,20 @@ def iterate_plans():
 
 
 def parse_plan(plan, indent=2):
-    if 'Plans' in plan:
+    if plan['Node Type'] == 'Function Scan':
+        ensure_plan_added(plan['Node Type'], indent)
+    elif 'Scan' in plan['Node Type'] and 'CTE' not in plan['Node Type']:
+        parse_scan(plan, indent + 2)
+    elif plan['Node Type'] == 'ModifyTable':
+        if plan['Operation'] == 'Insert':
+            ensure_modify_added(plan, indent)
+        elif plan['Operation'] == 'Update':
+            for _plan in plan['Plans']:
+                parse_plan(_plan, indent+2)
+    elif 'Plans' in plan:
         ensure_plan_added(plan['Node Type'], indent)
         for _plan in plan['Plans']:
             parse_plan(_plan, indent + 2)
-    else:
-        parse_scan(plan, indent + 2)
 
 
 def print_summary(attr_name):
@@ -200,23 +254,86 @@ def analyze_from_query():
     visualize(explain)
 
 
-def handle_log_line(line):
+def handle_log_statement(sql):
 
     # reset global plans
     global plans, most_time_consuming
     plans = {}
     most_time_consuming = Plan("default", Type.RELATION, 0)
 
-    match = log_statement_re.match(line)
-    if match:
-        explain = get_explain(match.group(1))
-        visualize(explain)
+    explain = get_explain(sql)
+    visualize(explain)
 
 
 def analyze_from_logs():
-    with open(args.file, 'r') as logs:
-        for line in logs:
-            handle_log_line(line)
+    logs = None
+
+    setup_log_regex()
+
+    try:
+        logs = open(args.file, 'r')
+    except IOError as e:
+        print("Error: cannot open file {args.file}")
+        print(e.strerror)
+        exit(1)
+
+    lines = logs.readlines()
+    n_lines = len(lines)
+
+    for idx, line in enumerate(lines):
+
+        statement = None
+        params = None
+
+        if match := log_statement_re.match(line.strip()):
+            statement = match.group(1)
+
+        elif match := log_execute_re.match(line.strip()):
+
+            id = match.group(1)
+            n = int(match.group(2)) + 1
+
+            pid = match.group(3)
+            statement = match.group(4)
+
+            # optimistically assume that the two line statement's second
+            # line is in between the next then lines
+            match = None
+            if statement[-1] != ';':
+                idx_curr = idx + 1
+                multi_line_re = compile(f'.*\[{id}-{n}]  (.*)')
+                while idx_curr < idx + 10:
+                    if match := multi_line_re.match(lines[idx_curr]):
+                        break
+                    idx_curr += 1
+            
+            if match is not None:
+                n += 1
+                statement = statement + " " + match.group(1)
+
+            if '$' in statement:
+                params_re = compile(f'.*\[{id}-{n}].*\[{pid}\] {args.log_id} 
DETAIL:  parameters: (\$\d+ = .*)')
+    
+                idx_curr = idx + 1
+                match = None
+                while idx_curr < n_lines:
+                    if match := params_re.match(lines[idx_curr]):
+                        break
+                    idx_curr += 1
+    
+                if match is None:
+                    print(f"ERROR: Could not find params for 
query\n\n{statement}\n")
+                    continue
+    
+                params = log_get_params_re.findall(match.group(1))
+                for param in params:
+                    statement = sub(escape(param[0]) + r'\b', 
escape(param[1]), statement)
+
+        if statement is not None:
+            handle_log_statement(statement)
+            statement = None
+
+    logs.close()
 
 
 def main():
@@ -258,6 +375,11 @@ def main():
     log_p = s_p.add_parser('logs',  
                            parents=[global_args],
                            help='Execute queries from a log file (postgres 
must have set log_statement=all)')
+    log_p.add_argument('-l', '--log-identifier',
+                       dest='log_id',
+                       type=str,
+                       default="taler@taler-exchange",
+                       help='Identifier (user@db_name) to filter which 
statements from the logs to analyze (taler@taler-exchange)')
     log_p.add_argument(metavar='FILE',
                        dest='file',
                        type=str,
diff --git a/experiment/scripts/database.sh b/experiment/scripts/database.sh
index 6e10fb2..1e0d4da 100755
--- a/experiment/scripts/database.sh
+++ b/experiment/scripts/database.sh
@@ -30,12 +30,12 @@ function setup_disks() {
 
 function setup_ram_storage() {
   SIZE=$(($(awk '/MemTotal/ {print $2}' /proc/meminfo) / 10))
-  if ! df | grep -q /var/lib/postgresql; then
-    mv /var/lib/postgresql /var/lib/postgresql.bak
-    mkdir /var/lib/postgresql
-    chown postgres:postgres /var/lib/postgresql
-    mount -t tmpfs -o size=${SIZE}k pgdata /var/lib/postgresql
-    cp -rp /var/lib/postgresql.bak/* /var/lib/postgresql
+  if ! df | grep -q /tmp/postgresql; then
+    mv /tmp/postgresql /tmp/postgresql.bak
+    mkdir /tmp/postgresql
+    chown postgres:postgres /tmp/postgresql
+    mount -t tmpfs -o size=${SIZE}k pgdata /tmp/postgresql
+    cp -rp /tmp/postgresql.bak/* /tmp/postgresql
   fi
 }
 
@@ -72,11 +72,25 @@ function setup_config() {
   # in the main configuration
   echo "
   listen_addresses='*'
+
   log_destination=syslog
-  log_error_verbosity=terse
   syslog_ident='taler-database'
+  log_error_verbosity=terse
+
+  # for pgbadger
+  # log_checkpoints=on
+  # log_connections=on
+  # log_disconnections=on
+  # log_lock_waits=on
+  # log_temp_files=0
+  # log_autovacuum_min_duration=0
+  # log_error_verbosity=default
+  # log_duration=on
+  
+  # log_statement=all
+
   log_min_duration_statement=${DB_MIN_DURATION_STATEMENT}
-  shared_preload_libraries='pg_stat_statements,auto_explain'
+
   auto_explain.log_min_duration='${DB_MIN_DURATION_STATEMENT}ms'
   auto_explain.log_verbose=true
   auto_explain.log_nested_statements=true
@@ -84,8 +98,10 @@ function setup_config() {
   auto_explain.log_buffers=true
   auto_explain.log_wal=true
 
+  shared_preload_libraries='pg_stat_statements,auto_explain'
+
   # Should be set locally
-  join_collapse_limit=1
+  # join_collapse_limit=1
 
   # Large tables perform bad with the default settings
   # However, they could also be set on each table indiviudally
@@ -111,12 +127,11 @@ function setup_config() {
   
   # 
(https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-MAX-WAL-SIZE)
   min_wal_size=4GB
-  max_wal_size=8GB
-  wal_buffers=16MB
+  max_wal_size=100GB
+  wal_buffers=1GB
 
-  # Hopefully reduce disk IO 
-  # https://www.postgresql.org/docs/12/wal-configuration.html
   checkpoint_completion_target=0.9
+  checkpoint_timeout = 15min
   random_page_cost=1.1
   
   # Default 2kB
@@ -132,7 +147,7 @@ function setup_config() {
 
   # Bad when turned off - Recovering db may not be possible
   # https://www.postgresql.org/docs/13/runtime-config-wal.html#GUC-FSYNC
-  # fsync = off
+  fsync = off
 
   # Not so bad as when turning off fsync, but single transactions might get 
lost on crash - but
   # like they would have aborted cleanly
@@ -159,7 +174,7 @@ function setup_config() {
   max_parallel_workers_per_gather=10
   max_connections=500
 
-  max_parallel_maintenance_workers=4
+  max_parallel_maintenance_workers=12
   
   # out of shared memory
   max_locks_per_transaction=85

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.

[Prev in Thread]

Current Thread

[Next in Thread]

[taler-grid5k] 77/189: test custom metrics, (continued)
- [taler-grid5k] 77/189: test custom metrics, gnunet, 2022/04/28
- [taler-grid5k] 116/189: fix verbose output, gnunet, 2022/04/28
- [taler-grid5k] 124/189: enable admin api for prometheus in order to test snapshots, gnunet, 2022/04/28
- [taler-grid5k] 75/189: also separate wallet from the main syslog receiver, gnunet, 2022/04/28
- [taler-grid5k] 80/189: change denominations, gnunet, 2022/04/28
- [taler-grid5k] 62/189: fix lookahead sign, gnunet, 2022/04/28
- [taler-grid5k] 84/189: cleaner exchange setup, gnunet, 2022/04/28
- [taler-grid5k] 118/189: add sharding logic, gnunet, 2022/04/28
- [taler-grid5k] 105/189: work towards parsing logs, gnunet, 2022/04/28
- [taler-grid5k] 79/189: add histograms, gnunet, 2022/04/28
- [taler-grid5k] 108/189: update explain.py - multiline params not working yet - needs some style improvements, gnunet <=
- [taler-grid5k] 99/189: add explain analyzer, gnunet, 2022/04/28
- [taler-grid5k] 88/189: fix build, gnunet, 2022/04/28
- [taler-grid5k] 95/189: fix, gnunet, 2022/04/28
- [taler-grid5k] 109/189: update promtail message size, gnunet, 2022/04/28
- [taler-grid5k] 117/189: some db config, gnunet, 2022/04/28
- [taler-grid5k] 64/189: fix, gnunet, 2022/04/28
- [taler-grid5k] 96/189: fixes, add lookahead sign as param, gnunet, 2022/04/28
- [taler-grid5k] 70/189: fix, gnunet, 2022/04/28
- [taler-grid5k] 120/189: update postgres exporter to work when there is no merchant, update grafana dashboards, gnunet, 2022/04/28
- [taler-grid5k] 74/189: reduce amount of logging wallets, gnunet, 2022/04/28

Prev by Date: [taler-grid5k] 79/189: add histograms
Next by Date: [taler-grid5k] 99/189: add explain analyzer
Previous by thread: [taler-grid5k] 79/189: add histograms
Next by thread: [taler-grid5k] 99/189: add explain analyzer
Index(es):
- Date
- Thread