[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Bug-readline] Pasting long lines is slow
From: |
Ole Laursen |
Subject: |
Re: [Bug-readline] Pasting long lines is slow |
Date: |
Fri, 15 May 2015 20:33:37 +0200 |
2015-04-30 17:03 GMT+02:00 Ole Laursen <address@hidden>:
> I'm seeing what looks to me like quadratic behaviour on pasting long
> lines in readline applications like bash, python, psql.
>
> To reproduce: open a new terminal with say bash, make a really long
> line (> 20k characters, e.g. with python -c 'print "a" * 20000'), then
> paste it into the terminal. The first characters are inserted quickly,
> but it gradually slows down. With enough data it ends up being so slow
> as to be unusable.
I did a profile now on examples/fileman with perf:
https://perf.wiki.kernel.org/index.php/Tutorial#Sampling_with_perf_record
Here are the results from pasting 5000 x "aaa " with my default locale
da_DK.UTF-8. As is evident most of the time is spent measuring the
widths of the a's:
21,22% fileman libc-2.19.so [.] __gconv_transform_utf8_internal
20,60% fileman libc-2.19.so [.] wcwidth
19,80% fileman libc-2.19.so [.] __mbrtowc
15,45% fileman fileman [.] rl_redisplay
12,44% fileman libc-2.19.so [.] __ctype_get_mb_cur_max
2,74% fileman libc-2.19.so [.] __x86.get_pc_thunk.bx
2,63% fileman libc-2.19.so [.] _dl_mcount_wrapper_check
1,42% fileman libc-2.19.so [.] __x86.get_pc_thunk.cx
1,34% fileman fileman [.] _rl_find_prev_mbchar
1,00% fileman fileman [.] 0x000014c0
0,57% fileman fileman [.] 0x00001520
0,12% fileman fileman [.] _rl_col_width
0,08% fileman libc-2.19.so [.] __memcmp_sse4_2
0,05% fileman fileman [.] update_line
0,03% fileman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0,02% fileman [vdso] [.] __kernel_vsyscall
0,02% fileman libc-2.19.so [.] __memset_sse2_rep
0,01% fileman fileman [.] address@hidden
0,01% fileman [kernel.kallsyms] [k] enqueue_task_fair
0,01% fileman [i915] [k] gen6_read32
0,01% fileman [kernel.kallsyms] [k] mutex_unlock
0,01% fileman [kernel.kallsyms] [k] native_write_msr_safe
0,01% fileman libc-2.19.so [.] _IO_file_sync@@GLIBC_2.1
0,01% fileman libc-2.19.so [.] new_do_write
0,01% fileman [kernel.kallsyms] [k] resched_curr
0,01% fileman [kernel.kallsyms] [k] restore_nocheck
0,01% fileman [kernel.kallsyms] [k] insert_work
0,01% fileman [kernel.kallsyms] [k] timekeeping_update.constprop.7
0,01% fileman [kernel.kallsyms] [k] __queue_work
0,01% fileman [kernel.kallsyms] [k] __schedule
0,01% fileman [kernel.kallsyms] [k] n_tty_read
0,01% fileman [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0,01% fileman [kernel.kallsyms] [k] task_waking_fair
0,01% fileman [kernel.kallsyms] [k] tty_paranoia_check
0,01% fileman [kernel.kallsyms] [k] sysenter_past_esp
0,01% fileman [kernel.kallsyms] [k] vfs_read
0,01% fileman [kernel.kallsyms] [k] queue_work_on
0,01% fileman [kernel.kallsyms] [k] timerqueue_del
0,01% fileman libc-2.19.so [.] _IO_fwrite
0,01% fileman fileman [.] _rl_move_cursor_relative
0,01% fileman [kernel.kallsyms] [k] __copy_to_user_ll
0,01% fileman [kernel.kallsyms] [k] update_sd_lb_stats
0,01% fileman [kernel.kallsyms] [k] down_read
0,01% fileman [kernel.kallsyms] [k] wake_up_process
0,01% fileman [kernel.kallsyms] [k] native_sched_clock
0,01% fileman fileman [.] _rl_internal_char_cleanup
0,01% fileman [kernel.kallsyms] [k] security_file_permission
0,01% fileman fileman [.] _rl_dispatch_subseq
0,00% fileman [kernel.kallsyms] [k] cpuacct_charge
0,00% fileman [kernel.kallsyms] [k] memcpy
0,00% fileman [kernel.kallsyms] [k] apic_timer_interrupt
0,00% fileman libc-2.19.so [.] __GI___mempcpy
0,00% fileman fileman [.] rl_insert_text
Compare this to a similar profile with LANG=C which is much faster:
61,71% fileman libc-2.19.so [.] __ctype_get_mb_cur_max
23,44% fileman fileman [.] rl_redisplay
5,01% fileman libc-2.19.so [.] __x86.get_pc_thunk.cx
4,87% fileman fileman [.] update_line
3,35% fileman fileman [.] 0x000014c0
0,07% fileman [kernel.kallsyms] [k] enqueue_task_fair
0,06% fileman [vdso] [.] __kernel_vsyscall
0,06% fileman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0,05% fileman [kernel.kallsyms] [k] insert_work
0,05% fileman [kernel.kallsyms] [k] native_write_msr_safe
0,04% fileman [kernel.kallsyms] [k] restore_nocheck
0,04% fileman [kernel.kallsyms] [k] sysenter_past_esp
0,03% fileman fileman [.] _rl_dispatch_subseq
0,03% fileman [kernel.kallsyms] [k] fsnotify
0,03% fileman [kernel.kallsyms] [k] n_tty_write
0,03% fileman [kernel.kallsyms] [k] down_read
0,03% fileman [kernel.kallsyms] [k] _raw_spin_lock
0,03% fileman [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0,03% fileman libc-2.19.so [.] new_do_write
0,03% fileman fileman [.] readline_internal_char
0,03% fileman [kernel.kallsyms] [k] atomic64_read_cx8
0,02% fileman [kernel.kallsyms] [k] wake_up_worker
0,02% fileman [kernel.kallsyms] [k] __queue_work
0,02% fileman fileman [.] _rl_internal_char_cleanup
0,02% fileman [kernel.kallsyms] [k] rb_erase
0,02% fileman [kernel.kallsyms] [k] cpuacct_charge
0,02% fileman [kernel.kallsyms] [k] copy_from_read_buf
0,02% fileman [kernel.kallsyms] [k] apic_timer_interrupt
0,02% fileman [kernel.kallsyms] [k] update_curr
0,02% fileman fileman [.] rl_insert_text
I'm assuming that readline does a redisplay on each inserted char? In
this redisplay, it goes through all characters on the line inserted so
far, so the end result is O(n^2).
In a multi-byte case, examining each character is obviously much more expensive.
Notice that in even with LANG=C, __ctype_get_mb_cur_max accounts for
61.7% of the time. As far as I can tell, this comes from MB_CUR_MAX
which despite the naming is not a constant:
http://man7.org/linux/man-pages/man3/MB_CUR_MAX.3.html
As a quickfix for that, I'm attaching a patch to only compute
MB_CUR_MAX once in the top of the function. With this patch, the
LANG=C profile is noticeable faster and looks like this
87,55% fileman fileman [.] rl_redisplay
9,54% fileman fileman [.] update_line
0,41% fileman libc-2.19.so [.] __ctype_get_mb_cur_max
0,12% fileman [vdso] [.] __kernel_vsyscall
0,09% fileman [kernel.kallsyms] [k] sysenter_past_esp
0,07% fileman [kernel.kallsyms] [k] n_tty_read
0,06% fileman [kernel.kallsyms] [k] try_to_wake_up
0,06% fileman [kernel.kallsyms] [k] restore_nocheck
0,06% fileman [kernel.kallsyms] [k] resched_curr
0,06% fileman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0,06% fileman [kernel.kallsyms] [k] atomic64_read_cx8
0,05% fileman [kernel.kallsyms] [k] tty_paranoia_check
0,05% fileman fileman [.] rl_insert_text
This helps a bit with the UTF 8 case, but not nearly enough.
I suppose one could try to make the loop content faster somehow, but
to really fix this to not be O(n^2) one should only do the work
necessary to take the single added character into account instead of
starting off from scratch each time. Or perhaps try to tackle this
case of copy-pasting a bucket-load of characters by batching them?
What do you think?
Ole
mb-cur-max-fix.patch
Description: Binary data
- Re: [Bug-readline] Pasting long lines is slow,
Ole Laursen <=