Re: [Bug-readline] Pasting long lines is slow

From: Ole Laursen
Subject: Re: [Bug-readline] Pasting long lines is slow
Date: Fri, 15 May 2015 20:33:37 +0200

2015-04-30 17:03 GMT+02:00 Ole Laursen <address@hidden>:
> I'm seeing what looks to me like quadratic behaviour on pasting long
> lines in readline applications like bash, python, psql.
> To reproduce: open a new terminal with say bash, make a really long
> line (> 20k characters, e.g. with python -c 'print "a" * 20000'), then
> paste it into the terminal. The first characters are inserted quickly,
> but it gradually slows down. With enough data it ends up being so slow
> as to be unusable.

I have now profiled examples/fileman with perf.

Here are the results from pasting 5000 x "aaa " with my default locale
da_DK.UTF-8. As is evident, most of the time is spent measuring the
widths of the a's:

  21,22%  fileman  libc-2.19.so       [.] __gconv_transform_utf8_internal
  20,60%  fileman  libc-2.19.so       [.] wcwidth
  19,80%  fileman  libc-2.19.so       [.] __mbrtowc
  15,45%  fileman  fileman            [.] rl_redisplay
  12,44%  fileman  libc-2.19.so       [.] __ctype_get_mb_cur_max
   2,74%  fileman  libc-2.19.so       [.] __x86.get_pc_thunk.bx
   2,63%  fileman  libc-2.19.so       [.] _dl_mcount_wrapper_check
   1,42%  fileman  libc-2.19.so       [.] __x86.get_pc_thunk.cx
   1,34%  fileman  fileman            [.] _rl_find_prev_mbchar
   1,00%  fileman  fileman            [.] 0x000014c0
   0,57%  fileman  fileman            [.] 0x00001520
   0,12%  fileman  fileman            [.] _rl_col_width
   0,08%  fileman  libc-2.19.so       [.] __memcmp_sse4_2
   0,05%  fileman  fileman            [.] update_line
   0,03%  fileman  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   0,02%  fileman  [vdso]             [.] __kernel_vsyscall
   0,02%  fileman  libc-2.19.so       [.] __memset_sse2_rep
   0,01%  fileman  fileman            [.] address@hidden
   0,01%  fileman  [kernel.kallsyms]  [k] enqueue_task_fair
   0,01%  fileman  [i915]             [k] gen6_read32
   0,01%  fileman  [kernel.kallsyms]  [k] mutex_unlock
   0,01%  fileman  [kernel.kallsyms]  [k] native_write_msr_safe
   0,01%  fileman  libc-2.19.so       [.] _IO_file_sync@@GLIBC_2.1
   0,01%  fileman  libc-2.19.so       [.] new_do_write
   0,01%  fileman  [kernel.kallsyms]  [k] resched_curr
   0,01%  fileman  [kernel.kallsyms]  [k] restore_nocheck
   0,01%  fileman  [kernel.kallsyms]  [k] insert_work
   0,01%  fileman  [kernel.kallsyms]  [k] timekeeping_update.constprop.7
   0,01%  fileman  [kernel.kallsyms]  [k] __queue_work
   0,01%  fileman  [kernel.kallsyms]  [k] __schedule
   0,01%  fileman  [kernel.kallsyms]  [k] n_tty_read
   0,01%  fileman  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   0,01%  fileman  [kernel.kallsyms]  [k] task_waking_fair
   0,01%  fileman  [kernel.kallsyms]  [k] tty_paranoia_check
   0,01%  fileman  [kernel.kallsyms]  [k] sysenter_past_esp
   0,01%  fileman  [kernel.kallsyms]  [k] vfs_read
   0,01%  fileman  [kernel.kallsyms]  [k] queue_work_on
   0,01%  fileman  [kernel.kallsyms]  [k] timerqueue_del
   0,01%  fileman  libc-2.19.so       [.] _IO_fwrite
   0,01%  fileman  fileman            [.] _rl_move_cursor_relative
   0,01%  fileman  [kernel.kallsyms]  [k] __copy_to_user_ll
   0,01%  fileman  [kernel.kallsyms]  [k] update_sd_lb_stats
   0,01%  fileman  [kernel.kallsyms]  [k] down_read
   0,01%  fileman  [kernel.kallsyms]  [k] wake_up_process
   0,01%  fileman  [kernel.kallsyms]  [k] native_sched_clock
   0,01%  fileman  fileman            [.] _rl_internal_char_cleanup
   0,01%  fileman  [kernel.kallsyms]  [k] security_file_permission
   0,01%  fileman  fileman            [.] _rl_dispatch_subseq
   0,00%  fileman  [kernel.kallsyms]  [k] cpuacct_charge
   0,00%  fileman  [kernel.kallsyms]  [k] memcpy
   0,00%  fileman  [kernel.kallsyms]  [k] apic_timer_interrupt
   0,00%  fileman  libc-2.19.so       [.] __GI___mempcpy
   0,00%  fileman  fileman            [.] rl_insert_text

Compare this to a similar profile with LANG=C, which is much faster:

  61,71%  fileman  libc-2.19.so       [.] __ctype_get_mb_cur_max
  23,44%  fileman  fileman            [.] rl_redisplay
   5,01%  fileman  libc-2.19.so       [.] __x86.get_pc_thunk.cx
   4,87%  fileman  fileman            [.] update_line
   3,35%  fileman  fileman            [.] 0x000014c0
   0,07%  fileman  [kernel.kallsyms]  [k] enqueue_task_fair
   0,06%  fileman  [vdso]             [.] __kernel_vsyscall
   0,06%  fileman  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   0,05%  fileman  [kernel.kallsyms]  [k] insert_work
   0,05%  fileman  [kernel.kallsyms]  [k] native_write_msr_safe
   0,04%  fileman  [kernel.kallsyms]  [k] restore_nocheck
   0,04%  fileman  [kernel.kallsyms]  [k] sysenter_past_esp
   0,03%  fileman  fileman            [.] _rl_dispatch_subseq
   0,03%  fileman  [kernel.kallsyms]  [k] fsnotify
   0,03%  fileman  [kernel.kallsyms]  [k] n_tty_write
   0,03%  fileman  [kernel.kallsyms]  [k] down_read
   0,03%  fileman  [kernel.kallsyms]  [k] _raw_spin_lock
   0,03%  fileman  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   0,03%  fileman  libc-2.19.so       [.] new_do_write
   0,03%  fileman  fileman            [.] readline_internal_char
   0,03%  fileman  [kernel.kallsyms]  [k] atomic64_read_cx8
   0,02%  fileman  [kernel.kallsyms]  [k] wake_up_worker
   0,02%  fileman  [kernel.kallsyms]  [k] __queue_work
   0,02%  fileman  fileman            [.] _rl_internal_char_cleanup
   0,02%  fileman  [kernel.kallsyms]  [k] rb_erase
   0,02%  fileman  [kernel.kallsyms]  [k] cpuacct_charge
   0,02%  fileman  [kernel.kallsyms]  [k] copy_from_read_buf
   0,02%  fileman  [kernel.kallsyms]  [k] apic_timer_interrupt
   0,02%  fileman  [kernel.kallsyms]  [k] update_curr
   0,02%  fileman  fileman            [.] rl_insert_text

I'm assuming that readline does a redisplay after each inserted
character? On each redisplay it goes through all the characters
inserted on the line so far, so the end result is O(n^2).

In the multi-byte case, examining each character is obviously much more expensive.

Notice that even with LANG=C, __ctype_get_mb_cur_max accounts for
61.7% of the time. As far as I can tell, this comes from MB_CUR_MAX,
which despite the naming is not a constant in glibc: it expands to a
call to __ctype_get_mb_cur_max().


As a quickfix for that, I'm attaching a patch that computes MB_CUR_MAX
only once, at the top of the function. With this patch, the LANG=C
profile is noticeably faster and looks like this:

  87,55%  fileman  fileman            [.] rl_redisplay
   9,54%  fileman  fileman            [.] update_line
   0,41%  fileman  libc-2.19.so       [.] __ctype_get_mb_cur_max
   0,12%  fileman  [vdso]             [.] __kernel_vsyscall
   0,09%  fileman  [kernel.kallsyms]  [k] sysenter_past_esp
   0,07%  fileman  [kernel.kallsyms]  [k] n_tty_read
   0,06%  fileman  [kernel.kallsyms]  [k] try_to_wake_up
   0,06%  fileman  [kernel.kallsyms]  [k] restore_nocheck
   0,06%  fileman  [kernel.kallsyms]  [k] resched_curr
   0,06%  fileman  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   0,06%  fileman  [kernel.kallsyms]  [k] atomic64_read_cx8
   0,05%  fileman  [kernel.kallsyms]  [k] tty_paranoia_check
   0,05%  fileman  fileman            [.] rl_insert_text

This helps a bit with the UTF-8 case, but not nearly enough.

I suppose one could try to make the loop body faster somehow, but to
really get rid of the O(n^2) behaviour, the redisplay should only do
the work needed to account for the single added character instead of
starting from scratch each time. Or perhaps the copy-paste case, where
a bucket-load of characters arrives at once, could be tackled by
batching the insertions? What do you think?


Attachment: mb-cur-max-fix.patch
Description: Binary data
