qemu-discuss

Migration very slow on block copy


From: Adrien G
Subject: Migration very slow on block copy
Date: Tue, 11 Aug 2020 16:43:46 +0200

Hi,

I'm doing a live migration with non-shared storage on QEMU 4.2.0.
Unfortunately the block migration is terribly slow and I can't figure out why.

It does not saturate the network link and uses only about 70 Mbps on average over
the 1000 Mbps link.
Both source and destination hosts look fine: they are not loaded and no CPU is
saturated.

I tried sending the block image with scp to check the disk and the network at the
same time.
That works well and saturates the 1000 Mbps link, so the network and disks seem
fine.

On the destination, I created an empty QCOW2 image, then started QEMU with the
"-incoming defer" argument.
On the source, I start the migration with { "execute": "migrate", "arguments":
{ "uri": "tcp:<ip>:<port>", "blk": true } }.
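
Since the destination is started with "-incoming defer", it also needs a
migrate-incoming command on its QMP monitor before the source can connect;
roughly like this (the port is a placeholder):

  { "execute": "migrate-incoming",
    "arguments": { "uri": "tcp:0.0.0.0:<port>" } }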

I have set these options on both source and destination (how they were set is
sketched just after this list):
"query-migrate-capabilities":
  - auto-converge: true
  - zero-blocks: true
  - events: true
  - postcopy-ram: true
  - block: true
  - return-path: true
  - postcopy-blocktime: true
  - validate-uuid: true

  - xbzrle: false
  - rdma-pin-all: false
  - compress: false
  - x-colo: false
  - release-ram: false
  - pause-before-switchover: false
  - multifd: false
  - dirty-bitmaps: false
  - late-block-activate: false
  - x-ignore-shared: false
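
For completeness, the capabilities were enabled on each side with
migrate-set-capabilities, along these lines (only a few entries shown):

  { "execute": "migrate-set-capabilities",
    "arguments": { "capabilities": [
      { "capability": "block",         "state": true },
      { "capability": "zero-blocks",   "state": true },
      { "capability": "auto-converge", "state": true } ] } }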

"query-migrate-parameters":
  - downtime-limit: 1000
  - max-bandwidth: 18446744073709551615 (UINT64_MAX / -1, i.e. no limit; see the
    note after this list)

  - cpu-throttle-initial: 1
  - cpu-throttle-increment: 20
  - max-cpu-throttle: 1

  - xbzrle-cache-size: 67108864
  - announce-max: 550
  - announce-initial: 50
  - announce-rounds: 5
  - announce-step: 100
  - decompress-threads: 2
  - compress-threads: 8
  - compress-level: 1
  - compress-wait-thread: false
  - multifd-channels: 2
  - block-incremental: false
  - tls-authz: ""
  - tls-creds: ""
  - tls-hostname: ""
  - max-postcopy-bandwidth: 0
  - x-checkpoint-delay: 60000
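
On the max-bandwidth value above: 18446744073709551615 is the default "no limit"
setting, so the migration should not be capped by this parameter. If needed, it
can be pinned explicitly with migrate-set-parameters (the value is in bytes per
second), e.g. roughly 1 Gbps:

  { "execute": "migrate-set-parameters",
    "arguments": { "max-bandwidth": 125000000 } }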


"query-migrate":
  "status": "active",
  "setup-time": 2129,
  "total-time": 25257536,
  "expected-downtime": 1000,
  "disk": {
    "total": 864362168320,
    "remaining": 495024340992,
    "transferred": 369337827328
  },
  "ram": {
    "total": 103084531712,
    "postcopy-requests": 0,
    "dirty-sync-count": 1,
    "multifd-bytes": 0,
    "pages-per-second": 0,
    "page-size": 4096,
    "remaining": 103084531712,
    "mbps": 121.521105, => SLOW, way under the 1000Mbps link
    "transferred": 544729048,
    "duplicate": 0,
    "dirty-pages-rate": 0,
    "skipped": 0,
    "normal-bytes": 0,
    "normal": 0
  }
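
If I'm computing right, ~121 Mbps is about 15 MB/s, so the ~495 GB of disk still
remaining would need on the order of nine more hours at this rate, and the
~96 GiB of RAM still shows as entirely remaining.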


strace (~5 seconds) on source:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.37    0.559057        1440       388           ppoll
 41.46    0.442608         937       472           io_submit
  4.17    0.044529          20      2172           clock_gettime
  1.86    0.019809          54       363           read
  0.10    0.001018           2       420           write
  0.02    0.000258           2        94         1 futex
  0.02    0.000237           3        79           munmap
  0.00    0.000002           0         4           sendmsg
------ ----------- ----------- --------- --------- ----------------
100.00    1.067518         267      3992         1 total

The futex errors are "futex(0x1246dc0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1
EAGAIN (Resource temporarily unavailable)".
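
Both summaries (here and below) were collected by attaching strace in summary
mode for about five seconds, with a command along these lines (the pid is a
placeholder):

  timeout -s INT 5 strace -c -f -p <qemu-pid>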


strace (~5 seconds) on destination:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.49    0.077527         106       730           ppoll
 21.06    0.029972          21      1407        68 recvmsg
 15.50    0.022050          15      1460           gettimeofday
  4.69    0.006676          10       612           futex
  3.86    0.005486           8       681           read
  0.41    0.000579           4       136           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.142290          28      5026        68 total

The recvmsg errors are "recvmsg(44, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 
EAGAIN (Resource temporarily unavailable)".


I have run a lot of tests but I'm now at a dead end and out of ideas as to why it
is so slow.
Does anyone have any idea?

Best,
Adrien

