qemu-discuss

Migration very slow on block copy


From: Adrien G
Subject: Migration very slow on block copy
Date: Tue, 11 Aug 2020 16:43:46 +0200

Hi,

I'm doing a live migration with non-shared storage on QEMU 4.2.0.
Unfortunately the block migration is terribly slow and I can't figure out why.

It does not saturate the network link and uses only about 70 Mbps on average over
the 1000 Mbps link.
Both source and destination hosts look fine: they are not loaded and no CPU is
saturated.

I tried sending the block image with scp to check the disk and the network at the
same time.
That works well and saturates the 1000 Mbps link, so the network and disks seem
fine.

On the destination, I created an empty QCOW2 image, then started QEMU with the
"-incoming defer" argument.
On the source, I start the migration with { "execute": "migrate", "arguments":
{ "uri": "tcp:<ip>:<port>", "blk": true } }.
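
Since the destination is started with "-incoming defer", it also needs a
migrate-incoming command on its QMP monitor before the source can connect;
roughly like this (the port is a placeholder):

  { "execute": "migrate-incoming",
    "arguments": { "uri": "tcp:0.0.0.0:<port>" } }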

I have set these options on both source and destination (how they were set is
sketched just after this list):
"query-migrate-capabilities":
  - auto-converge: true
  - zero-blocks: true
  - events: true
  - postcopy-ram: true
  - block: true
  - return-path: true
  - postcopy-blocktime: true
  - validate-uuid: true

  - xbzrle: false
  - rdma-pin-all: false
  - compress: false
  - x-colo: false
  - release-ram: false
  - pause-before-switchover: false
  - multifd: false
  - dirty-bitmaps: false
  - late-block-activate: false
  - x-ignore-shared: false
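
For completeness, the capabilities were enabled on each side with
migrate-set-capabilities, along these lines (only a few entries shown):

  { "execute": "migrate-set-capabilities",
    "arguments": { "capabilities": [
      { "capability": "block",         "state": true },
      { "capability": "zero-blocks",   "state": true },
      { "capability": "auto-converge", "state": true } ] } }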

"query-migrate-parameters":
  - downtime-limit: 1000
  - max-bandwidth: 18446744073709551615 (UINT64_MAX / -1, i.e. no limit; see the
    note after this list)

  - cpu-throttle-initial: 1
  - cpu-throttle-increment: 20
  - max-cpu-throttle: 1

  - xbzrle-cache-size: 67108864
  - announce-max: 550
  - announce-initial: 50
  - announce-rounds: 5
  - announce-step: 100
  - decompress-threads: 2
  - compress-threads: 8
  - compress-level: 1
  - compress-wait-thread: false
  - multifd-channels: 2
  - block-incremental: false
  - tls-authz: ""
  - tls-creds: ""
  - tls-hostname: ""
  - max-postcopy-bandwidth: 0
  - x-checkpoint-delay: 60000
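
On the max-bandwidth value above: 18446744073709551615 is the default "no limit"
setting, so the migration should not be capped by this parameter. If needed, it
can be pinned explicitly with migrate-set-parameters (the value is in bytes per
second), e.g. roughly 1 Gbps:

  { "execute": "migrate-set-parameters",
    "arguments": { "max-bandwidth": 125000000 } }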


"query-migrate":
  "status": "active",
  "setup-time": 2129,
  "total-time": 25257536,
  "expected-downtime": 1000,
  "disk": {
    "total": 864362168320,
    "remaining": 495024340992,
    "transferred": 369337827328
  },
  "ram": {
    "total": 103084531712,
    "postcopy-requests": 0,
    "dirty-sync-count": 1,
    "multifd-bytes": 0,
    "pages-per-second": 0,
    "page-size": 4096,
    "remaining": 103084531712,
    "mbps": 121.521105, => SLOW, way under the 1000Mbps link
    "transferred": 544729048,
    "duplicate": 0,
    "dirty-pages-rate": 0,
    "skipped": 0,
    "normal-bytes": 0,
    "normal": 0
  }
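
If I'm computing right, ~121 Mbps is about 15 MB/s, so the ~495 GB of disk still
remaining would need on the order of nine more hours at this rate, and the
~96 GiB of RAM still shows as entirely remaining.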


strace (~5 seconds) on source:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.37    0.559057        1440       388           ppoll
 41.46    0.442608         937       472           io_submit
  4.17    0.044529          20      2172           clock_gettime
  1.86    0.019809          54       363           read
  0.10    0.001018           2       420           write
  0.02    0.000258           2        94         1 futex
  0.02    0.000237           3        79           munmap
  0.00    0.000002           0         4           sendmsg
------ ----------- ----------- --------- --------- ----------------
100.00    1.067518         267      3992         1 total

The futex errors are "futex(0x1246dc0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1
EAGAIN (Resource temporarily unavailable)".
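
Both summaries (here and below) were collected by attaching strace in summary
mode for about five seconds, with a command along these lines (the pid is a
placeholder):

  timeout -s INT 5 strace -c -f -p <qemu-pid>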


strace (~5 seconds) on destination:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.49    0.077527         106       730           ppoll
 21.06    0.029972          21      1407        68 recvmsg
 15.50    0.022050          15      1460           gettimeofday
  4.69    0.006676          10       612           futex
  3.86    0.005486           8       681           read
  0.41    0.000579           4       136           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.142290          28      5026        68 total

The recvmsg errors are "recvmsg(44, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 
EAGAIN (Resource temporarily unavailable)".


I have run a lot of tests but I'm now at a dead end and out of ideas as to why it
is so slow.
Does anyone have any idea?

Best,
Adrien

