[Qemu-discuss] BLOCK_JOB_ERROR showing up on qmp monitor socket, failing live migration
From: Abe Massry
Subject: [Qemu-discuss] BLOCK_JOB_ERROR showing up on qmp monitor socket, failing live migration
Date: Tue, 25 Sep 2018 11:58:41 -0400
Hello,
I'm seeing this error message come across the qemu qmp monitor and it
is preventing live migrations from completing successfully on a number
of qemu instances. Some of them do complete successfully with the same
parameters.
{"timestamp": {"seconds": 1537544621, "microseconds": 488111},
"event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk-1",
"operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1537544621, "microseconds": 488957},
"event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk-1",
"operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1537544621, "microseconds": 501077},
"event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk-1",
"operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1537544621, "microseconds": 501694},
"event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk-1",
"operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1537544621, "microseconds": 606157},
"event": "BLOCK_JOB_COMPLETED", "data": {"device":
"drive-scsi-disk-1", "len": 541065216, "offset": 536870912, "speed":
1073741824, "type": "mirror", "error": "Input/output error"}}
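For anyone triaging similar monitor output, here is a minimal sketch (not part of my actual tooling) that counts BLOCK_JOB_ERROR events and picks out block jobs that completed with an "error" field. The function name `summarize` is hypothetical, and the sample lines are two of the events above joined onto single lines.

```python
import json

# Two of the events shown above, each joined onto one line.
RAW_EVENTS = [
    '{"timestamp": {"seconds": 1537544621, "microseconds": 488111}, '
    '"event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk-1", '
    '"operation": "write", "action": "report"}}',
    '{"timestamp": {"seconds": 1537544621, "microseconds": 606157}, '
    '"event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-scsi-disk-1", '
    '"len": 541065216, "offset": 536870912, "speed": 1073741824, '
    '"type": "mirror", "error": "Input/output error"}}',
]


def summarize(lines):
    """Count BLOCK_JOB_ERROR events and collect failed completions."""
    errors = 0
    failed = []
    for line in lines:
        msg = json.loads(line)
        event = msg.get("event")
        if event == "BLOCK_JOB_ERROR":
            errors += 1
        elif event == "BLOCK_JOB_COMPLETED" and "error" in msg["data"]:
            # The "error" key is only present when the job failed.
            failed.append(msg["data"])
    return errors, failed


errors, failed = summarize(RAW_EVENTS)
for job in failed:
    print(job["device"], job["type"], job["error"])
```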
In most (but not all) cases the difference between "len" and "offset"
is ( 541065216 - 536870912 ) / 1024 = 4096 KiB, i.e. 4 MiB,
which leads me to believe a 4 MiB region is not being written.
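As a quick sanity check on the units, a short sketch using only the "len" and "offset" values from the BLOCK_JOB_COMPLETED event above (the variable names are just for illustration):

```python
# Values copied from the BLOCK_JOB_COMPLETED event.
length = 541065216
offset = 536870912

remaining = length - offset
print("remaining bytes:", remaining)
print("remaining KiB:  ", remaining // 1024)
print("remaining MiB:  ", remaining // (1024 * 1024))
print("offset in MiB:  ", offset // (1024 * 1024))
```

Dividing the byte difference by 1024 gives KiB, so the gap is 4096 KiB (4 MiB) rather than 4096 bytes.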
The destination qemu instance is started with:
-incoming tcp:$RamMigrationIP:$RamMigrationPort
and the nbd server is started on the destination with:
{
  "execute": "nbd-server-start",
  "arguments": {
    "addr": {
      "type": "inet",
      "data": {
        "host": "$ip",
        "port": "$port"
      }
    }
  }
}
The command I'm running on the source is:
{
  "execute": "drive-mirror",
  "arguments": {
    "device": "drive-scsi-disk-1",
    "target": "nbd://$ip:$port/drive-scsi-disk-1",
    "speed": 1073741824,
    "sync": "full",
    "mode": "existing",
    "format": "raw"
  }
}
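Since hand-written QMP JSON is easy to get wrong (a missing quote is enough to make the monitor reject it), one option is to build the command from a dict and let the JSON encoder handle the quoting. This is only a sketch: the helper name `drive_mirror_command` is hypothetical, and the address 192.0.2.10 and port 10809 are placeholder values standing in for $ip and $port.

```python
import json


def drive_mirror_command(device, ip, port, speed=1073741824):
    """Return the drive-mirror QMP command above as a JSON string."""
    return json.dumps({
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            # Same target layout as in the command above.
            "target": f"nbd://{ip}:{port}/{device}",
            "speed": speed,
            "sync": "full",
            "mode": "existing",
            "format": "raw",
        },
    })


# Placeholder address and port, not values from my setup.
cmd = drive_mirror_command("drive-scsi-disk-1", "192.0.2.10", 10809)
print(cmd)
```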
The migration is going from qemu 2.11.1 to 2.11.2.
I've also started throttling the disk io during live migration with:
{
  "execute": "block_set_io_throttle",
  "arguments": {
    "device": "drive-scsi-disk-1",
    "bps_rd": 0,
    "bps_wr": 0,
    "bps": 104857600,
    "iops": 0,
    "iops_rd": 0,
    "iops_wr": 0
  }
}
This allowed disks that previously couldn't complete the live migration,
because their IO was too high, to complete it.
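For reference, the "bps" value above is 100 MiB/s, against a mirror "speed" of 1073741824 (1 GiB/s). A small sketch of deriving the throttle arguments from a MiB/s figure; the helper name `throttle_command` is hypothetical, and a limit of 0 leaves that setting unlimited:

```python
import json

MIB = 1024 * 1024


def throttle_command(device, total_mib_per_s):
    """Build a block_set_io_throttle command capping total throughput."""
    return {
        "execute": "block_set_io_throttle",
        "arguments": {
            "device": device,
            "bps": total_mib_per_s * MIB,  # combined read + write limit
            "bps_rd": 0,
            "bps_wr": 0,
            "iops": 0,
            "iops_rd": 0,
            "iops_wr": 0,
        },
    }


cmd = throttle_command("drive-scsi-disk-1", 100)
print(json.dumps(cmd, indent=2))
```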
Has anyone seen this before? Does anyone know what the problem is or
how to fix it?
I would appreciate any help very much.
Thank you,
Abe
--
Abe Massry
Linode - https://www.linode.com/