The description starts with "A malicious guest in control of the
iSCSI server ..." so asserting (and killing the VM) doesn't seem
correct...
assert() isn't an error check, but it means that we deem it impossible
for the assertion to fail. This would be the case because we fixed (in
this patch) the only code path that we think could cause the problem.
We would only add it to find other buggy code paths that we missed or
that are added later.
Correct. That's why I would have the proper checks (or "trim"s) closer
to where they were issued to fail sooner. What I meant is that if a
guest issues any operation that spans past the end of the drive, then
the operation stops there and an error is returned accordingly.
Guests can't issue operations that span past the end of the drive. Such
requests fail with an error before the iscsi driver is even called.
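
To illustrate the kind of early check meant here, a minimal sketch of a
generic bounds check that runs before any driver sees the request. The
function name and the choice of -EIO are assumptions made for the
example, not the actual block layer code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical pre-driver check: reject any guest request that starts
 * or ends past the end of the device. All quantities are in bytes.
 */
static int sketch_check_guest_request(uint64_t offset, uint64_t bytes,
                                      uint64_t device_size)
{
    /* Written as a subtraction so that offset + bytes cannot overflow. */
    if (offset > device_size || bytes > device_size - offset) {
        return -EIO;
    }
    return 0;
}
```

With a check like this in front, a driver only ever sees guest requests
that lie within the device.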
The only reason why we get such a request here is because of an internal
call with BDRV_REQUEST_MAX_BYTES. Maybe this should actually be changed
into MIN(BDRV_REQUEST_MAX_BYTES, bs->total_sectors * BDRV_SECTOR_SIZE),
and then iscsi_co_block_status() could assert that the request doesn't
span past the end of the drive.
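
A rough sketch of that suggestion, with stand-in macros instead of the
real QEMU definitions (the SKETCH_* names, their values, and both helper
functions are assumptions made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for BDRV_SECTOR_SIZE and BDRV_REQUEST_MAX_BYTES; the values
 * are placeholders chosen for the sketch. */
#define SKETCH_SECTOR_SIZE 512ULL
#define SKETCH_REQUEST_MAX_BYTES \
    (((uint64_t)INT32_MAX / SKETCH_SECTOR_SIZE) * SKETCH_SECTOR_SIZE)

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper for the internal caller: never ask for more bytes
 * than the device has. Assumes total_sectors * 512 fits in 64 bits. */
static uint64_t sketch_max_request_bytes(uint64_t total_sectors)
{
    return SKETCH_MIN(SKETCH_REQUEST_MAX_BYTES,
                      total_sectors * SKETCH_SECTOR_SIZE);
}

/* Hypothetical driver-side assertion: once callers clamp as above, a
 * request past the end of the drive is a programming error. */
static void sketch_assert_in_bounds(uint64_t offset, uint64_t bytes,
                                    uint64_t total_sectors)
{
    uint64_t size = total_sectors * SKETCH_SECTOR_SIZE;

    assert(offset <= size);
    assert(bytes <= size - offset);
}
```

The clamp goes in the caller and the assertion in the driver, so the
assertion documents the invariant rather than acting as an error check.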
This means nothing should ever try to touch these bitmaps out of
bounds. Nevertheless, and further to that, assert()s can be used
closer to where the bitmap is touched to catch programming errors.
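
For example, something along these lines, using a toy bitmap type
instead of the driver's real allocation bitmaps (the struct and function
names are hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Toy bitmap: 'words' must hold at least 'nbits' bits. */
struct sketch_bitmap {
    unsigned long *words;
    size_t nbits;
};

/* Set 'count' bits starting at 'start'. The assertions sit right next
 * to the memory access, so an out-of-bounds caller is caught here. */
static void sketch_bitmap_set_range(struct sketch_bitmap *bm,
                                    size_t start, size_t count)
{
    assert(start <= bm->nbits);
    assert(count <= bm->nbits - start);

    for (size_t i = start; i < start + count; i++) {
        bm->words[i / SKETCH_BITS_PER_LONG] |=
            1UL << (i % SKETCH_BITS_PER_LONG);
    }
}

/* Return 1 if 'bit' is set, 0 otherwise. */
static int sketch_bitmap_test(const struct sketch_bitmap *bm, size_t bit)
{
    assert(bit < bm->nbits);
    return (int)((bm->words[bit / SKETCH_BITS_PER_LONG] >>
                  (bit % SKETCH_BITS_PER_LONG)) & 1UL);
}
```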
I suppose the iSCSI protocol has some error to return for invalid
requests.
Which invalid request are you referring to? From the initiator or the
target? AFAICT the problem is that the SCSI spec doesn't prevent a
target from reporting provisioning status past the (current) end of the
LUN (either because this was not deemed important to stress, was
forgotten, or is intentionally allowed).
In any case, we don't get an invalid request here. We are the ones who
made the request. What we got is an unexpected response.
Also shouldn't we report some warning in case of such an invalid
request? So the management side can look at the 'malicious iSCSI
server'?
I think having the option to do so is a good idea. There are two cases
I can think of where you run into a "malicious" storage server:
1) Someone hacked your storage server
2) Your control plane allows your compute to connect to a user
provided storage service
Thinking as an admin, if I only allow storage servers I provide, then
I want to see such warnings. If I let people point the VMM to dodgy
servers, then I probably don't want the log spam.
For this reason, we generally don't log things for failed I/O requests.
If we wanted to introduce it, we'd better find a way to do so
consistently everywhere and not just in a single place with a one-off
option.