Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature

From:	Steven Price
Subject:	Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Date:	Wed, 31 Mar 2021 11:41:20 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1

On 31/03/2021 10:32, David Hildenbrand wrote:

On 31.03.21 11:21, Catalin Marinas wrote:
On Wed, Mar 31, 2021 at 09:34:44AM +0200, David Hildenbrand wrote:
On 30.03.21 12:30, Catalin Marinas wrote:
On Mon, Mar 29, 2021 at 05:06:51PM +0100, Steven Price wrote:
On 28/03/2021 13:21, Catalin Marinas wrote:
On Sat, Mar 27, 2021 at 03:23:24PM +0000, Catalin Marinas wrote:
On Fri, Mar 12, 2021 at 03:18:58PM +0000, Steven Price wrote:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..b31b7a821f90 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu*vcpu, phys_addr_t fault_ipa,
        if (vma_pagesize == PAGE_SIZE && !force_pte)
vma_pagesize = transparent_hugepage_adjust(memslot,hva,
                                   &pfn, &fault_ipa);
+
+ if (fault_status != FSC_PERM && kvm_has_mte(kvm) &&pfn_valid(pfn)) {
+        /*
+ * VM will be able to see the page's tags, so we mustensure+ * they have been initialised. if PG_mte_tagged is set,tags
+         * have already been initialised.
+         */
+        struct page *page = pfn_to_page(pfn);
+        unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
+
+        for (i = 0; i < nr_pages; i++, page++) {
+            if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+                mte_clear_page_tags(page_address(page));
+        }
+    }
This pfn_valid() check may be problematic. Following commiteeb0753ba27b("arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory"), itreturnstrue for ZONE_DEVICE memory but such memory is allowed not tosupport
MTE.
Some more thinking, this should be safe as any ZONE_DEVICE would be
mapped as untagged memory in the kernel linear map. It could beslightly
inefficient if it unnecessarily tries to clear tags in ZONE_DEVICE,
untagged memory. Another overhead is pfn_valid() which will likelyend
up calling memblock_is_map_memory().

However, the bigger issue is that Stage 2 cannot disable tagging for
Stage 1 unless the memory is Non-cacheable or Device at S2. Isthere away to detect what gets mapped in the guest as Normal Cacheablememoryand make sure it's only early memory or hotplug but no ZONE_DEVICE(or
something else like on-chip memory)?  If we can't guarantee that all
Cacheable memory given to a guest supports tags, we should disablethe
feature altogether.
In stage 2 I believe we only have two types of mapping - 'normal' or
DEVICE_nGnRE (see stage2_map_set_prot_attr()). Filtering out thelatter is a
case of checking the 'device' variable, and makes sense to avoid the
overhead you describe.
This should also guarantee that all stage-2 cacheable memorysupports tags,as kvm_is_device_pfn() is simply !pfn_valid(), and pfn_valid()should only
be true for memory that Linux considers "normal".
If you think "normal" == "normal System RAM", that's wrong; see below.
By "normal" I think both Steven and I meant the Normal Cacheable memory
attribute (another being the Device memory attribute).

Sadly there's no good standardised terminology here. Aarch64 providesthe "normal (cacheable)" definition. Memory which is mapped as "NormalCacheable" is implicitly MTE capable when shared with a guest (becausethe stage 2 mappings don't allow restricting MTE other than mapping itas Device memory).

So MTE also forces us to have a definition of memory which is "bogstandard memory"[1] separate from the mapping attributes. This is themain memory which fully supports MTE.

Separate from the "bog standard" we have the "special"[1] memory, e.g.ZONE_DEVICE memory may be mapped as "Normal Cacheable" at stage 1 butthat memory may not support MTE tags. This memory can only be safelyshared with a guest in the following situations:


 1. MTE is completely disabled for the guest

 2. The stage 2 mappings are 'device' (e.g. DEVICE_nGnRE)

 3. We have some guarantee that guest MTE access are in some way safe.

(1) is the situation today (without this patch series). But it preventsthe guest from using MTE in any form.

(2) is pretty terrible for general memory, but is the get-out clause formapping devices into the guest.

(3) isn't something we have any architectural way of discovering. We'dneed to know what the device did with the MTE accesses (and any cachesbetween the CPU and the device) to ensure there aren't any side-channelsor h/w lockup issues. We'd also need some way of describing this memoryto the guest.

So at least for the time being the approach is to avoid letting a guestwith MTE enabled have access to this sort of memory.

[1] Neither "bog standard" nor "special" are real terms - like I saidthere's a lack of standardised terminology.

That's the problem. With Anshuman's commit I mentioned above,
pfn_valid() returns true for ZONE_DEVICE mappings (e.g. persistent
memory, not talking about some I/O mapping that requires Device_nGnRE).
So kvm_is_device_pfn() is false for such memory and it may be mapped as
Normal but it is not guaranteed to support tagging.


pfn_valid() means "there is a struct page"; if you do pfn_to_page() and
touch the page, you won't fault. So Anshuman's commit is correct.


I agree.

pfn_to_online_page() means, "there is a struct page and it's system RAM
that's in use; the memmap has a sane content"


Does pfn_to_online_page() returns a valid struct page pointer for
ZONE_DEVICE pages? IIUC, these are not guaranteed to be system RAM, for
some definition of system RAM (I assume NVDIMM != system RAM). For
example, pmem_attach_disk() calls devm_memremap_pages() and this would
use the Normal Cacheable memory attribute without necessarily being
system RAM.


No, not for ZONE_DEVICE.

However, if you expose PMEM via dax/kmem as System RAM to the system (->add_memory_driver_managed()), then PMEM (managed via ZONE_NOMRAL orZONE_MOVABLE) would work with pfn_to_online_page() -- as the systemthinks it's "ordinary system RAM" and the memory is managed by the buddy.

So if I'm understanding this correctly for KVM we need to usepfn_to_online_pages() and reject if NULL is returned? In the case ofdax/kmem there already needs to be validation that the memory supportsMTE (otherwise we break user space) before it's allowed into the"ordinary system RAM" bucket.


Steve

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v10 0/6] MTE support for KVM guest, Steven Price, 2021/03/12
- [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Steven Price, 2021/03/12
  - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Catalin Marinas, 2021/03/27
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Catalin Marinas, 2021/03/28
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Steven Price, 2021/03/29
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Catalin Marinas, 2021/03/30
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, David Hildenbrand, 2021/03/31
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Catalin Marinas, 2021/03/31
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, David Hildenbrand, 2021/03/31
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Steven Price <=
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, David Hildenbrand, 2021/03/31
    - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature, Catalin Marinas, 2021/03/31
- [PATCH v10 1/6] arm64: mte: Sync tags for pages where PTE is untagged, Steven Price, 2021/03/12
  - Re: [PATCH v10 1/6] arm64: mte: Sync tags for pages where PTE is untagged, Catalin Marinas, 2021/03/26
    - Re: [PATCH v10 1/6] arm64: mte: Sync tags for pages where PTE is untagged, Steven Price, 2021/03/29
    - Re: [PATCH v10 1/6] arm64: mte: Sync tags for pages where PTE is untagged, Catalin Marinas, 2021/03/30
    - Re: [PATCH v10 1/6] arm64: mte: Sync tags for pages where PTE is untagged, Steven Price, 2021/03/31
- [PATCH v10 3/6] arm64: kvm: Save/restore MTE registers, Steven Price, 2021/03/12
- [PATCH v10 4/6] arm64: kvm: Expose KVM_ARM_CAP_MTE, Steven Price, 2021/03/12
- [PATCH v10 5/6] KVM: arm64: ioctl to fetch/store tags in a guest, Steven Price, 2021/03/12

Prev by Date: [PATCH] hw/riscv: sifive_e: Add 'const' to sifive_e_memmap[]
Next by Date: Re: [PATCH v3 0/8] [RfC] fix tracing for modules
Previous by thread: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Next by thread: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Index(es):
- Date
- Thread