qemu-ppc

Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall


From: David Gibson
Subject: Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
Date: Mon, 24 May 2021 17:32:20 +1000

On Sat, May 22, 2021 at 09:01:26AM +0530, Vaibhav Jain wrote:
> Thanks for looking into this patch, David and Groug,
> 
> David Gibson <david@gibson.dropbear.id.au> writes:
> > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
> >> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> >> spapr nvdimms. This enables the guest to fetch performance stats[2],
> >> such as the expected life of an nvdimm ('MemLife '), and display them
> >> to the user. Linux kernel support for fetching these performance stats
> >> and exposing them to user-space was done via [3].
> >> 
> >> The hcall semantics mandate that each nvdimm performance stat is
> >> uniquely identified by an 8-byte ASCII string encoded as an unsigned
> >> integer (e.g. 'MemLife ' == 0x4D656D4C69666520) and that its value is
> >> an 8-byte unsigned integer. These performance-stats are exchanged with
> >> the guest via a guest-allocated buffer called
> >> 'requestAndResultBuffer', or rr-buffer for short. This buffer contains
> >> a header described by 'struct papr_scm_perf_stats' followed by an
> >> array of performance-stats described by 'struct papr_scm_perf_stat'.
> >> The hypervisor is expected to validate the rr-buffer header and then,
> >> based on the request, copy the needed performance-stats into the array
> >> of 'struct papr_scm_perf_stat' following the header.
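
(For reference, the layout being described would look roughly like the
sketch below. The real definitions are in the patch's spapr.h hunk,
which isn't quoted in this reply, so the header field names here are
illustrative only; 'statistic_id' being a uint64_t matches the
changelog note further down.)

    /* Illustrative sketch of the rr-buffer layout described above */
    struct papr_scm_perf_stat {
        uint64_t statistic_id;     /* 8-byte ASCII tag, e.g. 'MemLife ' */
        uint64_t statistic_value;  /* 8-byte unsigned stat value */
    };

    struct papr_scm_perf_stats {
        uint8_t  eye_catcher[8];   /* magic bytes identifying the buffer */
        uint32_t stats_version;    /* version of the rr-buffer layout */
        uint32_t num_statistics;   /* number of stats requested/returned */
        struct papr_scm_perf_stat scm_statistic[]; /* stats follow header */
    };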
> >> 
> >> The patch proposes a new function h_scm_performance_stats() that
> >> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> >> validity of the rr-buffer header via scm_perf_check_rr_buffer(), it
> >> proceeds to fill the rr-buffer with the requested performance-stats.
> >> The value of each stat is retrieved from the accessor function for
> >> that stat, indexed in the array 'nvdimm_perf_stats'.
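
(Concretely, the accessor-table lookup described here amounts to
something like the loop below. This is a simplified sketch, not the
literal patch body; the table's getter-member name and the not-found
return code are guesses.)

    /* Sketch: resolve one requested stat-id via the accessor table */
    static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
    {
        int i;

        *val = 0;
        for (i = 0; i < ARRAY_SIZE(nvdimm_perf_stats); i++) {
            /* NB: comparing a u64 against a char[8] byte-wise -- this is
             * the typing/endianness point raised further down */
            if (memcmp(&id, nvdimm_perf_stats[i].stat_id, 8) == 0) {
                return nvdimm_perf_stats[i].stat_getval(drc, id, val);
            }
        }
        /* Not found; per the table comment below, unknown stats in the
         * rr-buffer get replaced with 'NoopStat' by the caller */
        return H_PARTIAL;
    }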
> >> 
> >> References:
> >> [1] "Hypercall Op-codes (hcalls)"
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> >> [2] Sysfs attribute documentation for papr_scm
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> >> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> >> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
> >> 
> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> >> ---
> >> Changelog
> >> 
> >> Since RFC-v2:
> >> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ as that's
> >>   the minimum buffer size needed to return all supported performance
> >>   stats. Its value is derived from the size of the array
> >>   'nvdimm_perf_stats'.
> >> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate the maximum supported
> >>   rr-buffer size from a guest. Its value is derived from the hard
> >>   limit SCM_STATS_MAX_STATS.
> >> * Updated scm_perf_check_rr_buffer() to add a check for the max size
> >>   of the rr-buffer. [David]
> >> * Updated scm_perf_check_rr_buffer() to remove a redundant check for
> >>   the min size of the rr-buffer. [David]
> >> * Updated h_scm_performance_stats() to use a single allocation for the
> >>   rr-buffer and copy it back to guest memory in a single shot. [David]
> >> 
> >> Since RFC-v1:
> >> * Removed empty lines from code. [David]
> >> * Updated struct papr_scm_perf_stat to use uint64_t for
> >>   statistic_id.
> >> * Added a hard limit of 255 on the max number of stats requested. [David]
> >> * Updated scm_perf_check_rr_buffer() to check the rr-buffer header
> >>   size. [David]
> >> * Removed a redundant check from nvdimm_stat_getval(). [David]
> >> * Removed a redundant call to address_space_access_valid() in
> >>   scm_perf_check_rr_buffer(). [David]
> >> * Instead of allocating a minimum-size local buffer, allocate a
> >>   max-possible-size local rr-buffer. [David]
> >> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [David]
> >> * Updated h_scm_performance_stats() to use a canned-response method
> >>   to simplify the num_stats==0 case. [David]
> >> ---
> >>  hw/ppc/spapr_nvdimm.c  | 222 +++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |  19 +++-
> >>  2 files changed, 240 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> >> index 252204e25f..287326b9c0 100644
> >> --- a/hw/ppc/spapr_nvdimm.c
> >> +++ b/hw/ppc/spapr_nvdimm.c
> >> @@ -35,6 +35,19 @@
> >>  /* SCM device is unable to persist memory contents */
> >>  #define PAPR_PMEM_UNARMED PPC_BIT(0)
> >>  
> >> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */
> >> +#define SCM_STATS_MIN_OUTPUT_BUFFER  (sizeof(struct papr_scm_perf_stats) + \
> >> +                                      sizeof(struct papr_scm_perf_stat) * \
> >> +                                      ARRAY_SIZE(nvdimm_perf_stats))
> >
> > MIN_OUTPUT_BUFFER is a better name, but still not great.  I think we
> > can get rid of this define completely in a neat way, though.  See below.
> >
> >
> Not sure how we can get rid of it, since we still need to advertise to
> the guest how large an rr-buffer is needed to return all
> perf-stats. Sorry, but I didn't make out any suggestion below that
> could get rid of this define.

> 
> 
> >> +/* Maximum number of stats that we can return in a single stat request */
> >> +#define SCM_STATS_MAX_STATS 255
> >
> > Although it's technically allowed, I'm still not convinced there's
> > actually any reason to allow the user to request the same stat over
> > and over.
> >
> Matching the PowerVM behaviour here, which doesn't enforce any limitation
> on how many times a single perf-stat can appear in the rr-buffer.

Hm, I guess matching PowerVM is worthwhile.  Still can't imagine any
case where a client would actually want to do so.

> 
> >> +/* Maximum possible output buffer to fulfill a perf-stats request */
> >> +#define SCM_STATS_MAX_OUTPUT_BUFFER  (sizeof(struct papr_scm_perf_stats) + \
> >> +                                      sizeof(struct papr_scm_perf_stat) * \
> >> +                                      SCM_STATS_MAX_STATS)
> >> +
> >>  bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> >>                             uint64_t size, Error **errp)
> >>  {
> >> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >>      return H_SUCCESS;
> >>  }
> >>  
> >> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> >> +{
> >> +    *val = 0;
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> >> +{
> >> +    /* Assume the NVDIMM currently has its full life remaining */
> >> +    *val = 100;
> >> +    return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * Holds all supported performance-stat accessors. Each performance
> >> + * statistic is uniquely identified by an 8-byte ASCII string, for
> >> + * example 'MemLife ', which indicates in percentage how much usage
> >> + * life of an nvdimm is remaining. 'NoopStat' is primarily used to
> >> + * test support for retrieving performance stats and also to replace
> >> + * unknown stats present in the rr-buffer.
> >> + */
> >> +static const struct {
> >> +    char stat_id[8];
> >
> > So using a char[] here, but a uint64_t in the request structure makes
> > it pretty hard to follow if you're doing the right thing
> > w.r.t. endianness, especially since you effectively memcmp() directly
> > between u64s and char[]s.  You really want to use a consistent type
> > for the ids.
> >
> Though the PAPR-SCM spec defines stat-ids as u64, they are essentially
> 8-byte ASCII strings encoded in a u64.

Yes, I got that.  The typing should still be consistent.

> The guest kernel and this proposed qemu patch don't do any math
> operations on them that might be affected by their endianness.

You do however return it in a register in at least one case, so you
need to be careful about how that's loaded or stored.

> The switch from u64 to char[8] is done only for a more convenient
> ASCII representation of stat-ids in nvdimm_perf_stats[].

Sounds like it would make more sense to use char[8] everywhere, then.
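
(To make the hazard concrete: the eight bytes of 'MemLife ' are the
same in guest memory on any host, but the u64 obtained by a plain
pointer-cast load is host-endian dependent. A minimal sketch using
QEMU's bswap helpers:)

    #include "qemu/osdep.h"
    #include "qemu/bswap.h"    /* ldq_be_p() */

    static const char memlife_id[8] = "MemLife ";

    /* Byte-wise id comparison works regardless of host endianness */
    static bool stat_id_matches(const void *buf_id)
    {
        return memcmp(buf_id, memlife_id, sizeof(memlife_id)) == 0;
    }

    /*
     * Returning the id in a register is where byte order bites:
     * *(uint64_t *)memlife_id is 0x206566694C6D654D on a little-endian
     * host, while the spec quotes 0x4D656D4C69666520.  An explicit
     * big-endian load yields the spec's value on any host.
     */
    static uint64_t stat_id_to_reg(void)
    {
        return ldq_be_p(memlife_id);   /* 0x4D656D4C69666520 everywhere */
    }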

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
