[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 3/3] Add support for RAPL MSRs in KVM/Qemu
From: |
Anthony Harivel |
Subject: |
Re: [PATCH v3 3/3] Add support for RAPL MSRs in KVM/Qemu |
Date: |
Tue, 12 Mar 2024 12:21:14 +0100 |
Hi Daniel, Paolo,
Here my last questions before wrapping up and send v4, or maybe call off
my attempt to add RAPL interface in QEMU.
Daniel P. Berrangé, Jan 30, 2024 at 10:39:
> > + rcu_register_thread();
> > +
> > + /* Get QEMU PID*/
> > + pid = getpid();
> > +
> > + /* Nb of CPUS per packages */
> > + maxcpus = vmsr_get_maxcpus(0);
> > +
> > + /* Nb of Physical Packages on the system */
> > + maxpkgs = vmsr_get_max_physical_package(maxcpus);
>
> This function can fail so this needs to be checked & reported.
>
> > +
> > + /* Those MSR values should not change as well */
> > + vmsr->msr_unit = vmsr_read_msr(MSR_RAPL_POWER_UNIT, 0, pid,
> > + s->msr_energy.socket_path);
> > + vmsr->msr_limit = vmsr_read_msr(MSR_PKG_POWER_LIMIT, 0, pid,
> > + s->msr_energy.socket_path);
> > + vmsr->msr_info = vmsr_read_msr(MSR_PKG_POWER_INFO, 0, pid,
> > + s->msr_energy.socket_path);
>
> This function can fail for a variety of reasons, most especially if someone
> gave an incorrect socket path, or if the daemon is not running. This is not
> getting diagnosed, and even if we try to report it here, we're in a background
> thread at this point.
>
> I think we need to connect and report errors before even starting this
> thread, so that QEMU startup gets aborted upon configuration error.
>
Fair enough. Would it be ok to do the sanity check before
rcu_register_thread() and "return NULL;" in case of error or would you
prefer me to check all of this before even calling the
qemu_thread_create() ?
> > + /* Populate all the thread stats */
> > + for (int i = 0; i < num_threads; i++) {
> > + thd_stat[i].utime = g_new0(unsigned long long, 2);
> > + thd_stat[i].stime = g_new0(unsigned long long, 2);
> > + thd_stat[i].thread_id = thread_ids[i];
> > + vmsr_read_thread_stat(&thd_stat[i], pid, 0);
>
> It is non-obvious that the 3rd parameter here is an index into
> the utime & stime array. This function would be saner to review
> if called as:
>
> vmsr_read_thread_stat(pid,
> thd_stat[i].thread_id,
> &thd_stat[i].utime[0],
> &thd_stat[i].stime[0],
> &thd_stat[i].cpu_id);
>
> so we see what are input parameters and what are output parameters.
>
> Also this method can fail, eg if the thread has exited already,
> so we need to take that into account and stop trying to get info
> for that thread in later code. eg by setting 'thread_id' to 0
> and then skipping any thread_id == 0 later.
>
>
Good point. I'll rework the function and return "thread_id" to 0 in
case of failure in order to test it later on.
> > + thd_stat[i].numa_node_id =
> > numa_node_of_cpu(thd_stat[i].cpu_id);
> > + }
> > +
> > + /* Retrieve all packages power plane energy counter */
> > + for (int i = 0; i <= maxpkgs; i++) {
> > + for (int j = 0; j < num_threads; j++) {
> > + /*
> > + * Use the first thread we found that ran on the CPU
> > + * of the package to read the packages energy counter
> > + */
> > + if (thd_stat[j].numa_node_id == i) {
>
> 'i' is a CPU ID value, while 'numa_node_id' is a NUMA node ID value.
> I don't think it is semantically valid to compare them for equality.
>
> I'm not sure the NUMA node is even relevant, since IIUC from the docs
> earlier, the power values are scoped per package, which would mean per
> CPU socket.
>
'i' here is the package number on the host.
I'm using functions of libnuma to populate the maxpkgs of the host.
I tested this on different Intel CPU with multiple packages and this
has always returned the good number of packages. A false positive ?
So here I'm checking if the thread has run on the package number 'i'.
I populate 'numa_node_id' with numa_node_of_cpu().
I did not wanted to reinvent the wheel and the only lib that was talking
about "node" was libnuma.
Maybe I'm wrong assuming that a "node" (defined as an area where all
memory has the same speed as seen from a particular CPU) could lead me
to the packages number ?
And this is what I see you wrote below:
"A numa node isn't a package AFAICT."
Regards,
Anthony