Re: [PATCH v3 3/3] Add support for RAPL MSRs in KVM/Qemu
From: Daniel P. Berrangé
Subject: Re: [PATCH v3 3/3] Add support for RAPL MSRs in KVM/Qemu
Date: Tue, 12 Mar 2024 15:49:12 +0000
User-agent: Mutt/2.2.12 (2023-09-09)
On Tue, Mar 12, 2024 at 12:21:14PM +0100, Anthony Harivel wrote:
> Daniel P. Berrangé, Jan 30, 2024 at 10:39:
> > > + rcu_register_thread();
> > > +
> > > + /* Get QEMU PID*/
> > > + pid = getpid();
> > > +
> > > + /* Nb of CPUS per packages */
> > > + maxcpus = vmsr_get_maxcpus(0);
> > > +
> > > + /* Nb of Physical Packages on the system */
> > > + maxpkgs = vmsr_get_max_physical_package(maxcpus);
> >
> > This function can fail so this needs to be checked & reported.
> >
> > > +
> > > + /* Those MSR values should not change as well */
> > > + vmsr->msr_unit = vmsr_read_msr(MSR_RAPL_POWER_UNIT, 0, pid,
> > > + s->msr_energy.socket_path);
> > > + vmsr->msr_limit = vmsr_read_msr(MSR_PKG_POWER_LIMIT, 0, pid,
> > > + s->msr_energy.socket_path);
> > > + vmsr->msr_info = vmsr_read_msr(MSR_PKG_POWER_INFO, 0, pid,
> > > + s->msr_energy.socket_path);
> >
> > This function can fail for a variety of reasons, most especially if someone
> > gave an incorrect socket path, or if the daemon is not running. This is not
> > getting diagnosed, and even if we try to report it here, we're in a
> > background
> > thread at this point.
> >
> > I think we need to connect and report errors before even starting this
> > thread, so that QEMU startup gets aborted upon configuration error.
> >
>
> Fair enough. Would it be OK to do the sanity check before
> rcu_register_thread() and "return NULL;" in case of error, or would
> you prefer that I check all of this before even calling
> qemu_thread_create()?
I think it is best to initialize in kvm_msr_energy_thread_init(),
prior to thread creation, as that avoids some of the complexity of
cleaning up and reporting errors from a background thread.
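Very roughly something like this (an untested sketch only; the struct
layout, the 'msr_thr' field, and the assumption that vmsr_read_msr()
returns 0 on failure are guesses on my part, not taken from the patch):

  static int kvm_msr_energy_thread_init(KVMState *s, Error **errp)
  {
      MsrEnergy *vmsr = &s->msr_energy;

      /*
       * Probe the privileged helper once, synchronously, before the
       * thread exists: a failure here means a bad socket path or a
       * daemon that is not running, and it can still be reported via
       * 'errp' so that QEMU startup aborts on a configuration error.
       */
      vmsr->msr_unit = vmsr_read_msr(MSR_RAPL_POWER_UNIT, 0, getpid(),
                                     vmsr->socket_path);
      if (vmsr->msr_unit == 0) {
          error_setg(errp, "kvm: unable to read RAPL MSRs over '%s'",
                     vmsr->socket_path);
          return -1;
      }

      qemu_thread_create(&vmsr->msr_thr, "kvm-msr-energy",
                         kvm_msr_energy_thread, s, QEMU_THREAD_JOINABLE);
      return 0;
  }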
> > > + thd_stat[i].numa_node_id = numa_node_of_cpu(thd_stat[i].cpu_id);
> > > + }
> > > +
> > > + /* Retrieve all packages power plane energy counter */
> > > + for (int i = 0; i <= maxpkgs; i++) {
> > > + for (int j = 0; j < num_threads; j++) {
> > > + /*
> > > + * Use the first thread we found that ran on the CPU
> > > + * of the package to read the packages energy counter
> > > + */
> > > + if (thd_stat[j].numa_node_id == i) {
> >
> > 'i' is a CPU ID value, while 'numa_node_id' is a NUMA node ID value.
> > I don't think it is semantically valid to compare them for equality.
> >
> > I'm not sure the NUMA node is even relevant, since IIUC from the docs
> > earlier, the power values are scoped per package, which would mean per
> > CPU socket.
> >
>
> 'i' here is the package number on the host.
Oops, yes, 'maxpkgs' is iterating over CPU /socket/ ID numbers.

The point still stands though: NUMA node ID numbers are not
guaranteed to be the same as socket ID numbers. Very often they
will be the same (which makes it annoying to test, as it is easy
to not realize the difference), but we can't rely on that.
> I'm using libnuma functions to populate maxpkgs for the host.
> I tested this on different Intel CPUs with multiple packages and it
> has always returned the correct number of packages. A false positive?
maxpkgs comes from vmsr_get_max_physical_package() which you're
reading from sysfs, rather than libnuma.
> So here I'm checking whether the thread has run on package number 'i'.
> I populate 'numa_node_id' with numa_node_of_cpu().
>
> I did not want to reinvent the wheel, and the only library that talked
> about a "node" was libnuma.
I'm not actually convinced we need to use libnuma at all. IIUC, you're
just trying to track all CPUs within the same physical socket (package).
I don't think we need to care about NUMA nodes to do that tracking.
> Maybe I'm wrong in assuming that a "node" (defined as an area where all
> memory has the same speed as seen from a particular CPU) could lead me
> to the package number?
Historically you could have multiple sockets in the same NUMA node,
i.e. an m:1 mapping.

These days with AMD sockets, you can have 1 socket comprising many
NUMA nodes, as individual dies within a socket are each their own
NUMA node. So a 1:m mapping.

On Intel I think it is still typical to have 1 socket per NUMA
node, but again I don't think we can rely on that 1:1 mapping.

Fortunately I don't think it matters, since it looks like you
don't really need to track NUMA nodes, only sockets (physical
package IDs).
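FWIW, the per-CPU package ID is available directly from sysfs, so the
thread-to-package matching doesn't need libnuma at all. A rough,
untested standalone illustration (the patch's thd_stat bookkeeping
would go where the printf is):

  /*
   * Map each possible CPU to its physical package (socket) ID using
   * the topology attribute the kernel exposes in sysfs.
   */
  #include <stdio.h>
  #include <unistd.h>

  static int cpu_to_package(long cpu)
  {
      char path[128];
      FILE *f;
      int pkg = -1;

      snprintf(path, sizeof(path),
               "/sys/devices/system/cpu/cpu%ld/topology/physical_package_id",
               cpu);
      f = fopen(path, "r");
      if (!f) {
          return -1;                     /* CPU offline or not present */
      }
      if (fscanf(f, "%d", &pkg) != 1) {
          pkg = -1;
      }
      fclose(f);
      return pkg;
  }

  int main(void)
  {
      long ncpus = sysconf(_SC_NPROCESSORS_CONF);

      for (long cpu = 0; cpu < ncpus; cpu++) {
          int pkg = cpu_to_package(cpu);
          if (pkg >= 0) {
              printf("cpu%ld -> package %d\n", cpu, pkg);
          }
      }
      return 0;
  }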
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|