Non-Uniform Memory Access (NUMA) is a memory design used in multiprocessor systems in which memory access time depends on the memory's location relative to the CPU: each CPU reaches its own local memory faster than remote memory attached to another CPU, which can cause performance problems if not managed properly. Because the path from processor to memory is non-uniform, this organization scales to systems with large numbers of processors, hence its association with very large systems. A NUMA system with cache-coherent memory running a single OS image is still an SMP system. A general representation of a NUMA system is described below.
The system comprises multiple nodes, each with 2-4 processors, a memory controller (MC), memory, and perhaps I/O. There may be a separate node controller (NC), or the MC and NC may be integrated. The nodes can be connected by a shared bus or by a crossbar.
By classifying memory locations based on signal-path length from the processor to the memory, latency and bandwidth bottlenecks can be avoided. This requires redesigning the processor and chipset as a whole. The AMD Opteron family introduced integrated memory controllers, with each CPU owning designated memory banks, so each CPU now has its own memory address space. A NUMA-optimized operating system such as ESXi allows a workload to consume memory from both address spaces while optimizing for local memory access. Let's use a two-CPU system as an example to clarify the distinction between local and remote memory access within a single system.
Memory connected to the memory controller of CPU1 is considered local memory. Memory connected to the other CPU socket (CPU2) is considered foreign or remote for CPU1. Remote memory access carries additional latency over local access because it must traverse an interconnect (a point-to-point link) and go through the remote memory controller. Because of these different memory locations, the system experiences "non-uniform" memory access times.
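The local/remote distinction above is visible directly from sysfs. The sketch below lists each NUMA node's CPUs and its distance table (10 means local, larger values mean remote, per the ACPI SLIT convention); the paths are standard Linux sysfs locations, and `numactl --hardware` reports the same information.

```shell
# Count the NUMA nodes the kernel exposes (standard sysfs paths,
# assumed mounted at /sys; numactl --hardware shows the same data).
nodes=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
echo "NUMA nodes: $nodes"
for n in /sys/devices/system/node/node[0-9]*; do
  [ -d "$n" ] || continue
  # cpulist: CPUs local to this node
  # distance: access cost to every node; 10 = local, larger = remote
  echo "$(basename "$n"): cpus=$(cat "$n/cpulist") distance=$(cat "$n/distance")"
done
```

On a two-socket box this typically prints two nodes with a distance row such as "10 21", quantifying the remote-access penalty described above.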
In larger designs, a node comprises four processors, memory, I/O, and a pair of node controllers. There are three IOH devices in the system. The processors and memory on the CPU board are connected to the XNC board through the node controllers.
The "isolcpus" kernel parameter is used to isolate one or more CPUs from the kernel scheduler. This is typically used for running real-time or high-performance applications that require dedicated CPU resources. However, it does not have any direct relationship with NUMA nodes.
To disable NUMA at boot, edit the GRUB configuration file (/etc/default/grub), find the line that starts with "GRUB_CMDLINE_LINUX_DEFAULT", and add "numa=off" to the end of the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash numa=off"
Regenerate the GRUB configuration file. On RHEL-family systems:

grub2-mkconfig -o /boot/grub2/grub.cfg

On Debian/Ubuntu systems, run update-grub instead.
Reboot the system for the changes to take effect.
Note that after disabling NUMA, the system will treat all memory as a single, uniform memory pool. This does not always improve performance and may even reduce it on NUMA hardware.
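After the reboot, the setting can be confirmed from standard /proc and /sys paths; on a machine booted with numa=off, only a single node should remain visible. A minimal check:

```shell
# Was numa=off actually passed to the kernel at boot?
grep -o 'numa=off' /proc/cmdline || echo "numa=off not set"
# How many memory nodes does the kernel expose now?
nodecount=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
echo "nodes visible: $nodecount"
```

A count of 1 (or 0 on kernels built without NUMA support) means the system is running as a uniform memory pool.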
Here is an example of how to use the "isolcpus" parameter to isolate CPU cores from the kernel scheduler:
Find out the number of available CPU cores by running:
cat /proc/cpuinfo | grep processor | wc -l
To isolate one or more CPU cores from the kernel scheduler, append the "isolcpus" parameter to the kernel boot command line in the GRUB configuration file (/etc/default/grub). Add the "isolcpus" parameter followed by the CPU core number(s) to the end of the "GRUB_CMDLINE_LINUX_DEFAULT" line. For example, to isolate CPU core 0:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=0"
If you want to isolate multiple cores, separate them with commas. For example:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=0,2"
Save the file and run update-grub.
Reboot the system for the changes to take effect.
After isolating the specified CPU cores with the "isolcpus" parameter, you can assign them to a specific process using the "taskset" command. For example, to run a process on CPU core 0, run:
taskset -c 0 <command>
Note that isolating CPU cores removes them from the general scheduler pool and can affect overall system performance, so test your application's performance before and after isolating CPU cores to see whether the change actually helps.
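On reasonably modern kernels, the isolation can be verified without starting a test workload: the kernel exposes the isolated set in sysfs, and any process's own affinity can be read back from /proc. A small sketch:

```shell
# Which CPUs are currently isolated? (empty file or missing = none;
# /sys/devices/system/cpu/isolated is provided by modern kernels)
isolated=$(cat /sys/devices/system/cpu/isolated 2>/dev/null)
echo "isolated CPUs: ${isolated:-none}"
# This shell's own CPU affinity, as the kernel sees it:
affinity=$(grep Cpus_allowed_list /proc/self/status)
echo "$affinity"
```

After booting with isolcpus=0, ordinary processes should show a Cpus_allowed_list that excludes core 0, while taskset can still place work there explicitly.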
A sample session (cores 16 and 18 do not exist on this machine, so taskset rejects them with "Invalid argument"):

# cat /proc/cpuinfo | grep processor | wc -l
# taskset -c 0 hostname
# taskset -c 18 hostname
taskset: failed to set pid 11896's affinity: Invalid argument
# taskset -c 16 hostname
taskset: failed to set pid 12131's affinity: Invalid argument
# taskset -c 15 hostname
# taskset -c 11,15 hostname
On the PPC architecture, the GRUB configuration file is /etc/grub.conf. Find the line that starts with "append" and add the "isolcpus" parameter followed by the CPU core number(s) to the end of the line. For example:
append="quiet splash isolcpus=0,2"
To check whether your kernel supports isolcpus:
grep -i isolcpus /boot/config-$(uname -r)
This command searches for the "isolcpus" parameter in the kernel configuration file for the currently running kernel.
If the command prints a matching line, your kernel supports isolating CPU cores with the "isolcpus" parameter.
Some Linux distributions may not include the kernel configuration file in the /boot directory. In that case, you may need to install the "kernel-devel" or "kernel-source" package to access the kernel configuration file.
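A sketch that checks both common locations is below. Note the assumption: on newer kernels the related build option is named CONFIG_CPU_ISOLATION, while older kernels may not list isolcpus in the config at all even though the boot parameter works, so an empty result is not conclusive.

```shell
# Look for isolcpus-related options in the running kernel's config.
# Falls back to /proc/config.gz, which some distributions provide
# instead of a /boot/config-* file.
cfg=/boot/config-$(uname -r)
if [ -r "$cfg" ]; then
  found=$(grep -icE 'isolcpus|cpu_isolation' "$cfg")
elif [ -r /proc/config.gz ]; then
  found=$(zcat /proc/config.gz | grep -icE 'isolcpus|cpu_isolation')
else
  found=""
fi
out="matching config lines: ${found:-config not available}"
echo "$out"
```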
NOTE: Another way of taking a CPU offline:
echo 0 > /sys/devices/system/cpu/cpu7/online (to offline the CPU)
Replace the CPU number with the required CPU; echoing 1 to the same file brings it back online.
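The offline/online toggle can be sketched as a small guarded script. Assumptions: it needs root, cpu7 is just an example number, and CPU 0 usually cannot be offlined on x86.

```shell
# Take one CPU offline and bring it back, with a guard so the script
# degrades gracefully when run unprivileged or the CPU does not exist.
CPU=7
ONLINE=/sys/devices/system/cpu/cpu$CPU/online
if [ -w "$ONLINE" ]; then
  echo 0 > "$ONLINE"       # take the CPU offline
  state=$(cat "$ONLINE")   # reads back "0" while offline
  echo 1 > "$ONLINE"       # bring it back online
  msg="cpu$CPU toggled (offline state read back: $state)"
else
  msg="cpu$CPU cannot be toggled (no such CPU or not running as root)"
fi
echo "$msg"
```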
Simultaneous multithreading (SMT) is a processor design that combines hardware multithreading with superscalar processor technology, allowing instructions from multiple threads to be issued in each cycle.
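SMT status can also be checked generically before turning to the PPC-specific tooling. Assumptions in the sketch: the /sys/devices/system/cpu/smt/control file exists only on recent kernels (and not on every architecture), while `lscpu` works across architectures, with a threads-per-core value greater than 1 indicating SMT is on.

```shell
# Kernel-level SMT switch, where exposed (on/off/forceoff/notsupported):
smt=$(cat /sys/devices/system/cpu/smt/control 2>/dev/null)
echo "SMT control: ${smt:-not exposed on this platform}"
# Architecture-neutral check via lscpu: >1 thread per core means SMT on.
threads=$(lscpu 2>/dev/null | awk -F: '/^Thread\(s\) per core/ {gsub(/ /,"",$2); print $2}')
echo "Threads per core: ${threads:-unknown}"
```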
Example: how to enable and check SMT on the Power architecture (PPC):
# cat smt.sh
while [ 1 ]