Auto-tuning the Linux kernel with bpftune
[Tue Dec 24 13:32:44 CST 2024]

Thanks to Linux Weekly News, I recently learned about auto-tuning the Linux kernel using a tool called bpftune, developed by Oracle. The project is currently focused on optimizing network settings, but the authors hope to extend it to other areas of the kernel too. If anything, my main concern would be that, although it is released under the GPLv2 license, this is Oracle we are talking about. It's not as if they have a very solid record when it comes to licensing issues. Also, as mentioned before, it currently only tweaks network tunables, and it is not packaged for any major distro other than Oracle Linux. The tuners available so far are:

  • ip_frag, which manages IP fragmentation settings
  • net_buffer, which manages non-TCP buffer sizes
  • route_table, which manages routing settings
  • tcp_buffer, which manages TCP buffer sizes
  • neigh_table, which manages the neighbor (ARP) tables
  • netns, which manages network namespaces
  • tcp_conn, which chooses TCP congestion-control algorithms for each connection
  • sysctl, which reacts to changes to sysctl settings
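For a sense of what these tuners adjust: the knobs are ordinary sysctl values exposed under /proc/sys. The Python sketch below only reads a few of them. The mapping from tuner name to sysctl is my own assumption for illustration (it is not taken from the bpftune documentation), and `TUNABLES`, `sysctl_path` and `read_sysctl` are hypothetical names.

```python
from pathlib import Path

# Assumed, illustrative mapping from tuner name to related sysctls.
# The sysctl names are real; tying them to each tuner is a guess.
TUNABLES = {
    "tcp_buffer": ["net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"],
    "net_buffer": ["net.core.rmem_max", "net.core.wmem_max"],
    "tcp_conn": ["net.ipv4.tcp_congestion_control"],
}


def sysctl_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its path under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the sysctl's value as a list of fields, or None if unreadable."""
    try:
        return sysctl_path(name, root).read_text().split()
    except OSError:
        return None


if __name__ == "__main__":
    for tuner, names in TUNABLES.items():
        for name in names:
            print(f"{tuner:12} {name:36} {read_sysctl(name)}")
```

Running it before and after starting bpftune would be one crude way to see whether any of these values moved.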

Concurrency vs. parallelism
[Mon Nov 4 08:49:35 CST 2024]

Although they are easily confused, there is a difference between concurrency and parallelism:

Concurrency relates to an application that is processing more than one task at the same time. It is an approach used to decrease the response time of the system using a single processing unit. Concurrency creates the illusion of parallelism: the chunks of a task aren’t actually processed in parallel, but inside the application more than one task is in progress at a time. It doesn’t fully finish one task before it begins the next.

In other words, concurrency is achieved by interleaving the operations inside the single processor by context switching.
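A minimal sketch of that interleaving, in Python. It uses asyncio, where tasks hand over control cooperatively on a single thread, rather than the preemptive context switching an operating system performs, so take it as an analogy only. The names `task` and `main` are made up for the example.

```python
import asyncio


async def task(name, steps, log):
    """Record each step, then give other tasks a chance to run."""
    for i in range(steps):
        log.append(f"{name}{i}")
        await asyncio.sleep(0)  # yield control back to the event loop


async def main():
    log = []
    # Both tasks make progress on one thread; neither finishes first.
    await asyncio.gather(task("A", 3, log), task("B", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The log shows steps of A and B alternating, even though only one of them is ever running at any given instant.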

On the other hand, parallelism does something quite different:

Parallelism is related to an application where tasks are divided into smaller sub-tasks that are processed seemingly simultaneously or in parallel. It is used to increase the throughput and computational speed of the system by using multiple processors. It enables single sequential CPUs to do a lot of things “seemingly” simultaneously.

Parallelism leads to overlapping of central processing units and input-output tasks in one process with the central processing unit and input-output tasks of another process. Whereas in concurrency the speed is increased by overlapping the input-output activities of one process with CPU process of another process.

The Wikipedia entry for parallel computing does an excellent job explaining the difference too:

In computer science, parallelism and concurrency are two different things: a parallel program uses multiple CPU cores, each core performing a task independently. On the other hand, concurrency enables a program to deal with multiple tasks even on a single CPU core; the core switches between tasks (i.e. threads) without necessarily completing each one. A program can have both, neither or a combination of parallelism and concurrency characteristics.

That's where things like MPI come into the picture.

In any case, although parallel computing may sound like the panacea of high-performance computing, and it certainly is a key element, that doesn't mean it solves all our problems or that it scales perfectly, as the Wikipedia entry goes on to explain:

Optimally, the speedup from parallelization would be linear—doubling the number of processing elements should halve the runtime, and doubling it a second time should again halve the runtime. However, very few parallel algorithms achieve optimal speedup. Most of them have a near-linear speedup for small numbers of processing elements, which flattens out into a constant value for large numbers of processing elements.
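One common way to model that flattening is Amdahl's law, which I am assuming here as the explanation. If a fraction p of the work can be parallelized and the rest is serial, the speedup on n processing elements is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). A small Python sketch (`amdahl_speedup` is a name made up for the example, and p = 0.95 is an arbitrary choice):

```python
def amdahl_speedup(parallel_fraction, n):
    """Speedup predicted by Amdahl's law for n processing elements."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


if __name__ == "__main__":
    p = 0.95  # arbitrary: 95% of the work is assumed to be parallelizable
    for n in (1, 2, 4, 16, 256, 4096):
        print(f"{n:5d} elements -> speedup {amdahl_speedup(p, n):6.2f}")
    print(f"upper bound: {1.0 / (1.0 - p):.2f}")
```

The early doublings help a lot, but the serial 5% soon dominates and the curve levels off below the bound.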


Real-Time makes it to the Linux kernel
[Mon Nov 4 08:15:48 CST 2024]

After so many years, real-time finally made it to the Linux kernel. Well, it has been there for quite some time, but not fully, and not as part of the mainline kernel. But first, what is a real-time operating system?

An RTOS is a specialized operating system designed to handle time-critical tasks with precision and reliability. Unlike general-purpose operating systems like Windows or macOS, an RTOS is built to respond to events and process data within strict time constraints, often measured in milliseconds or microseconds. As Steven Rostedt, a prominent real-time Linux developer and Google engineer, put it, "Real-time is the fastest worst-case scenario."

He means that the essential characteristic of an RTOS is its deterministic behavior. An RTOS guarantees that critical tasks will be completed within specified deadlines. Many people assume that RTOSs are for fast processes. They're not. Speed is not the point in RTOSs -- reliability is. This predictability is crucial in applications where timing is essential, such as industrial control systems, medical devices, and aerospace equipment.

This is certainly really useful to people involved in digital multimedia creation. A real-time kernel, for instance, will avoid skipping when recording a song. Plenty of people confuse this (i.e., reliability) with speed. It's similar to the confusion between reliability and stability when it comes to system administration.
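To make the distinction concrete, here is a small Python sketch with entirely made-up latency samples. One system is faster on average but has a single large spike; the other is slower on average but bounded. Only the worst case decides whether a deadline is met. `meets_deadline` and both sample lists are hypothetical.

```python
from statistics import mean


def meets_deadline(latencies_us, deadline_us):
    """A deadline holds only if the worst observed latency is within it."""
    return max(latencies_us) <= deadline_us


# Hypothetical samples, in microseconds (not measurements).
fast_but_spiky = [30] * 99 + [5000]  # usually quick, one long stall
slower_but_bounded = [80] * 100      # never quick, never late

if __name__ == "__main__":
    deadline_us = 100
    for label, samples in (("spiky", fast_but_spiky),
                           ("bounded", slower_but_bounded)):
        print(f"{label:8} mean={mean(samples):7.1f}us "
              f"worst={max(samples):5d}us "
              f"ok={meets_deadline(samples, deadline_us)}")
```

The spiky system wins on the mean and still fails the deadline, which is the whole point of judging real-time systems by their worst case.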

In any case, although the real-time code has finally been completely merged into the 6.12 Linux kernel, it has been slowly making its way there for quite a while:

The story of real-time Linux began in the late 1990s when there was a growing need for Linux to support real-time applications. The initial efforts focused on creating separate real-time kernels that ran alongside the Linux kernel. This included academic projects such as KURT from the University of Kansas; RTAI, from the University of Milano; and New Mexico Institute of Mining and Technology's RTLinux.

Ingo Molnar, a senior Linux kernel developer, started collecting and reshaping pieces of these technologies in 2004 to build the foundation for the real-time preemption patch set PREEMPT_RT.

This approach was different from earlier real-time Linux solutions as it modified the existing Linux kernel rather than creating a separate real-time kernel. By 2006, it had gained enough traction that Linus Torvalds observed, "Controlling a laser with Linux is crazy, but everyone in this room is crazy in his own way. So if you want to use Linux to control an industrial welding laser, I have no problem with you using PREEMPT_RT."

(...)

As the project moved forward, many elements of it moved into the kernel. Rostedt told me that, in a way, it's wrong to say that real-time is only now in Linux. Many of its features have been introduced into mainstream Linux over the years. Some of these, indeed, are essential to the Linux you use every day.

For example, chances are you've never heard of "NO_HZ," which reduces power consumption in idle systems. NO_HZ is what enables Linux to run efficiently on machines with thousands of CPUs. "You don't realize how much Linux improved because of the real-time patch," Rostedt emphasized. "The only reason why Linux runs in data centers today is because of the work we did."

So, without NO_HZ, Linux wouldn't be running essentially all data centers. This, in turn, explains why Linux runs the cloud. I don't know exactly what the world would look like without this real-time contribution, but it wouldn't look anything like it does today.

And finally, after all these years, the Linux kernel developers managed to overcome a final obstacle:

The final hurdle for full integration was reworking the kernel's printk function, a critical debugging tool dating back to 1991. Torvalds was particularly protective of printk: he wrote the original code and still uses it for debugging. However, printk also puts a hard delay in a Linux program whenever it's called. That kind of slowdown is unacceptable in real-time systems.

Rostedt explained: "Printk has a thousand hacks to handle a thousand different situations. Whenever we modified printk to do something, it would break one of these cases. The thing about printk that's great about debugging is you can know exactly where you were when a process crashed. When I would be hammering the system really, really hard, and the latency was mostly around maybe 30 microseconds, and then suddenly it would jump to five milliseconds." That delay was the printk message.

After much work, many heated discussions, and several rejected proposals, a compromise was reached earlier this year. Torvalds is happy, the real-time Linux developers are happy, printk users are happy, and, at long last, real-time Linux is real.

With that, the real-time code is now an integral part of the Linux kernel. So, what's next for projects like HPE REACT (an old SGI product)? Well, first of all, the patches (and, therefore, the functionality) have been there all along. What's new is that the final hurdle (i.e., printk) has been overcome. Then, there is the issue that we don't truly know whether PREEMPT_RT can guarantee the same low latency as the alternatives. And, finally, some of these products also offer an easy API to program to. In the case of HPE REACT, that is the Frame Rate Scheduler (FRS).

A counter for hung tasks since boot
[Wed Oct 30 23:33:08 UTC 2024]

There is a new proposal for a counter in the Linux kernel that keeps track of the number of hung tasks since the most recent boot.

The ability to read the number of hung task warnings and the like already exists via /proc/sys/kernel/hung_task_warnings. With a set of two patches now under review, /proc/sys/kernel/hung_task_detect_count would be added to report the total number of hung tasks detected since boot time.

Apparently, the patch is only 18 lines of code. Anyway, I've come across cases where the Linux kernel detects hung tasks and reports them via syslog. It is possible to configure the system to panic on a hung task, but that's not always a good idea. In general, users just notice that the system becomes unresponsive. The new counter might prove useful for troubleshooting such issues.
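If the counter lands, checking it should be as simple as reading a file. The Python sketch below assumes the path named in the proposal, /proc/sys/kernel/hung_task_detect_count, which will not exist on kernels without the patch, so a missing file is reported rather than treated as an error. `read_counter` is a hypothetical helper.

```python
from pathlib import Path


def read_counter(name, root="/proc/sys/kernel"):
    """Return the integer stored in root/name, or None if unavailable."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # hung_task_detect_count is the proposed file and may be absent.
    for name in ("hung_task_warnings", "hung_task_panic",
                 "hung_task_detect_count"):
        value = read_counter(name)
        print(f"{name}: {'not available' if value is None else value}")
```

A monitoring script could poll this and raise an alert when the count goes up, instead of grepping syslog for hung task messages.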