Android Memory Overview [Part2] - Android LMKD

Introduction


In the last tutorial we saw why low-memory handling is needed, along with a high-level overview of low-memory situations and the ways each process is assigned a score that decides whether it is killed when a low-memory condition occurs.

Android uses lmkd [LowMemoryKiller Daemon] running in userspace to determine a low memory condition and kill processes as necessary.
Let us see the details of the lmkd implementation.

Android Low Memory Killer Daemon – Detailed Walkthrough

Android LowMemoryKillerDaemon Startup

When Android's init process parses init.rc, we can see that the memcg [memory cgroup] is initialized.
























Moving further, init.rc starts the lmkd service process.


The source for the lmkd is located in platform/system/memory/lmkd.


Let us look at the main function of the lmkd with some important items marked.



There are 3 important parts in the main function. Let us understand them one by one.

1.
struct sched_param param = {
            .sched_priority = 1,
    };

2.
if (!init()) {

3.
/* CAP_IPC_LOCK required */
            if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) && (errno != EINVAL)) {
                ALOGW("mlockall failed %s", strerror(errno));
            }

/* CAP_NICE required */
            if (sched_setscheduler(0, SCHED_FIFO, &param)) {
                ALOGW("set SCHED_FIFO failed %s", strerror(errno));
            }

Let us look at the 2nd item above, which is the init() function.


The main items of interest are:
1. Create the epoll instance epollfd, sized by MAX_EPOLL_EVENTS: 1 ctrl listen socket, 3 ctrl data sockets, 3 memory pressure levels, 1 lmk events fd, and 1 fd to wait for process death.
2. Get the lmkd control socket, listen on it, and add its file descriptor to epollfd. Its EPOLLIN handler is ctrl_connect_handler.
3. init_mp_common initializes the memory pressure related parameters, creates a file descriptor for event notification, and adds it to epollfd. Its EPOLLIN handler is mp_event_common.
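The dispatch pattern described in the list above can be sketched as follows. This is a minimal illustration, not lmkd's actual code: the struct event_handler_info, register_handler, dispatch_once, and record_level names are hypothetical, and the idea is only that each descriptor's handler is stored in epev.data.ptr and called when epoll reports the descriptor as readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-descriptor record: which function handles its events. */
struct event_handler_info {
    int data;                                   /* e.g. a pressure level index */
    void (*handler)(int data, uint32_t events);
};

/* Example handler (illustrative only): remembers the last level it saw. */
static int last_level = -1;
static void record_level(int data, uint32_t events) {
    (void)events;
    last_level = data;
}

/* Register fd for EPOLLIN and keep its handler record in epev.data.ptr. */
static int register_handler(int epollfd, int fd, struct event_handler_info *info) {
    struct epoll_event epev;
    assert(info != NULL && info->handler != NULL);
    epev.events = EPOLLIN;
    epev.data.ptr = info;
    return epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &epev);
}

/* Wait once (up to timeout_ms) and call the handler of every ready fd.
 * Returns the number of handlers invoked, or -1 if epoll_wait failed. */
static int dispatch_once(int epollfd, int timeout_ms) {
    struct epoll_event events[8];
    int n = epoll_wait(epollfd, events, 8, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct event_handler_info *info = events[i].data.ptr;
        info->handler(info->data, events[i].events);
    }
    return n;
}
```

A caller would create the epoll instance with epoll_create1(0), register each descriptor once, and then call dispatch_once in a loop. Keeping the handler next to the descriptor means the loop itself never needs to know what kind of descriptor became ready.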

init_mp_common opens memory.pressure_level, creates evfd (an eventfd) for event notification to this process, and then writes both descriptors, together with levelstr, into cgroup.event_control.

The LMK_TARGET type corresponds to cmd_target and is used to set "/sys/module/lowmemorykiller/parameters/minfree" and "/sys/module/lowmemorykiller/parameters/adj".

The LMK_PROCPRIO type corresponds to cmd_procprio, which writes /proc/<pid>/oom_score_adj and adds the pid to the pidhash table.
The LMK_PROCREMOVE type corresponds to cmd_procremove and removes the pid from pidhash.
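The pidhash bookkeeping behind LMK_PROCPRIO and LMK_PROCREMOVE can be sketched as a small chained hash table. This is a simplified illustration under stated assumptions: the table size, the hash function, and the struct proc layout are assumptions rather than a copy of lmkd, and the helper names pid_lookup, pid_insert, and pid_remove are used here only for the example. Writing /proc/<pid>/oom_score_adj is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed table size and hash function, chosen for illustration. */
#define PIDHASH_SZ 1024
#define pid_hashfn(x) ((((x) >> 8) ^ (x)) & (PIDHASH_SZ - 1))

/* Simplified process record; the real structure holds more fields. */
struct proc {
    int pid;
    int oomadj;
    struct proc *pidhash_next;   /* next entry in the same bucket */
};

static struct proc *pidhash[PIDHASH_SZ];

/* Returns the record for pid, or NULL if the pid is not tracked. */
static struct proc *pid_lookup(int pid) {
    struct proc *p = pidhash[pid_hashfn(pid)];
    while (p != NULL && p->pid != pid)
        p = p->pidhash_next;
    return p;
}

/* Tracks pid with the given oomadj, or updates it if already tracked.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int pid_insert(int pid, int oomadj) {
    struct proc *p;
    assert(pid > 0);
    p = pid_lookup(pid);
    if (p == NULL) {
        int h = pid_hashfn(pid);
        p = calloc(1, sizeof(*p));
        if (p == NULL)
            return -1;
        p->pid = pid;
        p->pidhash_next = pidhash[h];
        pidhash[h] = p;
    }
    p->oomadj = oomadj;
    return 0;
}

/* Stops tracking pid. Returns 0 on success, -1 if it was not tracked. */
static int pid_remove(int pid) {
    struct proc **pp = &pidhash[pid_hashfn(pid)];
    struct proc *victim;
    while (*pp != NULL && (*pp)->pid != pid)
        pp = &(*pp)->pidhash_next;
    if (*pp == NULL)
        return -1;
    victim = *pp;
    *pp = victim->pidhash_next;
    free(victim);
    return 0;
}
```

In this sketch a repeated LMK_PROCPRIO for the same pid only updates the stored value, so the table never holds two records for one process.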

After vmpressure reports an event such as low, lmkd runs mp_event_common to process it. This handler frees memory by killing processes.

INKERNEL_MINFREE_PATH

This macro expands to /sys/module/lowmemorykiller/parameters/minfree. If lmkd can access that file, the in-kernel lowmemorykiller driver is in use. There are currently two low memory killers: the userspace lmkd we are interested in here, and the driver implemented in the kernel. Which one is used is decided when the system is built.
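A minimal sketch of that check, assuming it amounts to an access() test for write permission on the path. The helper name has_writable_node is hypothetical, and the exact condition lmkd uses may differ between releases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define INKERNEL_MINFREE_PATH "/sys/module/lowmemorykiller/parameters/minfree"

/* Hypothetical helper: true when path exists and this process may write it. */
static bool has_writable_node(const char *path) {
    assert(path != NULL);
    return access(path, W_OK) == 0;
}
```

A caller could then write `bool use_inkernel_interface = has_writable_node(INKERNEL_MINFREE_PATH);` and branch on the result. On a device without the kernel driver the file does not exist, so the helper returns false and the userspace path is taken.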


Here we assume that the userspace lmkd, which relies on cgroups, is used.


The init_mp_common function sets up memory pressure event monitoring.
The pressure file is located at /dev/memcg/memory.pressure_level, and evctlfd is the descriptor for cgroup.event_control in the same directory.




ctrl_connect_handler is the handler function for the lmkd listen socket. After accept, it registers ctrl_data_handler for the new data socket.




LMKD Client Interface

The main client of the lmkd process is the activity manager, which communicates with lmkd through the socket /dev/socket/lmkd. From the previous code, we already know that ctrl_connect_handler is called when a client connects.

find_and_kill_process decides which adj group to search based on the two parameters other_free and other_file. Then it looks for the recently used process to kill.


lowmem_minfree and lowmem_adj are parsed from /sys/module/lowmemorykiller/parameters/minfree and /sys/module/lowmemorykiller/parameters/adj. Each adj, from 0 to 906, has a corresponding minimum free memory value, and memory is released gradually until the matching minimum is reached.
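The way other_free and other_file select an adj group can be sketched as a walk over the two parallel arrays. This is an illustration under assumptions: the function name pick_min_score_adj is hypothetical, the comparison rule is the classic minfree scheme, and the threshold values used in the usage note are made up rather than taken from a real device.

```c
#include <assert.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MAX 1000

/* Walks the (minfree, adj) pairs in order and returns the adj of the first
 * level whose minfree is above BOTH other_free and other_file (all values
 * in pages). Returns OOM_SCORE_ADJ_MAX + 1 when no level matches, which
 * means nothing should be killed. */
static int pick_min_score_adj(const int *minfree, const int *adj, size_t n,
                              long other_free, long other_file) {
    assert(minfree != NULL && adj != NULL);
    for (size_t i = 0; i < n; i++) {
        if (other_free < minfree[i] && other_file < minfree[i])
            return adj[i];
    }
    return OOM_SCORE_ADJ_MAX + 1;
}
```

With illustrative thresholds minfree = {18432, 23040, 27648, 32256, 55296, 80640} and adj = {0, 100, 200, 300, 900, 906}, a system with other_free = other_file = 20000 pages matches the second level, so only processes with oom_score_adj of 100 or more are candidates. The less free memory there is, the lower the returned adj and the more important the processes that become eligible.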

After the client is connected, it sends commands over the lmkd socket; these are handled by ctrl_command_handler. lmkd accepts the following commands:


Looking at the handler function's switch statement

switch(cmd) {
    case LMK_TARGET:
        targets = nargs / 2;
        if (nargs & 0x1 || targets > (int)ARRAY_SIZE(lowmem_adj))
            goto wronglen;
        cmd_target(targets, packet);
        break;
    case LMK_PROCPRIO:
        /* process type field is optional for backward compatibility */
        if (nargs < 3 || nargs > 4)
            goto wronglen;
        cmd_procprio(packet, nargs, &cred);
        break;
    case LMK_PROCREMOVE:
        if (nargs != 1)
            goto wronglen;
        cmd_procremove(packet, &cred);
        break;
    case LMK_PROCPURGE:
        if (nargs != 0)
            goto wronglen;
        cmd_procpurge(&cred);
        break;
    case LMK_GETKILLCNT:
        if (nargs != 2)
            goto wronglen;
        kill_cnt = cmd_getkillcnt(packet);
        len = lmkd_pack_set_getkillcnt_repl(packet, kill_cnt);
        if (ctrl_data_write(dsock_idx, (char *)packet, len) != len)
            return;
        break;
    case LMK_SUBSCRIBE:
        if (nargs != 1)
            goto wronglen;
        cmd_subscribe(dsock_idx, packet);
        break;
    case LMK_PROCKILL:
        /* This command code is NOT expected at all */
        ALOGE("Received unexpected command code %d", cmd);
        break;
    default:
        ALOGE("Received unknown command code %d", cmd);
        return;
    }
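The cmd and nargs values used by the switch come from the received packet. A minimal sketch of that step, assuming the packet is an array of ints in network byte order whose first element is the command code. The enum values, MAX_TARGETS, and the helper names decode_header and nargs_valid are assumptions for illustration; only three commands are covered.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed numeric command codes (a subset, for illustration). */
enum lmk_cmd { LMK_TARGET = 0, LMK_PROCPRIO = 1, LMK_PROCREMOVE = 2 };

/* Assumed capacity of the lowmem_adj array. */
#define MAX_TARGETS 6

/* Splits a received packet of len_bytes into a command code and an
 * argument count. Returns 0 on success, -1 if the length is not a
 * positive multiple of sizeof(int). */
static int decode_header(const int *packet, size_t len_bytes,
                         int *cmd, int *nargs) {
    assert(packet != NULL && cmd != NULL && nargs != NULL);
    if (len_bytes < sizeof(int) || len_bytes % sizeof(int) != 0)
        return -1;
    *cmd = (int)ntohl((uint32_t)packet[0]);
    *nargs = (int)(len_bytes / sizeof(int)) - 1;   /* everything after cmd */
    return 0;
}

/* Mirrors the argument-count checks of the switch for three commands.
 * Returns 1 when nargs is acceptable for cmd, 0 otherwise. */
static int nargs_valid(int cmd, int nargs) {
    switch (cmd) {
    case LMK_TARGET:       /* (minfree, adj) pairs, so the count is even */
        return nargs >= 0 && (nargs & 0x1) == 0 && nargs / 2 <= MAX_TARGETS;
    case LMK_PROCPRIO:     /* pid, uid, oom_adj, optional process type */
        return nargs >= 3 && nargs <= 4;
    case LMK_PROCREMOVE:   /* pid only */
        return nargs == 1;
    default:
        return 0;
    }
}
```

A 16-byte packet therefore decodes to one command plus three arguments, which is the minimum LMK_PROCPRIO accepts.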

Let us look at the format of the 3 major commands.
LMK_TARGET:
+---------+----------+----------------+----------+----------------+----
| lmk_cmd | minfree1 | oom_adj_score1 | minfree2 | oom_adj_score2 | ...
+---------+----------+----------------+----------+----------------+----

LMK_PROCPRIO:
+---------+-----+-----+---------+
| lmk_cmd | pid | uid | oom_adj |
+---------+-----+-----+---------+

LMK_PROCREMOVE:
+---------+-----+
| lmk_cmd | pid |
+---------+-----+
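The layouts above can be sketched from the sender's side. The example below assumes each field is a 32-bit int in network byte order and that LMK_PROCPRIO and LMK_PROCREMOVE have the numeric values 1 and 2; both are assumptions, and the helper names pack_procprio, pack_procremove, and unpack_field are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed numeric command codes. */
#define LMK_PROCPRIO   1
#define LMK_PROCREMOVE 2

/* Fills packet with | lmk_cmd | pid | uid | oom_adj | and returns the
 * number of bytes to send. packet must have room for 4 ints. */
static size_t pack_procprio(int *packet, int pid, int uid, int oom_adj) {
    assert(packet != NULL);
    packet[0] = (int)htonl((uint32_t)LMK_PROCPRIO);
    packet[1] = (int)htonl((uint32_t)pid);
    packet[2] = (int)htonl((uint32_t)uid);
    packet[3] = (int)htonl((uint32_t)oom_adj);
    return 4 * sizeof(int);
}

/* Fills packet with | lmk_cmd | pid | and returns the number of bytes
 * to send. packet must have room for 2 ints. */
static size_t pack_procremove(int *packet, int pid) {
    assert(packet != NULL);
    packet[0] = (int)htonl((uint32_t)LMK_PROCREMOVE);
    packet[1] = (int)htonl((uint32_t)pid);
    return 2 * sizeof(int);
}

/* Reads field idx back in host byte order, as a receiver would. */
static int unpack_field(const int *packet, size_t idx) {
    assert(packet != NULL);
    return (int)ntohl((uint32_t)packet[idx]);
}
```

The returned byte count is what a client would pass to write() or send() on the connected socket. Negative oom_adj values survive the round trip because the conversion only reorders bytes.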




Among the candidate processes, lmkd picks the one with the heaviest size.
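A minimal sketch of that selection, assuming the sizes have already been collected into an array. The struct candidate type and the pick_heaviest name are hypothetical, and entries whose size could not be read are represented here by a non-positive value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate record: a pid and its resident size in pages. */
struct candidate {
    int pid;
    long rss_pages;   /* <= 0 means the size could not be read */
};

/* Returns the candidate with the largest positive rss_pages, or NULL when
 * the list is empty or no entry has a usable size. On ties the earliest
 * entry wins. */
static const struct candidate *pick_heaviest(const struct candidate *list,
                                             size_t n) {
    const struct candidate *best = NULL;
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].rss_pages <= 0)
            continue;   /* process probably exited; skip it */
        if (best == NULL || list[i].rss_pages > best->rss_pages)
            best = &list[i];
    }
    return best;
}
```

Skipping unreadable entries matters in practice, because a process can exit between being listed and being measured.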



That size is read from each process's /proc/<pid>/statm file.
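A minimal sketch of reading that size, assuming the value of interest is the second field of statm (resident pages). The helper names parse_statm_rss and read_proc_rss are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parses one statm line ("size resident shared ...", all in pages) and
 * returns the resident field, or -1 if the line is malformed. */
static long parse_statm_rss(const char *line) {
    long total, rss;
    assert(line != NULL);
    if (sscanf(line, "%ld %ld", &total, &rss) != 2)
        return -1;
    return rss;
}

/* Reads /proc/<pid>/statm and returns the resident size in pages,
 * or -1 if the file cannot be opened, read, or parsed. */
static long read_proc_rss(int pid) {
    char path[64];
    char line[128];
    char *ok;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/statm", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;   /* the process may already be gone */
    ok = fgets(line, sizeof(line), f);
    fclose(f);
    return ok != NULL ? parse_statm_rss(line) : -1;
}
```

Multiplying the result by the page size gives the size in bytes. Keeping the parsing separate from the file access makes the parser easy to check against a fixed string.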



Finally, it is time to kill the selected process or processes, one by one.



CGroup memory subsystem Overview

While looking at init_mp_common, we saw that lmkd uses cgroup memory pressure information to gauge the current stress on system memory.
Let us try and understand some basics of the cgroups pressure implementation.

To understand memory.pressure_level, we should first know what memory pressure is.

The pressure_level notification can be used to monitor memory allocation costs; based on different pressure_level, different strategies are adopted to manage memory resources. 

There are three types of pressure_level:
low: The system is reclaiming memory to satisfy new allocations.
medium: The system is under medium memory pressure; it may be swapping or paging out active file cache to free up memory.
critical: The system is thrashing and is about to run out of memory (OOM), or the kernel OOM killer is about to be triggered. The application should free as much memory as it can.

By default, the events generated for a pressure_level propagate upward until they are handled. For example, take three cgroups, A -> B -> C, that all have event listeners: if C experiences memory pressure, only C is notified, and A and B are not.

memory.pressure_level is only used to set up an eventfd. Read and write operations on the node are not implemented, so no information can be obtained from it directly. Here is an example of usage:
1. Use eventfd to create an evfd handle
2. Open the memory.pressure_level node as mpfd
3. Write the string "<evfd> <mpfd> <level>" to cgroup.event_control

Then if the memory pressure reaches a certain level (low / medium / critical), related applications will be notified via eventfd. Here is an implementation in lmkd:

static bool init_mp_common(enum vmpressure_level level) {
    int mpfd;
    int evfd;
    int evctlfd;
    char buf[256];
    struct epoll_event epev;
    int ret;
    int level_idx = (int)level;
    const char *levelstr = level_name[level_idx];

    /* gid containing AID_SYSTEM required */
    mpfd = open(MEMCG_SYSFS_PATH "memory.pressure_level", O_RDONLY | O_CLOEXEC);
    if (mpfd < 0) {
        ALOGI("No kernel memory.pressure_level support (errno=%d)", errno);
        goto err_open_mpfd;
    }

    evctlfd = open(MEMCG_SYSFS_PATH "cgroup.event_control", O_WRONLY | O_CLOEXEC);
    if (evctlfd < 0) {
        ALOGI("No kernel memory cgroup event control (errno=%d)", errno);
        goto err_open_evctlfd;
    }

    evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (evfd < 0) {
        ALOGE("eventfd failed for level %s; errno=%d", levelstr, errno);
        goto err_eventfd;
    }

    ret = snprintf(buf, sizeof(buf), "%d %d %s", evfd, mpfd, levelstr);
    if (ret >= (ssize_t)sizeof(buf)) {
        ALOGE("cgroup.event_control line overflow for level %s", levelstr);
        goto err;
    }

    ret = TEMP_FAILURE_RETRY(write(evctlfd, buf, strlen(buf) + 1));
    if (ret == -1) {
        ALOGE("cgroup.event_control write failed for level %s; errno=%d",
              levelstr, errno);
        goto err;
    }

    epev.events = EPOLLIN;
    /* use data to store event level */
    vmpressure_hinfo[level_idx].data = level_idx;
    vmpressure_hinfo[level_idx].handler = mp_event_common;
    epev.data.ptr = (void *)&vmpressure_hinfo[level_idx];
    ret = epoll_ctl(epollfd, EPOLL_CTL_ADD, evfd, &epev);
    if (ret == -1) {
        ALOGE("epoll_ctl for level %s failed; errno=%d", levelstr, errno);
        goto err;
    }
    maxevents++;
    mpevfd[level] = evfd;
    close(evctlfd);
    return true;

err:
    close(evfd);
err_eventfd:
    close(evctlfd);
err_open_evctlfd:
    close(mpfd);
err_open_mpfd:
    return false;
}

Let us focus on how cgroup.event_control is handled in the kernel, in mm/memcontrol.c.




When lmkd writes to cgroup.event_control, memcg_write_event_control parses the string and then registers the cgroup's event handling function.

vmpressure_register_event binds the vmpressure notification to the eventfd so that lmkd receives the vmpressure notification. Its parameters are:
memcg: the memory cgroup whose vmpressure notifications are of interest
eventfd: the eventfd handle that receives the vmpressure notification
args: the pressure_level parameter to set


The Linux kernel documentation is a great place to learn about cgroup memory pressure notification so we will skip that here.


ActivityManagerService oom_adj information to lmkd
The final piece of the puzzle is the oom_adj value that ActivityManagerService reports to lmkd over the lmkd socket.





Points to remember while debugging LMK/OOM situations
The configuration parameters provided by the framework are the entry points for tuning, and they can depend on the following factors.
1.     Screen resolution of the device: the minfree values need adjustment.
2.     The adj values: increase the number of adj levels to give lowmemorykiller finer control, or modify the adj values to change the priority of different types of processes.
3.     The memory pressure level (levelstr): low, medium, or critical. Should each be treated differently?
4.     The conditions under which vmpressure triggers the different levels: should they be modified?
