Android Memory Overview [Part2] - Android LMKD
Introduction
In the last tutorial we saw why low memory handling is needed, along with a
high-level overview of low memory situations and the different ways to
assign each process a score that decides whether it is killed when
low memory conditions occur.
Android uses lmkd [LowMemoryKiller Daemon]
running in userspace to determine a low memory condition and kill processes as
necessary.
Let us see the details of the lmkd implementation.
Android Low Memory Killer Daemon – Detailed Walkthrough
Android LowMemoryKillerDaemon Startup
When Android’s init process parses init.rc, the memcg [memory cgroup] is
initialized. Further on, init.rc starts the lmkd service process.
The source for the lmkd is located in platform/system/memory/lmkd.
Let us look at the main function of lmkd, with some important items marked.
There are 3 important parts in the main function. Let us understand them one by one.
1.
struct sched_param param = {
    .sched_priority = 1,
};
2.
if (!init()) {
3.
/* CAP_IPC_LOCK required */
if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) && (errno != EINVAL)) {
    ALOGW("mlockall failed %s", strerror(errno));
}
/* CAP_NICE required */
if (sched_setscheduler(0, SCHED_FIFO, &param)) {
    ALOGW("set SCHED_FIFO failed %s", strerror(errno));
}
The main items of interest in init() are:
1. Create the epollfd file descriptor, sized for MAX_EPOLL_EVENTS:
/*
 * 1 ctrl listen socket, 3 ctrl data socket, 3 memory pressure levels,
 * 1 lmk events + 1 fd to wait for process death
 */
2. Open the lmkd control socket, listen on it, and add its file descriptor to epollfd.
The EPOLLIN handler is ctrl_connect_handler.
3. init_mp_common initializes the memory pressure related parameters, creates
a file descriptor for event notification, and adds it to epollfd. The EPOLLIN
handler is mp_event_common.
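The epoll pattern described above, where each registered descriptor carries a pointer to its own handler, can be sketched as follows. This is a minimal illustration and not the lmkd source: the names handler_info, register_fd, dispatch_once and run_epoll_demo are hypothetical, and a pipe stands in for the lmkd socket.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical handler record: a value plus the callback to run on EPOLLIN. */
struct handler_info {
    int data;
    void (*handler)(int data, uint32_t events);
};

static int last_seen = -1; /* set by demo_handler so the dispatch is observable */

static void demo_handler(int data, uint32_t events) {
    if (events & EPOLLIN) last_seen = data;
}

/* Adds fd to epfd and stores the handler record in the event payload. */
static int register_fd(int epfd, int fd, struct handler_info *info) {
    struct epoll_event epev;
    assert(info != NULL && info->handler != NULL);
    memset(&epev, 0, sizeof(epev));
    epev.events = EPOLLIN;
    epev.data.ptr = info;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &epev);
}

/* Waits once and calls the handler of every ready descriptor. */
static int dispatch_once(int epfd, int timeout_ms) {
    struct epoll_event events[8];
    int n = epoll_wait(epfd, events, 8, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct handler_info *info = events[i].data.ptr;
        info->handler(info->data, events[i].events);
    }
    return n;
}

/* A pipe stands in for the lmkd socket; returns the value the handler saw, or -1. */
static int run_epoll_demo(void) {
    static struct handler_info info = { 7, demo_handler };
    int pipefd[2];
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0) return -1;
    if (pipe(pipefd) < 0) { close(epfd); return -1; }
    last_seen = -1;
    if (register_fd(epfd, pipefd[0], &info) == 0 && write(pipefd[1], "x", 1) == 1) {
        dispatch_once(epfd, 1000);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    close(epfd);
    return last_seen;
}
```

Storing a pointer in epev.data.ptr lets one wait loop serve sockets and pressure events alike without a lookup table keyed by descriptor.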
init_mp_common opens memory.pressure_level, creates evfd for event
notification to this process, and then writes both into cgroup.event_control
together with levelstr.
The LMK_TARGET type corresponds to cmd_target and is used to set "/sys/module/lowmemorykiller/parameters/minfree" and "/sys/module/lowmemorykiller/parameters/adj".
The LMK_PROCPRIO type corresponds to cmd_procprio, which writes /proc/<pid>/oom_score_adj and adds the pid to the pidhash table.
The LMK_PROCREMOVE type corresponds to cmd_procremove and is used to remove the pid from pidhash.
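The pid bookkeeping behind LMK_PROCPRIO and LMK_PROCREMOVE can be pictured as a small chained hash table keyed by pid. The sketch below is an assumption-laden illustration, not the lmkd code: the table size, the hash function and the helper names pid_lookup, pid_insert and pid_remove are chosen for this example, and the write to oom_score_adj is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PIDHASH_SZ 1024 /* assumed table size, must be a power of two */
#define pid_hashfn(x) ((((x) >> 8) ^ (x)) & (PIDHASH_SZ - 1))

/* One tracked process: its pid, its last reported adj, and the bucket chain. */
struct proc {
    int pid;
    int oomadj;
    struct proc *next;
};

static struct proc *pidhash[PIDHASH_SZ];

/* Returns the record for pid, or NULL if the pid is not tracked. */
static struct proc *pid_lookup(int pid) {
    struct proc *p = pidhash[pid_hashfn(pid)];
    while (p != NULL && p->pid != pid) p = p->next;
    return p;
}

/* Inserts pid or updates its adj; returns 0 on success, -1 if allocation fails. */
static int pid_insert(int pid, int oomadj) {
    struct proc *p;
    assert(pid > 0);
    p = pid_lookup(pid);
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        if (p == NULL) return -1;
        p->pid = pid;
        p->next = pidhash[pid_hashfn(pid)];
        pidhash[pid_hashfn(pid)] = p;
    }
    p->oomadj = oomadj;
    return 0;
}

/* Unlinks and frees the record for pid; returns 0 if found, -1 otherwise. */
static int pid_remove(int pid) {
    struct proc **pp = &pidhash[pid_hashfn(pid)];
    struct proc *victim;
    while (*pp != NULL && (*pp)->pid != pid) pp = &(*pp)->next;
    if (*pp == NULL) return -1;
    victim = *pp;
    *pp = victim->next;
    free(victim);
    return 0;
}
```

A repeated pid_insert for the same pid only updates the stored adj, which matches the idea that a process's priority changes over its lifetime.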
After vmpressure reports a low event, lmkd triggers mp_event_common to process the memory pressure event. mp_event_common is the handler for the low level, and it releases memory by killing processes.
The macro INKERNEL_MINFREE_PATH is /sys/module/lowmemorykiller/parameters/minfree. If lmkd can access it, the lowmemorykiller driver in the kernel is in use. There are currently two low memory killers: one is the lmkd we are interested in now, and the other is the driver implemented in the kernel. Which one is used can be decided when the system is compiled.
Here we assume that the userspace lmkd, which relies on cgroups, is in use.
The function init_mp_common initializes the memory pressure event monitoring.
The file it opens is /dev/memcg/memory.pressure_level; evctlfd refers to cgroup.event_control in the same directory.
ctrl_connect_handler is the handler function of the lmkd socket. After accept, it registers ctrl_data_handler for the new connection.
LMKD Client Interface
The client of the lmkd process is mainly the activity manager, which communicates with lmkd through the socket /dev/socket/lmkd. From the previous code, we already know that when a client connects, ctrl_connect_handler is called.
find_and_kill_process determines which adj group to search for a victim based on the two parameters other_free and other_file. It then looks for the most recently used process in that group and kills it.
lowmem_minfree and lowmem_adj are parsed from /sys/module/lowmemorykiller/parameters/minfree and /sys/module/lowmemorykiller/parameters/adj. Memory is freed until the minimum free threshold is reached. The adj values range from 0 to 906, each adj has a corresponding minfree threshold, and memory is released gradually, level by level.
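The way other_free and other_file select an adj group can be sketched as a walk over the paired minfree and adj arrays. This is a simplified illustration under stated assumptions: the function name pick_min_score_adj and the sample threshold values are hypothetical, and the real daemon applies further checks that are not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MAX 1000

/* Hypothetical sample levels (minfree in pages); real values are device specific. */
static const int demo_minfree[] = { 18432, 23040, 27648, 32256, 36864, 46080 };
static const int demo_adj[]     = { 0, 100, 200, 300, 900, 906 };

/*
 * Returns the lowest oom_score_adj that may be killed: the adj paired with the
 * first minfree threshold that both other_free and other_file fall below.
 * Returns OOM_SCORE_ADJ_MAX + 1 when no threshold is crossed (nothing to kill).
 */
static int pick_min_score_adj(const int *minfree, const int *adj, size_t targets,
                              long other_free, long other_file) {
    assert(minfree != NULL && adj != NULL);
    for (size_t i = 0; i < targets; i++) {
        if (other_free < minfree[i] && other_file < minfree[i]) {
            return adj[i];
        }
    }
    return OOM_SCORE_ADJ_MAX + 1;
}
```

With the sample arrays, 30000 free pages and 30000 file pages first fall below the 32256 threshold, so only processes with an adj of 300 or higher are candidates. If the file cache alone stays above every threshold, nothing is killed.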
After the client is connected, it sends commands to the lmkd socket, and these are handled by ctrl_command_handler. The commands lmkd accepts can be seen in the handler function's switch statement:
switch(cmd) {
case LMK_TARGET:
    targets = nargs / 2;
    if (nargs & 0x1 || targets > (int)ARRAY_SIZE(lowmem_adj))
        goto wronglen;
    cmd_target(targets, packet);
    break;
case LMK_PROCPRIO:
    /* process type field is optional for backward compatibility */
    if (nargs < 3 || nargs > 4)
        goto wronglen;
    cmd_procprio(packet, nargs, &cred);
    break;
case LMK_PROCREMOVE:
    if (nargs != 1)
        goto wronglen;
    cmd_procremove(packet, &cred);
    break;
case LMK_PROCPURGE:
    if (nargs != 0)
        goto wronglen;
    cmd_procpurge(&cred);
    break;
case LMK_GETKILLCNT:
    if (nargs != 2)
        goto wronglen;
    kill_cnt = cmd_getkillcnt(packet);
    len = lmkd_pack_set_getkillcnt_repl(packet, kill_cnt);
    if (ctrl_data_write(dsock_idx, (char *)packet, len) != len)
        return;
    break;
case LMK_SUBSCRIBE:
    if (nargs != 1)
        goto wronglen;
    cmd_subscribe(dsock_idx, packet);
    break;
case LMK_PROCKILL:
    /* This command code is NOT expected at all */
    ALOGE("Received unexpected command code %d", cmd);
    break;
default:
    ALOGE("Received unknown command code %d", cmd);
    return;
}
Let us look at the format of the 3 major commands.
LMK_TARGET:
+---------+----------+----------------+----------+----------------+----
| lmk_cmd | minfree1 | oom_adj_score1 | minfree2 | oom_adj_score2 | ...
+---------+----------+----------------+----------+----------------+----
LMK_PROCPRIO:
+---------+-----+-----+---------+
| lmk_cmd | pid | uid | oom_adj |
+---------+-----+-----+---------+
LMK_PROCREMOVE:
+---------+-----+
| lmk_cmd | pid |
+---------+-----+
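The packet layouts above can be sketched in code as arrays of 32-bit words. The example below is an illustration only and rests on assumptions that should be checked against lmkd.h in your tree: that each field is sent as a 32-bit integer in network byte order, and that the command codes start at 0 in the order shown. The names lmk_packet, pack_procprio, pack_procremove and packet_word are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed command codes; verify them against lmkd.h before relying on them. */
enum demo_lmk_cmd { DEMO_LMK_TARGET = 0, DEMO_LMK_PROCPRIO = 1, DEMO_LMK_PROCREMOVE = 2 };

/* Hypothetical container: wire words (network byte order) plus how many are used. */
struct lmk_packet {
    int words[4];
    size_t nwords;
};

/* Builds | lmk_cmd | pid | uid | oom_adj |. */
static struct lmk_packet pack_procprio(int pid, int uid, int oom_adj) {
    struct lmk_packet p;
    p.words[0] = (int)htonl((uint32_t)DEMO_LMK_PROCPRIO);
    p.words[1] = (int)htonl((uint32_t)pid);
    p.words[2] = (int)htonl((uint32_t)uid);
    p.words[3] = (int)htonl((uint32_t)oom_adj);
    p.nwords = 4;
    return p;
}

/* Builds | lmk_cmd | pid |; the unused words are zeroed. */
static struct lmk_packet pack_procremove(int pid) {
    struct lmk_packet p = { { 0, 0, 0, 0 }, 2 };
    p.words[0] = (int)htonl((uint32_t)DEMO_LMK_PROCREMOVE);
    p.words[1] = (int)htonl((uint32_t)pid);
    return p;
}

/* Reads word idx back in host byte order, as a receiver would. */
static int packet_word(struct lmk_packet p, size_t idx) {
    assert(idx < p.nwords);
    return (int)ntohl((uint32_t)p.words[idx]);
}
```

A client would then write the first nwords * sizeof(int) bytes of words to the lmkd socket. The receiver derives nargs from the packet length, which is what the wronglen checks in the switch statement validate.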
lmkd then picks the process with the largest memory footprint. The size of each candidate is read from that process's statm file under /proc.
Finally, it is time to kill the selected process or processes, one by one.
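Reading a process size from statm can be sketched as follows. This is a minimal example and not the lmkd code: it assumes the standard /proc/<pid>/statm layout, where the first field is the total program size and the second is the resident set size, both in pages. The helper names statm_rss_pages and proc_rss_pages are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parses the first two statm fields (total, resident); returns resident pages or -1. */
static long statm_rss_pages(const char *line) {
    long total = 0;
    long rss = 0;
    assert(line != NULL);
    if (sscanf(line, "%ld %ld", &total, &rss) != 2) return -1;
    return rss;
}

/* Reads /proc/<pid>/statm and returns the resident pages, or -1 on any error. */
static long proc_rss_pages(int pid) {
    char path[64];
    char line[256];
    long rss = -1;
    FILE *f;
    snprintf(path, sizeof(path), "/proc/%d/statm", pid);
    f = fopen(path, "r");
    if (f == NULL) return -1; /* the process may already have exited */
    if (fgets(line, sizeof(line), f) != NULL) rss = statm_rss_pages(line);
    fclose(f);
    return rss;
}
```

A victim selection loop could call proc_rss_pages for every tracked pid in the chosen adj group and keep the pid with the largest result, treating -1 as a process that has already gone away.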
CGroup memory subsystem Overview
While looking at the function init_mp_common, we saw that lmkd uses the
cgroups memory pressure information to understand the current stress on the
system memory.
Let us try to understand some basics of the cgroups pressure
implementation.
To understand memory.pressure_level, we should first know what memory
pressure is.
The pressure_level notification can be used to monitor the cost of memory
allocation; depending on the pressure_level, different strategies can be
adopted to manage memory resources.
There are three types of pressure_level:
low: The system is reclaiming memory to satisfy new allocations.
medium: The system is under medium memory pressure; it may be swapping and
paging out the active file cache to free up memory.
critical: The system is about to run out of memory (OOM), or the kernel OOM
killer is about to be triggered, and the application should take measures to
free up as much memory as possible.
The events generated for a pressure_level propagate upwards until they are
handled. For example, take three cgroups A -> B -> C, all with event
listeners. If C experiences memory pressure, only C receives the
notification; A and B do not.
memory.pressure_level is only used to set up an eventfd. The node's read
and write operations are not implemented, so no information can be obtained
from the file itself. It is used as follows:
1. Use eventfd to create an evfd handle.
2. Open the memory.pressure_level node to get mpfd.
3. Write the string "<evfd> <mpfd> <level>" to cgroup.event_control.
Then, when the memory pressure reaches the registered level (low / medium
/ critical), the application is notified via eventfd. Here is the
implementation in lmkd:
static bool init_mp_common(enum vmpressure_level level) {
    int mpfd;
    int evfd;
    int evctlfd;
    char buf[256];
    struct epoll_event epev;
    int ret;
    int level_idx = (int)level;
    const char *levelstr = level_name[level_idx];
    /* gid containing AID_SYSTEM required */
    mpfd = open(MEMCG_SYSFS_PATH "memory.pressure_level", O_RDONLY | O_CLOEXEC);
    if (mpfd < 0) {
        ALOGI("No kernel memory.pressure_level support (errno=%d)", errno);
        goto err_open_mpfd;
    }
    evctlfd = open(MEMCG_SYSFS_PATH "cgroup.event_control", O_WRONLY | O_CLOEXEC);
    if (evctlfd < 0) {
        ALOGI("No kernel memory cgroup event control (errno=%d)", errno);
        goto err_open_evctlfd;
    }
    evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (evfd < 0) {
        ALOGE("eventfd failed for level %s; errno=%d", levelstr, errno);
        goto err_eventfd;
    }
    ret = snprintf(buf, sizeof(buf), "%d %d %s", evfd, mpfd, levelstr);
    if (ret >= (ssize_t)sizeof(buf)) {
        ALOGE("cgroup.event_control line overflow for level %s", levelstr);
        goto err;
    }
    ret = TEMP_FAILURE_RETRY(write(evctlfd, buf, strlen(buf) + 1));
    if (ret == -1) {
        ALOGE("cgroup.event_control write failed for level %s; errno=%d",
              levelstr, errno);
        goto err;
    }
    epev.events = EPOLLIN;
    /* use data to store event level */
    vmpressure_hinfo[level_idx].data = level_idx;
    vmpressure_hinfo[level_idx].handler = mp_event_common;
    epev.data.ptr = (void *)&vmpressure_hinfo[level_idx];
    ret = epoll_ctl(epollfd, EPOLL_CTL_ADD, evfd, &epev);
    if (ret == -1) {
        ALOGE("epoll_ctl for level %s failed; errno=%d", levelstr, errno);
        goto err;
    }
    maxevents++;
    mpevfd[level] = evfd;
    close(evctlfd);
    return true;
err:
    close(evfd);
err_eventfd:
    close(evctlfd);
err_open_evctlfd:
    close(mpfd);
err_open_mpfd:
    return false;
}
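To see what the listener side of an eventfd looks like, here is a small self-contained sketch. It is not part of lmkd: the helper names format_event_control and eventfd_demo are hypothetical, and the kernel's notification is simulated by writing to the eventfd from the same process. It shows the registration line format and the fact that an eventfd read returns the number of notifications accumulated since the last read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Formats the registration line "<evfd> <mpfd> <level>"; returns its length or -1 on overflow. */
static int format_event_control(char *buf, size_t size, int evfd, int mpfd, const char *level) {
    int ret;
    assert(buf != NULL && level != NULL);
    ret = snprintf(buf, size, "%d %d %s", evfd, mpfd, level);
    return (ret < 0 || (size_t)ret >= size) ? -1 : ret;
}

/* Simulates two notifications and returns the counter the listener reads, or -1 on error. */
static long eventfd_demo(void) {
    uint64_t one = 1;
    uint64_t count = 0;
    int evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (evfd < 0) return -1;
    /* The kernel would signal the eventfd on pressure; here we write to it ourselves. */
    if (write(evfd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
        write(evfd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
        read(evfd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        close(evfd);
        return -1;
    }
    close(evfd);
    return (long)count; /* both notifications are folded into one read */
}
```

Because notifications accumulate in the counter, a handler that runs late still sees that pressure events occurred, but it cannot tell their exact timing from the eventfd alone.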
When lmkd writes to cgroup.event_control, memcg_write_event_control in the
kernel parses the string and registers the cgroup's event handler.
vmpressure_register_event then binds the vmpressure notification to the
eventfd, so that lmkd receives the vmpressure notification. Its parameters are:
memcg: the memory cgroup whose vmpressure notifications are of interest
eventfd: the eventfd handle that receives the vmpressure notification
args: the pressure_level parameter to set
The Linux kernel documentation is a great place to learn about cgroup memory pressure notification, so we will not go into further detail here.
ActivityManagerService: passing oom_adj information to lmkd
The final piece of the puzzle is the oom_adj value that
ActivityManagerService sends to lmkd over the lmkd socket.
Points to remember while Debugging LMK/OOM situations
The configuration parameters exposed by the framework are the entry points
for tuning, and they depend on the following factors.
1. Screen resolution of the device: the minfree values need adjustment.
2. Number and size of the adj levels: increasing the number of adj levels
gives lowmemorykiller finer control granularity, and changing the adj values
changes the priority of different types of processes.
3. Memory pressure level (levelstr): should low, medium and critical be
treated differently?
4. Trigger conditions: should the conditions under which vmpressure reports
the different levels be modified?