watchdog

生态人 2022-05-04
java android

Contents

1 Creating the watchdog

2 Watchdog members and initialization

2.1 Understanding HandlerChecker

2.2 mMonitorChecker

2.3 mCompleted

2.3 Watchdog initialization

2.4 The watchdog run method


1 Creating the watchdog

 /frameworks/base/services/java/com/android/server/SystemServer.java

    private void startBootstrapServices(@NonNull TimingsTraceAndSlog t) {
        t.traceBegin("startBootstrapServices");

        // Start the watchdog as early as possible so we can crash the system server
        // if we deadlock during early boot
        t.traceBegin("StartWatchdog");
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.start();
        t.traceEnd();
        ...
        ...
    }

When the system_server process starts, it creates the Watchdog instance and starts the watchdog thread.

    public static Watchdog getInstance() {
        if (sWatchdog == null) {
            sWatchdog = new Watchdog();
        }

        return sWatchdog;
    }

There is only ever one Watchdog instance: it is a singleton.

2 Watchdog members and initialization

    private final ArrayList<HandlerChecker> mHandlerCheckers = new ArrayList<>();
    private final HandlerChecker mMonitorChecker;

Two of the more important member variables are listed here: a list of HandlerChecker objects and a single HandlerChecker object.

2.1 Understanding HandlerChecker

HandlerChecker is an inner class of Watchdog that implements the Runnable interface. It in turn maintains lists of Monitor objects:

    private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    private final ArrayList<Monitor> mMonitorQueue = new ArrayList<Monitor>();
    private boolean mCompleted;

Monitor is an interface with a single method, so any class that implements it must implement monitor().
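To make the role of monitor() concrete, here is a minimal sketch of what an implementation could look like. The Monitor interface below is a local stand-in with the same single-method shape, and DemoService, mLock and runAll are hypothetical names invented for illustration, not framework code. The assumption is that a service implements monitor() by briefly acquiring its own lock, so the call only returns if that lock is not held forever by a stuck thread.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the single-method Monitor interface (assumption: same shape).
interface Monitor {
    void monitor();
}

// Hypothetical service that guards its state with one lock.
class DemoService implements Monitor {
    private final Object mLock = new Object();
    private int mMonitorCalls;

    @Override
    public void monitor() {
        // Returns only if mLock can be acquired. If another thread held mLock
        // forever (a deadlock), this call would block and the check would never finish.
        synchronized (mLock) {
            mMonitorCalls++;
        }
    }

    int monitorCalls() {
        synchronized (mLock) {
            return mMonitorCalls;
        }
    }
}

class MonitorSketch {
    // Calls every monitor in order and returns how many calls returned.
    static int runAll(List<Monitor> monitors) {
        int returned = 0;
        for (Monitor m : monitors) {
            m.monitor();
            returned++;
        }
        return returned;
    }

    public static void main(String[] args) {
        List<Monitor> monitors = new ArrayList<>();
        monitors.add(new DemoService());
        System.out.println("monitors that returned: " + runAll(monitors));
    }
}
```

The counter exists only so the sketch has something observable; the useful signal is simply whether monitor() returns at all.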

2.2 mMonitorChecker

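A separate HandlerChecker object is kept here. It is the checker for the foreground thread (FgThread), which is also where the monitor checks are dispatched.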

2.3 mCompleted

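This flag marks whether the checked thread has finished the current check, that is, whether all of its monitor() calls have returned.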

2.3 Watchdog initialization

    private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check. Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker. It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread. We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));
        // And the animation thread.
        mHandlerCheckers.add(new HandlerChecker(AnimationThread.getHandler(),
                "animation thread", DEFAULT_TIMEOUT));
        // And the surface animation thread.
        mHandlerCheckers.add(new HandlerChecker(SurfaceAnimationThread.getHandler(),
                "surface animation thread", DEFAULT_TIMEOUT));

        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());

        mOpenFdMonitor = OpenFdMonitor.create();

        mInterestingJavaPids.add(Process.myPid());

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
    }
  • Initializes the mMonitorChecker member and adds it to the HandlerChecker list
  • Adds a checker for each thread to be monitored to the HandlerChecker list

2.4 The watchdog run method

  • The run method as a whole is essentially an infinite loop
  • Each pass starts by having every HandlerChecker execute scheduleCheckLocked. Here is that function:
        public void scheduleCheckLocked() {
            if (mCompleted) {
                // Safe to update monitors in queue, Handler is not in the middle of work
                mMonitors.addAll(mMonitorQueue);
                mMonitorQueue.clear();
            }
            if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
                    || (mPauseCount > 0)) {
                // Don't schedule until after resume OR
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked. This avoid having
                // to do a context switch to check the thread. Note that we
                // only do this if we have no monitors since those would need to
                // be executed at this point.
                mCompleted = true;
                return;
            }
            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
        }

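This sets mCompleted to false and posts the current Runnable to the front of the target thread's MessageQueue. When the thread picks it up, its run method executes, which is HandlerChecker's run method.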
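The scheduling idea can be tried outside Android with a small sketch. Everything here is an assumption made for illustration: CheckerSketch is a hypothetical class, a single-threaded ExecutorService stands in for the monitored thread's Handler, and execute() appends to the back of the queue rather than the front. The sketch only shows how the completed flag stays false while the target thread is stuck and flips to true once the thread runs the checker.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a checker that posts itself to a target thread.
class CheckerSketch implements Runnable {
    private final ExecutorService target; // stand-in for the monitored thread
    private boolean completed = true;
    private int runs;

    CheckerSketch(ExecutorService target) {
        this.target = target;
    }

    synchronized void scheduleCheck() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        target.execute(this);
    }

    @Override
    public synchronized void run() {
        // Getting here proves the target thread is still processing work.
        completed = true;
        runs++;
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    synchronized int runs() {
        return runs;
    }

    // Returns {completed while target is blocked, completed after release, ran exactly once}.
    static boolean[] demo() {
        ExecutorService target = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        // Simulate a stuck thread: this task occupies the only worker until released.
        target.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        CheckerSketch checker = new CheckerSketch(target);
        checker.scheduleCheck();
        checker.scheduleCheck(); // ignored: the first check has not completed
        boolean whileBlocked = checker.isCompleted();
        release.countDown();
        target.shutdown();
        try {
            target.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { whileBlocked, checker.isCompleted(), checker.runs() == 1 };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("completed while blocked: " + r[0]);
        System.out.println("completed after release: " + r[1]);
        System.out.println("ran exactly once: " + r[2]);
    }
}
```

The second scheduleCheck() call is dropped for the same reason as the "check in flight" branch above: posting again would add nothing while the first check is still pending.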

  • Next, look at HandlerChecker's run method:
        @Override
        public void run() {
            // Once we get here, we ensure that mMonitors does not change even if we call
            // #addMonitorLocked because we first add the new monitors to mMonitorQueue and
            // move them to mMonitors on the next schedule when mCompleted is true, at which
            // point we have completed execution of this method.
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }

            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }

This calls monitor() on every Monitor that has been added to mMonitors. As described earlier, the checker for the fg thread holds many Monitor objects.

  • The watchdog thread waits 30 s, and after those 30 s it checks every HandlerChecker:
    private int evaluateCheckerCompletionLocked() {
        int state = COMPLETED;
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            state = Math.max(state, hc.getCompletionStateLocked());
        }
        return state;
    }

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

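The state is decided by elapsed time. With the 60 s timeout used here, a latency under 30 s means keep waiting (WAITING), a latency of at least 30 s but under 60 s means the checker has already used up half of the timeout (WAITED_HALF), and anything longer is OVERDUE.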
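The thresholds are easier to see with concrete numbers. The sketch below restates the same comparison as a standalone function. The class name, the method names and the numeric values of the four constants are assumptions for illustration; what matters is that they increase from COMPLETED to OVERDUE, because the overall state is taken as the maximum over all checkers.

```java
// Standalone restatement of the latency thresholds (names and constant values are assumptions).
class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Maps one checker's progress to a state, given its timeout in milliseconds.
    static int completionState(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < waitMaxMs / 2) {
            return WAITING;
        }
        if (latencyMs < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // The overall state is the worst (largest) state among all checkers.
    static int evaluate(int... checkerStates) {
        int state = COMPLETED;
        for (int s : checkerStates) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // assumed 60 s timeout
        System.out.println(completionState(false, 10_000, waitMax)); // under half the timeout
        System.out.println(completionState(false, 45_000, waitMax)); // past half the timeout
        System.out.println(completionState(false, 61_000, waitMax)); // past the full timeout
    }
}
```

Because the result is a maximum, a single slow thread is enough to move the whole watchdog into WAITED_HALF or OVERDUE even when every other checker has completed.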


Once a checker has reached the halfway point, AMS is notified to start dumping.

The watchdog then keeps monitoring for another 30 s. If any HandlerChecker is overdue at that point, it is added to the list of blocked HandlerCheckers:

    private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
        ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            if (hc.isOverdueLocked()) {
                checkers.add(hc);
            }
        }
        return checkers;
    }

If a system thread has taken more than 60 s to respond, the system is considered hung, and under Android's mechanism it has to be restarted:

  • Notify AMS to start the dump
  • Use sysrq to print the threads blocked in the kernel
  • Write the output to DropBox

The final step is the restart.
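Putting the pieces together, the decision made on each pass of the loop can be sketched as a small state-to-action table. This is only an illustration of the sequence described in this section, not the real run method: WatchdogLoopSketch, replay and the action strings are hypothetical, and the waitedHalf flag is an assumption used so that the halfway dump is requested only once per stall.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the watchdog loop's decisions for a sequence of overall states.
class WatchdogLoopSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Takes one overall state per loop pass and records what the loop would do.
    static List<String> replay(int[] statePerPass) {
        List<String> actions = new ArrayList<>();
        boolean waitedHalf = false; // assumption: remembers that the halfway dump was requested
        for (int state : statePerPass) {
            if (state == COMPLETED) {
                waitedHalf = false;
                actions.add("reset");
            } else if (state == WAITING) {
                actions.add("keep waiting");
            } else if (state == WAITED_HALF) {
                if (!waitedHalf) {
                    actions.add("ask AMS to dump");
                    waitedHalf = true;
                } else {
                    actions.add("keep waiting");
                }
            } else {
                // OVERDUE: the system is treated as hung.
                actions.add("collect blocked checkers");
                actions.add("ask AMS to dump");
                actions.add("print blocked kernel threads via sysrq");
                actions.add("write report to DropBox");
                actions.add("restart");
                break;
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        int[] stall = { WAITING, WAITED_HALF, OVERDUE };
        for (String action : replay(stall)) {
            System.out.println(action);
        }
    }
}
```

Replaying a stall such as WAITING, WAITED_HALF, OVERDUE walks through the same order as the text: wait, request a dump at the halfway point, then collect the blocked checkers, dump, report and restart.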
