Monitoring App Performance with MetricKit

Current performance frameworks

Introduction to MetricKit

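MetricKit is a performance monitoring framework that Apple introduced with iOS 13 (2019).
[Apple documentation](https://developer.apple.com/documentation/metrickit?language=objc)
iOS 14 extended it with diagnostic reports.
With MetricKit you can receive on-device app diagnostics as well as power and performance metrics captured by the system. The system delivers a metric report covering the previous 24 hours to a registered app at most once per day,
and on iOS 15 and later and macOS 12 and later it delivers diagnostic reports immediately.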

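MetricKit hands the data collected by the system to developers, leaving it to us to decide how to use it to build an app that uses less power and performs better.

The framework includes the following:
A manager class and a subscriber protocol
Payload classes for the reported data
A class for each category of metric and diagnostic
Classes for units of measurement, such as cellular signal bars
Classes that represent accumulated data, such as histograms
A class for capturing stack traces in diagnostics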

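How to integrate MetricKit

Integrating MetricKit is essentially non-intrusive and takes only a few simple steps:

  1. Get the MXMetricManager singleton
  2. Register an object to receive the data
  3. Implement the delegate callbacks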
    import UIKit
    import MetricKit

    @objc class QGMetricKitManager: NSObject {
        @objc public static let shared = QGMetricKitManager()

        private override init() {
            super.init()
        }

        /// Registers this object with MetricKit so that it starts receiving payloads.
        @available(iOS 13.0, *)
        @objc func start() {
            let metricManager = MXMetricManager.shared
            metricManager.add(self)
        }
    }

    // The subscriber protocol only exists on iOS 13 and later.
    @available(iOS 13.0, *)
    extension QGMetricKitManager: MXMetricManagerSubscriber {
        /// Receives the metric reports.
        func didReceive(_ payloads: [MXMetricPayload]) {
            guard let firstPayload = payloads.first else { return }
            print(firstPayload.dictionaryRepresentation())
        }

        /// Receives the diagnostic reports (iOS 14 and later).
        @available(iOS 14.0, *)
        func didReceive(_ payloads: [MXDiagnosticPayload]) {
            guard let firstPayload = payloads.first else { return }
            print(firstPayload.dictionaryRepresentation())
        }
    }

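How MetricKit works

While the user is using the app, MetricKit collects diagnostic information alongside the metrics, then packages it into an MXDiagnosticPayload and hands it to us at the end of each day. This gives us performance data and diagnostic data for the same time period. Because the two correspond one to one, whenever a performance figure raises a question we can pull out the matching diagnostic data to investigate. To support this correspondence, MetricKit 2.0 also introduces some new classes:
  • MXDiagnostic: the base class that all diagnostic classes inherit from
  • MXDiagnosticPayload: the diagnostic package, containing all diagnostics at the end of a day
  • MXCallStackTree: a new data class that encapsulates the call stacks of the current context
The call stacks in an MXCallStackTree are not symbolicated; they are meant for off-device processing, not for debugging. The tree can be converted to JSON with its jsonRepresentation() method. To learn how to make use of this call stack data, watch WWDC 2020 session 10057, Identify trends with the Power and Performance API[4].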

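A brief introduction to the diagnostic types

Hangs, CPU exceptions, disk write exceptions, and crashes
  1. MXHangDiagnostic (the app is unresponsive to user input for a long time), which includes the following diagnostic data:
  • Time spent hanging (how long the app was unresponsive)
  • Backtrace of main thread
  2. MXCPUExceptionDiagnostic (CPU exception), which includes the following diagnostic data:
  • CPU time consumed
  • Total sampled time (the total time sampled during the period of high CPU utilization)
  • Backtrace of threads spinning CPU
  3. MXDiskWriteExceptionDiagnostic (disk write exception), which includes the following diagnostic data:
  • Total writes caused (the total amount of writes that caused the exception)
  • Backtrace of threads causing writes
    The system generates this diagnostic when the app exceeds the threshold of 1 GB of writes per day
  4. MXCrashDiagnostic (crash), which includes the following diagnostic data:
  • Exception type, code, and signal
  • Termination reason
  • VM region info (for bad-access crashes)
  • Backtrace of crashing thread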

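All in all, MetricKit diagnostics are a powerful new tool for finding the root cause of regressions in real customer use cases, and can help take our optimization work to a new level.

Advantages of MetricKit

Built on Apple's native capabilities, with no additional performance overhead

Disadvantages of MetricKit

Metrics are only available on iOS 13 and later, and diagnostics on iOS 14 and later
Compared with today's mature crash-capture frameworks, the information may not be detailed enough

Sample data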

// MXMetricPayload
[AnyHashable("applicationResponsivenessMetrics"): {
histogrammedAppHangTime = {
histogramNumBuckets = 3;
histogramValue = {
0 = {
bucketCount = 50;
bucketEnd = "100 ms";
bucketStart = "0 ms";
};
1 = {
bucketCount = 60;
bucketEnd = "400 ms";
bucketStart = "100 ms";
};
2 = {
bucketCount = 30;
bucketEnd = "700 ms";
bucketStart = "400 ms";
};
};
};
}, AnyHashable("signpostMetrics"): <__NSArrayM 0x280119170>(
{
signpostCategory = TestSignpostCategory1;
signpostIntervalData = {
histogrammedSignpostDurations = {
histogramNumBuckets = 3;
histogramValue = {
0 = {
bucketCount = 50;
bucketEnd = "100 ms";
bucketStart = "0 ms";
};
1 = {
bucketCount = 60;
bucketEnd = "400 ms";
bucketStart = "100 ms";
};
2 = {
bucketCount = 30;
bucketEnd = "700 ms";
bucketStart = "400 ms";
};
};
};
signpostAverageMemory = "100,000 kB";
signpostCumulativeCPUTime = "30,000 ms";
signpostCumulativeHitchTimeRatio = "50 ms per s";
signpostCumulativeLogicalWrites = "600 kB";
};
signpostName = TestSignpostName1;
totalSignpostCount = 30;
},
{
signpostCategory = TestSignpostCategory2;
signpostIntervalData = {
histogrammedSignpostDurations = {
histogramNumBuckets = 3;
histogramValue = {
0 = {
bucketCount = 60;
bucketEnd = "200 ms";
bucketStart = "0 ms";
};
1 = {
bucketCount = 70;
bucketEnd = "300 ms";
bucketStart = "201 ms";
};
2 = {
bucketCount = 80;
bucketEnd = "500 ms";
bucketStart = "301 ms";
};
};
};
signpostAverageMemory = "60,000 kB";
signpostCumulativeCPUTime = "50,000 ms";
signpostCumulativeLogicalWrites = "700 kB";
};
signpostName = TestSignpostName2;
totalSignpostCount = 40;
}
)
, AnyHashable("applicationLaunchMetrics"): {
histogrammedResumeTime = {
histogramNumBuckets = 3;
histogramValue = {
0 = {
bucketCount = 60;
bucketEnd = "210 ms";
bucketStart = "200 ms";
};
1 = {
bucketCount = 70;
bucketEnd = "310 ms";
bucketStart = "300 ms";
};
2 = {
bucketCount = 80;
bucketEnd = "510 ms";
bucketStart = "500 ms";
};
};
};
histogrammedTimeToFirstDrawKey = {
histogramNumBuckets = 3;
histogramValue = {
0 = {
bucketCount = 50;
bucketEnd = "1,010 ms";
bucketStart = "1,000 ms";
};
1 = {
bucketCount = 60;
bucketEnd = "2,010 ms";
bucketStart = "2,000 ms";
};
2 = {
bucketCount = 30;
bucketEnd = "3,010 ms";
bucketStart = "3,000 ms";
};
};
};
}, AnyHashable("cellularConditionMetrics"): {
cellConditionTime = {
histogramNumBuckets = 3;
histogramValue = {
0 = {
bucketCount = 20;
bucketEnd = "1 bars";
bucketStart = "1 bars";
};
1 = {
bucketCount = 30;
bucketEnd = "2 bars";
bucketStart = "2 bars";
};
2 = {
bucketCount = 50;
bucketEnd = "3 bars";
bucketStart = "3 bars";
};
};
};
}, AnyHashable("timeStampEnd"): 2022-09-27 15:59:00 +0000, AnyHashable("metaData"): {
appBuildVersion = 1;
bundleIdentifier = "com.baidu.ALALiveSDKDebug";
deviceType = "iPhone11,2";
osVersion = "iPhone OS 15.1 (19B74)";
platformArchitecture = arm64e;
regionFormat = CN;
}, AnyHashable("displayMetrics"): {
averagePixelLuminance = {
averageValue = "50 apl";
sampleCount = 500;
standardDeviation = 0;
};
}, AnyHashable("locationActivityMetrics"): {
cumulativeBestAccuracyForNavigationTime = "20 sec";
cumulativeBestAccuracyTime = "30 sec";
cumulativeHundredMetersAccuracyTime = "30 sec";
cumulativeKilometerAccuracyTime = "20 sec";
cumulativeNearestTenMetersAccuracyTime = "30 sec";
cumulativeThreeKilometersAccuracyTime = "20 sec";
}, AnyHashable("gpuMetrics"): {
cumulativeGPUTime = "20 sec";
}, AnyHashable("timeStampBegin"): 2022-09-26 16:00:00 +0000, AnyHashable("appVersion"): 1.0, AnyHashable("cpuMetrics"): {
cumulativeCPUInstructions = "100 kiloinstructions";
cumulativeCPUTime = "100 sec";
}, AnyHashable("applicationExitMetrics"): {
backgroundExitData = {
cumulativeAbnormalExitCount = 1;
cumulativeAppWatchdogExitCount = 1;
cumulativeBackgroundFetchCompletionTimeoutExitCount = 1;
cumulativeBackgroundTaskAssertionTimeoutExitCount = 1;
cumulativeBackgroundURLSessionCompletionTimeoutExitCount = 1;
cumulativeBadAccessExitCount = 1;
cumulativeCPUResourceLimitExitCount = 1;
cumulativeIllegalInstructionExitCount = 1;
cumulativeMemoryPressureExitCount = 1;
cumulativeMemoryResourceLimitExitCount = 1;
cumulativeNormalAppExitCount = 1;
cumulativeSuspendedWithLockedFileExitCount = 1;
};
foregroundExitData = {
cumulativeAbnormalExitCount = 1;
cumulativeAppWatchdogExitCount = 1;
cumulativeBadAccessExitCount = 1;
cumulativeCPUResourceLimitExitCount = 1;
cumulativeIllegalInstructionExitCount = 1;
cumulativeMemoryResourceLimitExitCount = 1;
cumulativeNormalAppExitCount = 1;
};
}, AnyHashable("diskIOMetrics"): {
cumulativeLogicalWrites = "1,300 kB";
}, AnyHashable("memoryMetrics"): {
averageSuspendedMemory = {
averageValue = "100,000 kB";
sampleCount = 500;
standardDeviation = 0;
};
peakMemoryUsage = "200,000 kB";
}, AnyHashable("animationMetrics"): {
scrollHitchTimeRatio = "1,000 ms per s";
}, AnyHashable("networkTransferMetrics"): {
cumulativeCellularDownload = "80,000 kB";
cumulativeCellularUpload = "70,000 kB";
cumulativeWifiDownload = "60,000 kB";
cumulativeWifiUpload = "50,000 kB";
}, AnyHashable("applicationTimeMetrics"): {
cumulativeBackgroundAudioTime = "30 sec";
cumulativeBackgroundLocationTime = "30 sec";
cumulativeBackgroundTime = "40 sec";
cumulativeForegroundTime = "700 sec";
}]
MXDiagnosticPayload
[AnyHashable("hangDiagnostics"): <__NSArrayM 0x280b68720>(
{
callStackTree = {
callStackPerThread = 1;
callStacks = (
{
callStackRootFrames = (
{
address = 74565;
binaryName = testBinaryName;
binaryUUID = "DDD90555-7FB2-4CC5-861B-4E4E35370FD2";
offsetIntoBinaryTextSegment = 123;
sampleCount = 20;
}
);
threadAttributed = 1;
}
);
};
diagnosticMetaData = {
appBuildVersion = 1;
appVersion = "1.0";
bundleIdentifier = "com.baidu.ALALiveSDKDebug";
deviceType = "iPhone11,2";
hangDuration = "20 sec";
osVersion = "iPhone OS 15.1 (19B74)";
platformArchitecture = arm64e;
regionFormat = CN;
};
version = "1.0.0";
}
)
, AnyHashable("cpuExceptionDiagnostics"): <__NSArrayM 0x280b686f0>(
{
callStackTree = {
callStackPerThread = 0;
callStacks = (
{
callStackRootFrames = (
{
address = 74565;
binaryName = testBinaryName;
binaryUUID = "C55495DA-8EC5-473A-8127-435E932B0B4B";
offsetIntoBinaryTextSegment = 123;
sampleCount = 20;
}
);
}
);
};
diagnosticMetaData = {
appBuildVersion = 1;
appVersion = "1.0";
bundleIdentifier = "com.baidu.ALALiveSDKDebug";
deviceType = "iPhone11,2";
osVersion = "iPhone OS 15.1 (19B74)";
platformArchitecture = arm64e;
regionFormat = CN;
totalCPUTime = "20 sec";
totalSampledTime = "20 sec";
};
version = "1.0.0";
}
)
, AnyHashable("crashDiagnostics"): <__NSArrayM 0x280b689c0>(
{
callStackTree = {
callStackPerThread = 1;
callStacks = (
{
callStackRootFrames = (
{
address = 74565;
binaryName = testBinaryName;
binaryUUID = "DDD90555-7FB2-4CC5-861B-4E4E35370FD2";
offsetIntoBinaryTextSegment = 123;
sampleCount = 20;
}
);
threadAttributed = 1;
}
);
};
diagnosticMetaData = {
appBuildVersion = 1;
appVersion = "1.0";
bundleIdentifier = "com.baidu.ALALiveSDKDebug";
deviceType = "iPhone11,2";
exceptionCode = 0;
exceptionType = 1;
osVersion = "iPhone OS 15.1 (19B74)";
platformArchitecture = arm64e;
regionFormat = CN;
signal = 11;
terminationReason = "Namespace SIGNAL, Code 0xb";
virtualMemoryRegionInfo = "0 is not in any region. Bytes before following region: 4000000000 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL UNUSED SPACE AT START ---> __TEXT 0000000000000000-0000000000000000 [ 32K] r-x/r-x SM=COW ...pp/Test";
};
version = "1.0.0";
}
)
, AnyHashable("diskWriteExceptionDiagnostics"): <__NSArrayM 0x280b687b0>(
{
callStackTree = {
callStackPerThread = 0;
callStacks = (
{
callStackRootFrames = (
{
address = 74565;
binaryName = testBinaryName;
binaryUUID = "C55495DA-8EC5-473A-8127-435E932B0B4B";
offsetIntoBinaryTextSegment = 123;
sampleCount = 20;
}
);
}
);
};
diagnosticMetaData = {
appBuildVersion = 1;
appVersion = "1.0";
bundleIdentifier = "com.baidu.ALALiveSDKDebug";
deviceType = "iPhone11,2";
osVersion = "iPhone OS 15.1 (19B74)";
platformArchitecture = arm64e;
regionFormat = CN;
writesCaused = "2,000 byte";
};
version = "1.0.0";
}
)
, AnyHashable("timeStampBegin"): 2022-09-28 01:41:59 +0000, AnyHashable("timeStampEnd"): 2022-09-28 01:41:59 +0000]

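Dark Mode (Part 1)

I. Introduction

Starting with iOS 13.0, users can choose a system-wide light or dark appearance. The dark appearance, known as Dark Mode, implements an interface style that many apps had already adopted. Users can pick the look they prefer, or have the interface switch based on ambient lighting conditions or a specific schedule. Apple strongly recommends supporting Dark Mode.

II. Supporting Dark Mode

1. Add custom colors in Assets

3. Methods for updating custom views
When the user changes the system appearance, the system automatically asks every window and view to redraw itself. During this process it calls the well-known methods listed below so that you can update your content. The system updates the trait environment before calling these methods, so if you make all appearance-sensitive changes in them, your app updates correctly.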

UIView:
-traitCollectionDidChange:
-layoutSubviews
-drawRect:
-updateConstraints
-tintColorDidChange

UIViewController:
-traitCollectionDidChange:
-updateViewConstraints
-viewWillLayoutSubviews
-viewDidLayoutSubviews

UIPresentationController:
-traitCollectionDidChange:
-containerViewWillLayoutSubviews
-containerViewDidLayoutSubviews
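When the user switches between the light and dark interface, the system asks your app to redraw all of its content. Although the system manages the drawing process, it relies on your custom code at several points along the way. That code must be as fast as possible and must not perform tasks unrelated to the appearance change.
If you need to modify the interface when the mode changes, override traitCollectionDidChange: to do the update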

- (void)traitCollectionDidChange:(UITraitCollection *)previousTraitCollection
{
    [super traitCollectionDidChange:previousTraitCollection];
    // Read the current style from self; previousTraitCollection holds the old one.
    UIUserInterfaceStyle userInterfaceStyle = self.traitCollection.userInterfaceStyle;
    // Update the view for userInterfaceStyle here.
}

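However, the mode has not necessarily changed every time the system calls traitCollectionDidChange:. Other trait changes, such as a device rotation, can also trigger it, so you need to check whether the system appearance actually changed.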

- (void)traitCollectionDidChange:(UITraitCollection *)previousTraitCollection
{
    [super traitCollectionDidChange:previousTraitCollection];

    // Compare this object's current traits with the previous ones.
    BOOL hasUserInterfaceStyleChanged = [self.traitCollection hasDifferentColorAppearanceComparedToTraitCollection:previousTraitCollection];
    if (hasUserInterfaceStyleChanged) {
        // Update the view for the current mode.
    }
}

III. Disabling Dark Mode

1. Globally, for the whole app
Add the UIUserInterfaceStyle key to Info.plist with the string value Light

<key>UIUserInterfaceStyle</key> <string>Light</string>

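Apple strongly recommends supporting Dark Mode. While you are working on improving your app's Dark Mode support, you can use the UIUserInterfaceStyle key to opt out of Dark Mode temporarily.
2. For a specific UIViewController, UIView, or UIWindow, set the overrideUserInterfaceStyle property. It lets you set the mode of the app, a screen, or a view independently and flexibly.

Starting with iOS 13.0, the UIViewController, UIView, and UIWindow classes all have the new overrideUserInterfaceStyle property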

@property (nonatomic) UIUserInterfaceStyle overrideUserInterfaceStyle API_AVAILABLE(ios(13.0), tvos(13.0)) API_UNAVAILABLE(watchos);

The UIUserInterfaceStyle enum type

typedef NS_ENUM(NSInteger, UIUserInterfaceStyle) {
    UIUserInterfaceStyleUnspecified,
    UIUserInterfaceStyleLight,
    UIUserInterfaceStyleDark,
} API_AVAILABLE(tvos(10.0)) API_AVAILABLE(ios(12.0)) API_UNAVAILABLE(watchos);

Setting the mode of the current window:

- (IBAction)changeModeAction:(id)sender {
    UIWindow *keyWindow = [UIApplication sharedApplication].windows.firstObject;
    if (keyWindow.overrideUserInterfaceStyle == UIUserInterfaceStyleDark)
    {
        keyWindow.overrideUserInterfaceStyle = UIUserInterfaceStyleLight;
    }
    else
    {
        keyWindow.overrideUserInterfaceStyle = UIUserInterfaceStyleDark;
    }
}

Setting the mode of a specific view controller:
- (void)viewDidLoad {
    [super viewDidLoad];
    self.overrideUserInterfaceStyle = UIUserInterfaceStyleLight;
}

KSCrash Source Reading 6: zombie_class

How detection works

KSCrash hooks the dealloc method of objects and records each deallocated object as a zombie

#define CREATE_ZOMBIE_HANDLER_INSTALLER(CLASS) \
    static IMP g_originalDealloc_ ## CLASS; \
    static void handleDealloc_ ## CLASS(id self, SEL _cmd) \
    { \
        handleDealloc(self); \
        typedef void (*fn)(id,SEL); \
        fn f = (fn)g_originalDealloc_ ## CLASS; \
        f(self, _cmd); \
    } \
    static void installDealloc_ ## CLASS() \
    { \
        Method method = class_getInstanceMethod(objc_getClass(#CLASS), sel_registerName("dealloc")); \
        g_originalDealloc_ ## CLASS = method_getImplementation(method); \
        method_setImplementation(method, (IMP)handleDealloc_ ## CLASS); \
    }
// TODO: Uninstall doesn't work.
//static void uninstallDealloc_ ## CLASS() \
//{ \
//    method_setImplementation(class_getInstanceMethod(objc_getClass(#CLASS), sel_registerName("dealloc")), g_originalDealloc_ ## CLASS); \
//}

CREATE_ZOMBIE_HANDLER_INSTALLER(NSObject)
CREATE_ZOMBIE_HANDLER_INSTALLER(NSProxy)

static void install()
{
    unsigned cacheSize = CACHE_SIZE;
    g_zombieHashMask = cacheSize - 1;
    g_zombieCache = calloc(cacheSize, sizeof(*g_zombieCache));
    if(g_zombieCache == NULL)
    {
        KSLOG_ERROR("Error: Could not allocate %u bytes of memory. KSZombie NOT installed!",
                    cacheSize * sizeof(*g_zombieCache));
        return;
    }

    g_lastDeallocedException.class = objc_getClass("NSException");
    g_lastDeallocedException.address = NULL;
    g_lastDeallocedException.name[0] = 0;
    g_lastDeallocedException.reason[0] = 0;

    installDealloc_NSObject();
    installDealloc_NSProxy();
}

static inline void handleDealloc(const void* self)
{
    volatile Zombie* cache = g_zombieCache;
    likely_if(cache != NULL)
    {
        Zombie* zombie = (Zombie*)cache + hashIndex(self);
        zombie->object = self;
        Class class = object_getClass((id)self);
        zombie->className = class_getName(class);
        for(; class != nil; class = class_getSuperclass(class))
        {
            unlikely_if(class == g_lastDeallocedException.class)
            {
                storeException(self);
            }
        }
    }
}
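
The caching idea in handleDealloc can be illustrated in isolation. The following is a minimal C sketch, not KSCrash's code: the table size, the shift amount in zombie_index, and all names (ZombieSlot, zombie_record, zombie_class_name) are assumptions made for illustration, since the excerpt above does not show hashIndex. It stores the address and class name of each freed object in a fixed-size, power-of-two table and lets a later lookup ask whether an address was recently freed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed table size; it must be a power of two so masking works. */
#define ZOMBIE_CACHE_SIZE 0x8000u

typedef struct {
    const void *object;     /* address of the deallocated object */
    const char *class_name; /* class name captured at dealloc time */
} ZombieSlot;

static ZombieSlot zombie_cache[ZOMBIE_CACHE_SIZE];

/* Hypothetical hash: drop the low alignment bits, then mask to the table size. */
static unsigned zombie_index(const void *object)
{
    uintptr_t p = (uintptr_t)object;
    p >>= 4;
    return (unsigned)(p & (ZOMBIE_CACHE_SIZE - 1u));
}

/* Called from the dealloc hook: remember the address and its class name.
 * A colliding entry is simply overwritten, so the cache is lossy. */
static void zombie_record(const void *object, const char *class_name)
{
    ZombieSlot *slot = &zombie_cache[zombie_index(object)];
    slot->object = object;
    slot->class_name = class_name;
}

/* Returns the recorded class name if this exact address is still in the
 * cache, or NULL if it was never recorded or has been overwritten. */
static const char *zombie_class_name(const void *object)
{
    const ZombieSlot *slot = &zombie_cache[zombie_index(object)];
    return slot->object == object ? slot->class_name : NULL;
}
```

Because a slot is overwritten on collision, a lookup can miss an older zombie, but memory use stays constant and recording is a few instructions, which suits code that runs on every dealloc.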

References

iOS使用Zombie Objects检测僵尸对象及其原理 (Detecting zombie objects on iOS with Zombie Objects, and how it works)

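KSCrash Source Reading 5: Deadlock

How detection works

Detecting a main-thread deadlock is somewhat similar to detecting an ANR.

A dedicated thread is created; its run method processes the logic in a do…while loop, wrapped in an autorelease pool to keep memory from growing.

There is an awaitingResponse property and a watchdogPulse method. watchdogPulse sets awaitingResponse to YES, then switches to the main thread, where awaitingResponse is set back to NO.

The thread's run method loops continuously: after waiting for the configured g_watchdogInterval, it checks whether awaitingResponse is back at its initial value (NO); if it is not, the main thread is judged to be deadlocked.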

- (id) init
{
    if((self = [super init]))
    {
        // target (self) is retained until selector (runMonitor) exits.
        self.monitorThread = [[NSThread alloc] initWithTarget:self selector:@selector(runMonitor) object:nil];
        self.monitorThread.name = @"KSCrash Deadlock Detection Thread";
        [self.monitorThread start];
    }
    return self;
}

- (void) runMonitor
{
    BOOL cancelled = NO;
    do
    {
        // Only do a watchdog check if the watchdog interval is > 0.
        // If the interval is <= 0, just idle until the user changes it.
        @autoreleasepool {
            NSTimeInterval sleepInterval = g_watchdogInterval;
            BOOL runWatchdogCheck = sleepInterval > 0;
            if(!runWatchdogCheck)
            {
                sleepInterval = kIdleInterval;
            }
            [NSThread sleepForTimeInterval:sleepInterval];
            cancelled = self.monitorThread.isCancelled;
            if(!cancelled && runWatchdogCheck)
            {
                if(self.awaitingResponse)
                {
                    [self handleDeadlock];
                }
                else
                {
                    [self watchdogPulse];
                }
            }
        }
    } while (!cancelled);
}

- (void) watchdogPulse
{
    __block id blockSelf = self;
    self.awaitingResponse = YES;
    dispatch_async(dispatch_get_main_queue(), ^
    {
        [blockSelf watchdogAnswer];
    });
}
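
The pulse-and-answer logic can be reduced to a small state machine. The sketch below is a single-threaded simulation in C, written only to illustrate the decision made on each watchdog interval; the Watchdog struct and the function names are hypothetical, and real code would run watchdog_tick on the monitor thread, run watchdog_answer on the main thread, and need a thread-safe flag.

```c
#include <stdbool.h>

typedef struct {
    bool awaiting_response; /* set by the monitor, cleared by the main thread */
} Watchdog;

/* Main-thread side: the block dispatched to the main queue clears the flag. */
static void watchdog_answer(Watchdog *w)
{
    w->awaiting_response = false;
}

/* Monitor side, run once per watchdog interval.
 * Returns true when the previous pulse was never answered, meaning the
 * main thread did not get to run for a whole interval (deadlock suspected). */
static bool watchdog_tick(Watchdog *w)
{
    if (w->awaiting_response) {
        return true;
    }
    /* Send a new pulse; a real monitor would dispatch to the main queue here. */
    w->awaiting_response = true;
    return false;
}
```

Calling watchdog_tick twice in a row without watchdog_answer in between reports a suspected deadlock on the second call, which mirrors the branch in runMonitor that calls handleDeadlock.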

Handling the exception

- (void) handleDeadlock
{
    thread_act_array_t threads = NULL;
    mach_msg_type_number_t numThreads = 0;
    ksmc_suspendEnvironment(&threads, &numThreads);
    kscm_notifyFatalExceptionCaptured(false);

    KSMC_NEW_CONTEXT(machineContext);
    ksmc_getContextForThread(g_mainQueueThread, machineContext, false);
    KSStackCursor stackCursor;
    kssc_initWithMachineContext(&stackCursor, KSSC_MAX_STACK_DEPTH, machineContext);
    char eventID[37];
    ksid_generate(eventID);

    KSLOG_DEBUG(@"Filling out context.");
    KSCrash_MonitorContext* crashContext = &g_monitorContext;
    memset(crashContext, 0, sizeof(*crashContext));
    crashContext->crashType = KSCrashMonitorTypeMainThreadDeadlock;
    crashContext->eventID = eventID;
    crashContext->registersAreValid = false;
    crashContext->offendingMachineContext = machineContext;
    crashContext->stackCursor = &stackCursor;

    kscm_handleException(crashContext);
    ksmc_resumeEnvironment(threads, numThreads);

    KSLOG_DEBUG(@"Calling abort()");
    abort();
}

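KSCrash Source Reading 5: C++ Exception

Listening for exceptions

C++ exception handling is implemented with the standard library function std::set_terminate(CPPExceptionTerminate).

Some features in an iOS project may be implemented in C, C++, and so on. When a C++ exception is thrown, if it can be converted to an NSException it goes through the Objective-C exception handling mechanism; if not, it continues through the C++ exception flow, that is, default_terminate_handler. This default C++ terminate function calls abort_message internally and finally triggers an abort call, and the system raises a SIGABRT signal.

After a C++ exception is thrown, the system wraps it in a try…catch to determine whether it can be converted to an NSException, and otherwise rethrows the C++ exception. By then the stack of the original throw site is gone, so an upper layer that only catches SIGABRT cannot reconstruct the scene in which the exception occurred; the exception stack is missing.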

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            initialize();

            ksid_generate(g_eventID);
            g_originalTerminateHandler = std::set_terminate(CPPExceptionTerminate);
        }
        else
        {
            std::set_terminate(g_originalTerminateHandler);
        }
        g_captureNextStackTrace = isEnabled;
    }
}

static void initialize()
{
    static bool isInitialized = false;
    if(!isInitialized)
    {
        isInitialized = true;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
    }
}

void kssc_initCursor(KSStackCursor *cursor,
                     void (*resetCursor)(KSStackCursor*),
                     bool (*advanceCursor)(KSStackCursor*))
{
    cursor->symbolicate = kssymbolicator_symbolicate;
    cursor->advanceCursor = advanceCursor != NULL ? advanceCursor : g_advanceCursor;
    cursor->resetCursor = resetCursor != NULL ? resetCursor : kssc_resetCursor;
    cursor->resetCursor(cursor);
}

Handling the exception

static void CPPExceptionTerminate(void)
{
    thread_act_array_t threads = NULL;
    mach_msg_type_number_t numThreads = 0;
    ksmc_suspendEnvironment(&threads, &numThreads);
    KSLOG_DEBUG("Trapped c++ exception");
    const char* name = NULL;
    std::type_info* tinfo = __cxxabiv1::__cxa_current_exception_type();
    if(tinfo != NULL)
    {
        name = tinfo->name();
    }

    if(name == NULL || strcmp(name, "NSException") != 0)
    {
        kscm_notifyFatalExceptionCaptured(false);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));

        char descriptionBuff[DESCRIPTION_BUFFER_LENGTH];
        const char* description = descriptionBuff;
        descriptionBuff[0] = 0;

        KSLOG_DEBUG("Discovering what kind of exception was thrown.");
        g_captureNextStackTrace = false;
        try
        {
            throw;
        }
        catch(std::exception& exc)
        {
            strncpy(descriptionBuff, exc.what(), sizeof(descriptionBuff));
        }
#define CATCH_VALUE(TYPE, PRINTFTYPE) \
catch(TYPE value)\
{ \
    snprintf(descriptionBuff, sizeof(descriptionBuff), "%" #PRINTFTYPE, value); \
}
        CATCH_VALUE(char, d)
        CATCH_VALUE(short, d)
        CATCH_VALUE(int, d)
        CATCH_VALUE(long, ld)
        CATCH_VALUE(long long, lld)
        CATCH_VALUE(unsigned char, u)
        CATCH_VALUE(unsigned short, u)
        CATCH_VALUE(unsigned int, u)
        CATCH_VALUE(unsigned long, lu)
        CATCH_VALUE(unsigned long long, llu)
        CATCH_VALUE(float, f)
        CATCH_VALUE(double, f)
        CATCH_VALUE(long double, Lf)
        CATCH_VALUE(char*, s)
        catch(...)
        {
            description = NULL;
        }
        g_captureNextStackTrace = g_isEnabled;

        // TODO: Should this be done here? Maybe better in the exception handler?
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForThread(ksthread_self(), machineContext, true);

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeCPPException;
        crashContext->eventID = g_eventID;
        crashContext->registersAreValid = false;
        crashContext->stackCursor = &g_stackCursor;
        crashContext->CPPException.name = name;
        crashContext->exceptionName = name;
        crashContext->crashReason = description;
        crashContext->offendingMachineContext = machineContext;

        kscm_handleException(crashContext);
    }
    else
    {
        KSLOG_DEBUG("Detected NSException. Letting the current NSException handler deal with it.");
    }
    ksmc_resumeEnvironment(threads, numThreads);

    KSLOG_DEBUG("Calling original terminate handler.");
    g_originalTerminateHandler();
}

KSCrash Source Reading 4: NSException

Adding a custom NSException handler

  1. NSGetUncaughtExceptionHandler() gets the handler that is already installed
  2. NSSetUncaughtExceptionHandler(&handleUncaughtException) installs our own exception handler
    static void setEnabled(bool isEnabled)
    {
        if(isEnabled != g_isEnabled)
        {
            g_isEnabled = isEnabled;
            if(isEnabled)
            {
                KSLOG_DEBUG(@"Backing up original handler.");
                g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();

                KSLOG_DEBUG(@"Setting new handler.");
                NSSetUncaughtExceptionHandler(&handleUncaughtException);
                KSCrash.sharedInstance.uncaughtExceptionHandler = &handleUncaughtException;
                KSCrash.sharedInstance.currentSnapshotUserReportedExceptionHandler = &handleCurrentSnapshotUserReportedException;
            }
            else
            {
                KSLOG_DEBUG(@"Restoring original handler.");
                NSSetUncaughtExceptionHandler(g_previousUncaughtExceptionHandler);
            }
        }
    }

    Exception handler entry point

    // ============================================================================

    /** Our custom exception handler.
     * Fetch the stack trace from the exception and write a report.
     *
     * @param exception The exception that was raised.
     */

    static void handleException(NSException* exception, BOOL currentSnapshotUserReported) {
        KSLOG_DEBUG(@"Trapped exception %@", exception);
        if(g_isEnabled)
        {
            thread_act_array_t threads = NULL;
            mach_msg_type_number_t numThreads = 0;
            ksmc_suspendEnvironment(&threads, &numThreads);
            kscm_notifyFatalExceptionCaptured(false);

            KSLOG_DEBUG(@"Filling out context.");
            NSArray* addresses = [exception callStackReturnAddresses];
            NSUInteger numFrames = addresses.count;
            uintptr_t* callstack = malloc(numFrames * sizeof(*callstack));
            for(NSUInteger i = 0; i < numFrames; i++)
            {
                callstack[i] = (uintptr_t)[addresses[i] unsignedLongLongValue];
            }

            char eventID[37];
            ksid_generate(eventID);
            KSMC_NEW_CONTEXT(machineContext);
            ksmc_getContextForThread(ksthread_self(), machineContext, true);
            KSStackCursor cursor;
            kssc_initWithBacktrace(&cursor, callstack, (int)numFrames, 0);

            KSCrash_MonitorContext* crashContext = &g_monitorContext;
            memset(crashContext, 0, sizeof(*crashContext));
            crashContext->crashType = KSCrashMonitorTypeNSException;
            crashContext->eventID = eventID;
            crashContext->offendingMachineContext = machineContext;
            crashContext->registersAreValid = false;
            crashContext->NSException.name = [[exception name] UTF8String];
            crashContext->NSException.userInfo = [[NSString stringWithFormat:@"%@", exception.userInfo] UTF8String];
            crashContext->exceptionName = crashContext->NSException.name;
            crashContext->crashReason = [[exception reason] UTF8String];
            crashContext->stackCursor = &cursor;
            crashContext->currentSnapshotUserReported = currentSnapshotUserReported;

            KSLOG_DEBUG(@"Calling main crash handler.");
            kscm_handleException(crashContext);

            free(callstack);
            if (currentSnapshotUserReported) {
                ksmc_resumeEnvironment(threads, numThreads);
            }
            if (g_previousUncaughtExceptionHandler != NULL)
            {
                KSLOG_DEBUG(@"Calling original exception handler.");
                g_previousUncaughtExceptionHandler(exception);
            }
        }
    }

KSCrash Source Reading 3: Signal

Signal handling flow

  1. Restore the default signal handlers
  2. Install a signal handler to handle the exception
  3. Record the signal information and write a crash report
  4. Once we're done, re-raise the signal and let the default handlers deal with it

    Installing the signal handler

  5. Fatal signals
    Defined in <sys/signal.h> (pulled in by #include <signal.h>)
    SIGABRT,/* abort() */
    SIGBUS,/* bus error */
    SIGFPE,/* floating point exception */
    SIGILL,/* illegal instruction (not reset when caught) */
    SIGPIPE, /* write on a pipe with no one to read it */
    SIGSEGV, /* segmentation violation */
    SIGSYS,/* bad argument to system call */
    SIGTRAP /* trace trap (not reset when caught) */
  6. Install the handler

Key function

/**
Function prototype: sigaction() examines or changes the action associated with a given signal (it can do both in one call).

int sigaction(int signum, const struct sigaction *act,
struct sigaction *oldact);
signum is the signal to catch, act specifies the new action for the signal, and oldact receives the previous action.
*/
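
As a minimal illustration of that prototype, the C sketch below installs a handler with sigaction, keeps the previous action through the oldact parameter, and restores it later by passing the saved action back as act. The names on_signal, install_handler, restore_handler, and last_signal are hypothetical, and the handler only records the signal number; it is a sketch of the call pattern, not KSCrash's handler.

```c
/* Ask for the POSIX declarations explicitly in case of a strict C mode. */
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t last_signal = 0;
static struct sigaction previous_action;

/* Handler with the three-argument signature selected by SA_SIGINFO. */
static void on_signal(int signum, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    last_signal = signum; /* only async-signal-safe work belongs here */
}

/* Installs on_signal for signum and saves the old action in previous_action.
 * Returns 0 on success, -1 on failure (errno is set by sigaction). */
static int install_handler(int signum)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = on_signal;
    return sigaction(signum, &action, &previous_action);
}

/* Restores the action saved by install_handler; oldact is not needed here. */
static int restore_handler(int signum)
{
    return sigaction(signum, &previous_action, NULL);
}
```

For example, after install_handler(SIGUSR1) succeeds, raise(SIGUSR1) runs on_signal and leaves SIGUSR1 in last_signal; restore_handler(SIGUSR1) then puts the earlier action back.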

static bool installSignalHandler()
{
    KSLOG_DEBUG("Installing signal handler.");

#if KSCRASH_HAS_SIGNAL_STACK

    if(g_signalStack.ss_size == 0)
    {
        KSLOG_DEBUG("Allocating signal stack area.");
        g_signalStack.ss_size = SIGSTKSZ;
        g_signalStack.ss_sp = malloc(g_signalStack.ss_size);
    }

    KSLOG_DEBUG("Setting signal stack area.");
    /**
     Move the signal handler's stack onto the heap so that it does not
     share the regular stack.
     The first parameter of sigaltstack(), sigstack, is a pointer to a
     stack_t structure that stores the location and attributes of an
     "alternate signal stack". The second parameter, old_sigstack, is also
     a stack_t pointer; it returns information about the previously
     established alternate signal stack, if there was one.
     */
    if(sigaltstack(&g_signalStack, NULL) != 0)
    {
        KSLOG_ERROR("signalstack: %s", strerror(errno));
        goto failed;
    }
#endif

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    if(g_previousSignalHandlers == NULL)
    {
        KSLOG_DEBUG("Allocating memory to store previous signal handlers.");
        g_previousSignalHandlers = malloc(sizeof(*g_previousSignalHandlers)
                                          * (unsigned)fatalSignalsCount);
    }

    struct sigaction action = {{0}};
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
#if KSCRASH_HOST_APPLE && defined(__LP64__)
    action.sa_flags |= SA_64REGSET;
#endif
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &handleSignal;

    for(int i = 0; i < fatalSignalsCount; i++)
    {
        KSLOG_DEBUG("Assigning handler for signal %d", fatalSignals[i]);
        if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
        {
            char sigNameBuff[30];
            const char* sigName = kssignal_signalName(fatalSignals[i]);
            if(sigName == NULL)
            {
                snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
                sigName = sigNameBuff;
            }
            KSLOG_ERROR("sigaction (%s): %s", sigName, strerror(errno));
            // Try to reverse the damage
            for(i--;i >= 0; i--)
            {
                // Passing the saved action as act and NULL as oldact
                // restores the previous handler for this signal.
                sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
            }
            goto failed;
        }
    }
    KSLOG_DEBUG("Signal handlers installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install signal handlers.");
    return false;
}

Handling the exception

/** Our custom signal handler.
* Restore the default signal handlers, record the signal information, and
* write a crash report.
* Once we're done, re-raise the signal and let the default handlers deal with
* it.
*
* @param sigNum The signal that was raised.
*
* @param signalInfo Information about the signal.
*
* @param userContext Other contextual information.
*/
static void handleSignal(int sigNum, siginfo_t* signalInfo, void* userContext)
{
KSLOG_DEBUG("Trapped signal %d", sigNum);
if(g_isEnabled)
{
thread_act_array_t threads = NULL;
mach_msg_type_number_t numThreads = 0;
ksmc_suspendEnvironment(&threads, &numThreads);
kscm_notifyFatalExceptionCaptured(false);

KSLOG_DEBUG("Filling out context.");
KSMC_NEW_CONTEXT(machineContext);
ksmc_getContextForSignal(userContext, machineContext);
kssc_initWithMachineContext(&g_stackCursor, KSSC_MAX_STACK_DEPTH, machineContext);

KSCrash_MonitorContext* crashContext = &g_monitorContext;
memset(crashContext, 0, sizeof(*crashContext));
crashContext->crashType = KSCrashMonitorTypeSignal;
crashContext->eventID = g_eventID;
crashContext->offendingMachineContext = machineContext;
crashContext->registersAreValid = true;
crashContext->faultAddress = (uintptr_t)signalInfo->si_addr;
crashContext->signal.userContext = userContext;
crashContext->signal.signum = signalInfo->si_signo;
crashContext->signal.sigcode = signalInfo->si_code;
crashContext->stackCursor = &g_stackCursor;

kscm_handleException(crashContext);
ksmc_resumeEnvironment(threads, numThreads);
}

KSLOG_DEBUG("Re-raising signal for regular handlers to catch.");
// This is technically not allowed, but it works in OSX and iOS.
raise(sigNum);
}
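
The "restore, then re-raise" idea in the handler's doc comment can be tried safely with a non-fatal signal. This is a sketch under assumptions, not KSCrash's implementation: all chain_* names are hypothetical, and an ordinary handler stands in for whatever action was installed before ours. Because the signal is blocked while its own handler runs, the re-raised signal is delivered to the restored handler once our handler returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* All names below are hypothetical; they are not part of KSCrash. */
static struct sigaction chain_previous;          /* action installed before ours */
static volatile sig_atomic_t chain_ours_ran = 0;
static volatile sig_atomic_t chain_previous_ran = 0;

/* Stands in for the handler (or default action) that was there first. */
static void chain_earlier_handler(int signum)
{
    (void)signum;
    chain_previous_ran = 1;
}

/* Our handler: record the event, put the earlier action back, re-raise. */
static void chain_handler(int signum, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    chain_ours_ran = 1;                          /* a real handler would write its report here */
    sigaction(signum, &chain_previous, NULL);    /* restore the earlier action */
    raise(signum);                               /* delivered after this handler returns */
}

/* Installs the stand-in handler, then ours on top of it. Returns 0 on success. */
static int chain_install(int signum)
{
    assert(signum > 0);

    struct sigaction earlier;
    memset(&earlier, 0, sizeof(earlier));
    sigemptyset(&earlier.sa_mask);
    earlier.sa_handler = chain_earlier_handler;
    if (sigaction(signum, &earlier, NULL) != 0) {
        return -1;
    }

    struct sigaction ours;
    memset(&ours, 0, sizeof(ours));
    sigemptyset(&ours.sa_mask);
    ours.sa_flags = SA_SIGINFO;
    ours.sa_sigaction = chain_handler;
    return sigaction(signum, &ours, &chain_previous);
}
```

With a fatal signal the restored action would normally be the default one, so the re-raise terminates the process after the report has been written.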

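Debugging signals

When the app is run from Xcode in Debug mode, signals raised in the app process are intercepted by the LLDB debugger. Use LLDB commands to have a given signal passed through to the app's own handlers so that signal handling can be debugged.

Show the handling configuration for all signals:

// short for `process handle`

pro hand

Change the handling configuration for a specific signal:

// options:

// -p: PASS

// -s: STOP

// -n: NOTIFY

pro hand -<option> false <signal name>

// Example: do not stop in LLDB on SIGABRT, so that it continues on to the app's handlers

pro hand -s false SIGABRT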

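Understanding jank

Reference

Part 1: FrameTime

Definition of FrameTime: the time elapsed between two consecutive frames (it can also be thought of simply as the time taken to render a single frame).

Is FrameTime related to jank? Consider the example in the figure below: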
{% asset_img 8f45de6b-60de-4021-80a3-be3d2cbd3ea9.png This is an example image %}

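As the figure shows, frame B takes longer to render on the GPU (its frame generation time) than the display's refresh interval, so it occupies two refresh intervals. In other words, the picture is not updated for one refresh. When several refreshes go by without a new picture (that is, the picture does not change), the user may perceive a jank.

The conclusion: what the user actually sees is the interval between new pictures appearing on screen, not the interval between eglSwapBuffers calls, which only mark the GPU finishing a frame that has not yet been presented on the display. From here on, FrameTime always means the on-screen Display FrameTime.

An advantage of PerfDog: the FPS and FrameTime it reports are the real refresh rate and frame duration of new pictures on the display, as the user sees them. FrameTime can therefore be used directly to judge whether jank occurred during a test.

Part 2: FPS

Definition of FPS: frame rate (the average number of screen updates per second).

Average frame rate: the FPS figure usually quoted, the average number of screen updates within one second.

Instantaneous frame rate: the real-time FPS computed from a single frame's FrameTime, that is, each frame's duration converted into a frame rate.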
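
The two definitions differ only in what they divide by, which a few lines of C make concrete. This is an illustrative sketch: the function names are made up here, and FrameTime is assumed to be given in milliseconds.

```c
#include <assert.h>
#include <stddef.h>

/* Instantaneous FPS implied by a single frame's FrameTime (milliseconds). */
static double instant_fps(double frame_time_ms)
{
    assert(frame_time_ms > 0.0);
    return 1000.0 / frame_time_ms;
}

/* Average FPS over a run of frames: frames shown divided by elapsed seconds. */
static double average_fps(const double *frame_times_ms, size_t count)
{
    assert(count > 0);
    double total_ms = 0.0;
    for (size_t i = 0; i < count; i++) {
        total_ms += frame_times_ms[i];
    }
    return (double)count * 1000.0 / total_ms;
}
```

A run of identical 25 ms frames gives 40 FPS by both measures; the two only diverge when frame times are uneven.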



The rendering pipeline is shown below, with the FrameTime of each frame.
 {% asset_img 24bfdf7a-4db8-4d64-a324-3731a28d5ee1.png This is an example image %}

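PerfDog's frame rate and FrameTime statistics are shown in the figure below:

iOS
Apple WWDC 2018

①     FramePacing

Take the two game scenes below: the one on the left tries to run at 60 FPS but only reaches 40 FPS, while the one on the right holds a steady 30 FPS: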
 {% asset_img 640.gif This is an example image %}

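FrameTime shows why: the left scene averages 40 FPS, for a theoretical average FrameTime of 25 ms, yet it contains a frame with FrameTime >= 117 ms. Because rendering is unevenly paced, it still feels very janky despite the higher 40 FPS frame rate.

Summary: a high frame rate does not necessarily mean smooth playback.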
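
To see why the average hides the problem, it helps to report the worst frame next to the average FPS. The sketch below is only an illustration: PacingSummary and summarize_pacing are hypothetical names, and FrameTime is assumed to be in milliseconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary type for this sketch. */
typedef struct {
    double average_fps;     /* frames shown divided by elapsed seconds */
    double worst_frame_ms;  /* the longest single FrameTime in the run */
} PacingSummary;

/* Computes the average FPS and the longest FrameTime of a run of frames. */
static PacingSummary summarize_pacing(const double *frame_times_ms, size_t count)
{
    assert(count > 0);
    PacingSummary summary = { 0.0, 0.0 };
    double total_ms = 0.0;
    for (size_t i = 0; i < count; i++) {
        total_ms += frame_times_ms[i];
        if (frame_times_ms[i] > summary.worst_frame_ms) {
            summary.worst_frame_ms = frame_times_ms[i];
        }
    }
    summary.average_fps = (double)count * 1000.0 / total_ms;
    return summary;
}
```

Two runs can share the same average_fps while one of them contains a 117 ms frame and the other never exceeds 25 ms; only worst_frame_ms tells them apart.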

Part 3: Smoothness

The relationship between smoothness and jank can be roughly illustrated by the following flowchart:
{% asset_img 76c17c01-f03d-488a-8929-b399351f51d8.png This is an example image %}
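Smoothness affects jank. This can be understood simply in terms of two things: visual inertia and the film frame rate.

 1. Visual inertia
    The user has an expected frame rate: subconsciously they expect the next frame to arrive at the current rate. If the screen has been refreshing at 60 FPS, they expect the next frame at 60 FPS; if it has been refreshing at 25 FPS, they expect the next frame at 25 FPS. But if the rate suddenly drops from 60 FPS to 25 FPS, the visual inertia is disrupted, and that is when the user perceives jank.



2. Film frame rate
    Film frame rates are 18 to 24 FPS, typically 24. The duration of one film frame is 1000 ms / 24 ≈ 41.67 ms. The film frame rate is a critical threshold: below it, the human eye can generally perceive the picture as discontinuous, that is, as jank.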
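
The two ideas can be combined into a rough per-frame check. The sketch below is an assumption-laden illustration, not PerfDog's actual rule: the name looks_janky, the factor of 2 for "breaking visual inertia", and the use of a short window of recent frames as the expected FrameTime are all choices made here for the example. Only the 1000 ms / 24 film-frame threshold comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FILM_FRAME_MS (1000.0 / 24.0)  /* duration of one film frame, about 41.67 ms */
#define INERTIA_FACTOR 2.0             /* assumption: "much slower than expected" */

/* Returns true when the current frame is both slower than a film frame and
 * much slower than the recent average the viewer has grown used to. */
static bool looks_janky(const double *recent_ms, size_t recent_count, double current_ms)
{
    assert(recent_count > 0);
    double total_ms = 0.0;
    for (size_t i = 0; i < recent_count; i++) {
        total_ms += recent_ms[i];
    }
    double expected_ms = total_ms / (double)recent_count;
    return current_ms > FILM_FRAME_MS && current_ms > INERTIA_FACTOR * expected_ms;
}
```

Under these assumptions a 60 ms frame after a steady 60 FPS run is flagged, while a 42 ms frame in a run that was already at 25 FPS is not, because it matches what the viewer expects.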

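Part 4: Core Animation debug options

  • Color Blended Layers - Highlights blended regions of the screen from green to red according to how much blending is involved (that is, how many semi-transparent layers are stacked). Because of overdraw, blending costs GPU performance, and it is one of the main culprits behind dropped frame rates when scrolling or animating.
  • Color Hits Green and Misses Red - When the shouldRasterize property is used, expensive layer drawing is cached and presented as a single flattened image. This option highlights a rasterized layer in red whenever its cache is regenerated. If the cache is regenerated frequently, rasterization may be hurting performance.
  • Color Copied Images - Sometimes the way a backing image is created forces Core Animation to make a copy of the image and send it to the render server, rather than simply pointing to the original. This option colors such images blue. Copying images is very expensive in both memory and CPU, so it should be avoided wherever possible.
  • Color Immediately - Normally the Core Animation instrument updates the layer debug colors only once every 10 milliseconds. For some effects that is clearly too slow. This option makes it update every frame (which may affect rendering performance and throws off frame-rate measurements, so do not leave it on all the time).
  • Color Misaligned Images - Highlights images that are scaled or stretched, or that are not aligned to pixel boundaries (that is, have non-integer coordinates). Most of these lead to images being scaled unnecessarily; if a large image is being displayed as a thumbnail, or an image is being blurred incorrectly, this option helps identify the problem.
  • Color Offscreen-Rendered Yellow - Highlights in yellow any layers that require offscreen rendering. Such layers are likely candidates for optimization with shadowPath or shouldRasterize.
  • Color OpenGL Fast Path Blue - Highlights any layers drawn directly with OpenGL. It has no effect if you only use UIKit or Core Animation APIs. If you use GLKView or CAEAGLLayer and no blue appears, you are forcing the CPU to render an extra texture rather than drawing straight to the screen.
  • Flash Updated Regions - Briefly highlights redrawn content in yellow (that is, any layers drawn in software with Core Graphics). Such drawing is slow. If it happens frequently, there is either a hidden bug or room to improve performance by adding caching or using a different approach.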

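Reading the KSCrash source (1): Crash categories

1. Exception types

  • Mach kernel exception: the lowest-level, kernel-level exception. User-space developers can set exception ports on a thread, task, or host directly through the Mach API to catch Mach exceptions.

  • Fatal signal: also known as a BSD signal. If the developer does not catch the Mach exception, the host-level function ux_exception() converts it into the corresponding UNIX signal, and threadsignal() delivers the signal to the faulting thread. Signals can be caught with signal(x, SignalHandler).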

  • Uncaught C++ exception : Captures and reports C++ exceptions

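  • Uncaught Objective-C NSException: an application-level exception. An Objective-C exception that is not caught causes the program to send itself SIGABRT and crash. This is within the app's control: such exceptions can be caught with try/catch, or intercepted through the NSSetUncaughtExceptionHandler() mechanism.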

  • Deadlock on the main thread:

  • User reported custom exception :

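    2. Mach exceptions and Unix signals

    • Mach is a microkernel designed to provide basic inter-process communication.
    • XNU is a hybrid kernel made up of the Mach microkernel and components of the more traditional ("monolithic") BSD Unix kernel. It can also load kernel extensions at run time (adding functionality, device drivers, and so on).
    • Darwin is a Unix operating system made up of the XNU kernel plus various open-source utilities, libraries, and so on.
      OS X is Darwin plus many proprietary components, the best known being its graphical interface APIs.
    1. Basic concepts in the Mach microkernel:
  • Tasks: objects that own a set of system resources and within which threads execute.
  • Threads: the basic unit of execution; a thread has the context of its task and shares the task's resources.
  • Ports: protected message queues used for communication between tasks; a task can send data to or receive data from any port for which it holds the appropriate right.
  • Messages: collections of typed data objects; they can only be sent to ports.
  2. Signals:
    1) SIGHUP
    Sent when the user's terminal connection ends (normally or abnormally), usually when the controlling process of the terminal exits. It notifies every job in the same session that it is no longer associated with the controlling terminal.
    When a user logs in to Linux, the system assigns that user a terminal (session). All programs run in that terminal, in both the foreground and background process groups, normally belong to the session. When the user logs out, the foreground process group and any background processes that write to the terminal receive SIGHUP. The default action is to terminate the process, so those processes are stopped. The signal can be caught, though: wget, for example, catches SIGHUP and ignores it, so it keeps downloading even after the user logs out.
    In addition, for daemons that have detached from the terminal, this signal is used to tell them to reread their configuration files.
    2) SIGINT
    Interrupt signal, sent when the user types the INTR character (usually Ctrl-C) to tell the foreground process group to terminate.
    3) SIGQUIT
    Similar to SIGINT, but controlled by the QUIT character (usually Ctrl-\). A process that exits on SIGQUIT produces a core file, so in that sense it resembles a program-error signal.
    6) SIGABRT
    Generated by calling abort().
    7) SIGBUS
    Invalid address, including memory alignment errors, for example accessing a four-byte integer at an address that is not a multiple of 4. It differs from SIGSEGV in that the latter is triggered by an illegal access to a valid address (such as memory that does not belong to the process, or read-only memory).
    8) SIGFPE
    Sent on a fatal arithmetic error. This covers not only floating-point errors but all other arithmetic errors such as overflow and division by zero.
    9) SIGKILL
    Ends a program immediately. It cannot be blocked, handled, or ignored. If an administrator finds a process that will not terminate, this is the signal to try.
    11) SIGSEGV
    An attempt to access memory not allocated to the process, or to write to memory it has no write permission for.
    13) SIGPIPE
    Broken pipe. It usually arises in inter-process communication: for example, when two processes communicate over a FIFO (pipe) and one writes while the read end is not open or has terminated unexpectedly, the writer receives SIGPIPE. The same happens with two processes communicating over a socket when the writer writes after the reader has terminated.
    On iOS, crashes mainly involve SIGKILL, SIGSEGV, SIGABRT, and SIGTRAP.
    Signal crashes are mainly caused by memory problems such as memory leaks and wild pointers.
    Some special exception codes:
    0x8badf00d: launching, terminating, or responding to a system event took too long; read as "ate bad food".
    0xdeadfa11: the user force-quit the app; read as "dead fall". (When the system is unresponsive, the user holds the power button and Home.)
    0xbaaaaaad: the user held Home and a volume button to capture the current memory state; it does not indicate a crash.
    0xbad22222: a VoIP app was terminated for resuming too frequently.
    0xc00010ff: terminated because the device ran too hot; read as "cool off".
    0xdead10cc: terminated for holding on to a system resource (such as the address book) while in the background; read as "dead lock".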
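
When post-processing crash reports, the special codes above are convenient to keep in a small lookup table. This is a sketch only: the type and function names are hypothetical, and the descriptions simply paraphrase the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table type; the entries restate the list above. */
typedef struct {
    uint32_t code;
    const char *meaning;
} TerminationCode;

static const TerminationCode termination_codes[] = {
    { 0x8badf00dU, "launch, termination, or a system event took too long (ate bad food)" },
    { 0xdeadfa11U, "force-quit by the user (dead fall)" },
    { 0xbaaaaaadU, "state snapshot requested by the user, not a crash" },
    { 0xbad22222U, "VoIP app resumed too frequently" },
    { 0xc00010ffU, "device too hot (cool off)" },
    { 0xdead10ccU, "held a system resource while in the background (dead lock)" },
};

/* Returns a description for a known code, or NULL if the code is not listed. */
static const char *describe_termination_code(uint32_t code)
{
    size_t count = sizeof(termination_codes) / sizeof(termination_codes[0]);
    for (size_t i = 0; i < count; i++) {
        if (termination_codes[i].code == code) {
            return termination_codes[i].meaning;
        }
    }
    return NULL;
}
```

Returning NULL for unknown codes leaves it to the caller to decide how to report a code that is not in the table.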

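    3. Objective-C Exception

    Familiar cases such as an array index out of bounds or inserting nil into an array belong to this category. The main kinds are:

  • NSInvalidArgumentException
    An invalid argument was passed.

  • NSRangeException
    An index was out of range.

  • NSGenericException
    Most often seen during for-in enumeration: modifying the array being enumerated inside a for-in loop, whether by adding or removing, raises this exception.

  • NSInternalInconsistencyException
    Raised on an internal inconsistency.
    For example, using an NSDictionary as if it were an NSMutableDictionary produces errors because of how the two work internally.

  • NSFileHandleOperationException
    Errors while handling files; the most common cause is insufficient storage space.

  • NSMallocException
    Also an out-of-memory problem: not enough memory could be allocated.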

Reading the KSCrash source (2): Mach kernel exceptions

Mach crash handling flow

h5 load time

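1. Registering the Mach exception handler

  1. Save the original exception ports, so that the exception can be handed back to them after handling
  2. mach_port_allocate: create a new port with receive rights
  3. Add a send right to the new port
  4. task_set_exception_ports: install the new port as the exception port
  5. pthread_create: create a secondary (backup) thread
  6. Convert the secondary thread's pthread_t to a mach_port_t
  7. pthread_create: create the primary exception-handling thread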
    static bool installExceptionHandler()
    {
    KSLOG_DEBUG("Installing mach exception handler.");

    bool attributes_created = false;
    pthread_attr_t attr;

    kern_return_t kr;
    int error;
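    /**
    A task is a container object.
    Virtual memory space and other resources are managed through it,
    including devices and other handles. Resources are further abstracted as ports,
    so sharing a resource amounts to granting access to the corresponding port.

    Source: https://www.jianshu.com/p/cc655bfdac13
    */
    const task_t thisTask = mach_task_self(); // This task's port: a name that carries a send right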
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
    EXC_MASK_BAD_INSTRUCTION |
    EXC_MASK_ARITHMETIC |
    EXC_MASK_SOFTWARE |
    EXC_MASK_BREAKPOINT;
    // Save the original ports so the exception can be handed back to them after handling
    KSLOG_DEBUG("Backing up original exception ports.");
    kr = task_get_exception_ports(thisTask,
    mask,
    g_previousExceptionPorts.masks,
    &g_previousExceptionPorts.count,
    g_previousExceptionPorts.ports,
    g_previousExceptionPorts.behaviors,
    g_previousExceptionPorts.flavors);
    if(kr != KERN_SUCCESS)
    {
    KSLOG_ERROR("task_get_exception_ports: %s", mach_error_string(kr));
    goto failed;
    }

    if(g_exceptionPort == MACH_PORT_NULL)
    {
    KSLOG_DEBUG("Allocating new port with receive rights.");
    kr = mach_port_allocate(thisTask,
    MACH_PORT_RIGHT_RECEIVE,
    &g_exceptionPort);
    if(kr != KERN_SUCCESS)
    {
    KSLOG_ERROR("mach_port_allocate: %s", mach_error_string(kr));
    goto failed;
    }

    KSLOG_DEBUG("Adding send rights to port.");
    kr = mach_port_insert_right(thisTask,
    g_exceptionPort,
    g_exceptionPort,
    MACH_MSG_TYPE_MAKE_SEND);
    if(kr != KERN_SUCCESS)
    {
    KSLOG_ERROR("mach_port_insert_right: %s", mach_error_string(kr));
    goto failed;
    }
    }

    KSLOG_DEBUG("Installing port as exception handler.");
    kr = task_set_exception_ports(thisTask,
    mask,
    g_exceptionPort,
    (int)(EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES),
    THREAD_STATE_NONE);
    if(kr != KERN_SUCCESS)
    {
    KSLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
    goto failed;
    }

    KSLOG_DEBUG("Creating secondary exception thread (suspended).");
    pthread_attr_init(&attr);
    attributes_created = true;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    error = pthread_create(&g_secondaryPThread,
    &attr,
    &handleExceptions,
    (void*)kThreadSecondary);
    if(error != 0)
    {
    KSLOG_ERROR("pthread_create_suspended_np: %s", strerror(error));
    goto failed;
    }
    g_secondaryMachThread = pthread_mach_thread_np(g_secondaryPThread);
    ksmc_addReservedThread(g_secondaryMachThread);

    KSLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
    &attr,
    &handleExceptions,
    (void*)kThreadPrimary);
    if(error != 0)
    {
    KSLOG_ERROR("pthread_create: %s", strerror(error));
    goto failed;
    }
    pthread_attr_destroy(&attr);
    g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
    ksmc_addReservedThread(g_primaryMachThread);

    KSLOG_DEBUG("Mach exception handler installed.");
    return true;


    failed:
    KSLOG_DEBUG("Failed to install mach exception handler.");
    if(attributes_created)
    {
    pthread_attr_destroy(&attr);
    }
    uninstallExceptionHandler();
    return false;
    }

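    2. Handling the exception

  8. Wait for a Mach message in a for(;;) loop. The handler thread does not spin: mach_msg blocks until a message arrives
  9. Catch the exception
  10. ksmc_suspendEnvironment: suspend all threads except the exception-handling threads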
    /** Our exception handler thread routine.
    * Wait for an exception message, uninstall our exception port, record the
    * exception information, and write a report.
    */
    // ============================================================================

    /** A mach exception message (according to ux_exception.c, xnu-1699.22.81).
    */
    #pragma pack(4)
    typedef struct
    {
    /** Mach header. */
    mach_msg_header_t header;

    // Start of the kernel processed data.

    /** Basic message body data. */
    mach_msg_body_t body;

    /** The thread that raised the exception. */
    mach_msg_port_descriptor_t thread;

    /** The task that raised the exception. */
    mach_msg_port_descriptor_t task;

    // End of the kernel processed data.

    /** Network Data Representation. */
    NDR_record_t NDR;

    /** The exception that was raised. */
    exception_type_t exception;

    /** The number of codes. */
    mach_msg_type_number_t codeCount;

    /** Exception code and subcode. */
    // ux_exception.c defines this as mach_exception_data_t for some reason.
    // But it's not actually a pointer; it's an embedded array.
    // On 32-bit systems, only the lower 32 bits of the code and subcode
    // are valid.
    mach_exception_data_type_t code[0];

    /** Padding to avoid RCV_TOO_LARGE. */
    char padding[512];
    } MachExceptionMessage;
    // Handle the exception
    static void* handleExceptions(void* const userData)
    {
    MachExceptionMessage exceptionMessage = {{0}};
    MachReplyMessage replyMessage = {{0}};
    char* eventID = g_primaryEventID;

    const char* threadName = (const char*) userData;
    pthread_setname_np(threadName);
    if(threadName == kThreadSecondary)
    {
    KSLOG_DEBUG("This is the secondary thread. Suspending.");
    thread_suspend((thread_t)ksthread_self());
    eventID = g_secondaryEventID;
    }

    for(;;)
    {
    KSLOG_DEBUG("Waiting for mach exception");

    // Wait for a message.
    kern_return_t kr = mach_msg(&exceptionMessage.header,
    MACH_RCV_MSG,
    0,
    sizeof(exceptionMessage),
    g_exceptionPort,
    MACH_MSG_TIMEOUT_NONE,
    MACH_PORT_NULL);
    if(kr == KERN_SUCCESS)
    {
    break;
    }

    // Loop and try again on failure.
    KSLOG_ERROR("mach_msg: %s", mach_error_string(kr));
    }

    KSLOG_DEBUG("Trapped mach exception code 0x%llx, subcode 0x%llx",
    exceptionMessage.code[0], exceptionMessage.code[1]);
    if(g_isEnabled)
    {
    thread_act_array_t threads = NULL;
    mach_msg_type_number_t numThreads = 0;
    // Suspend all threads
    ksmc_suspendEnvironment(&threads, &numThreads);
    g_isHandlingCrash = true;
    kscm_notifyFatalExceptionCaptured(true);

    KSLOG_DEBUG("Exception handler is installed. Continuing exception handling.");


    // Switch to the secondary thread if necessary, or uninstall the handler
    // to avoid a death loop.
    if(ksthread_self() == g_primaryMachThread)
    {
    KSLOG_DEBUG("This is the primary exception thread. Activating secondary thread.");
    // TODO: This was put here to avoid a freeze. Does secondary thread ever fire?
    restoreExceptionPorts();
    if(thread_resume(g_secondaryMachThread) != KERN_SUCCESS)
    {
    KSLOG_DEBUG("Could not activate secondary thread. Restoring original exception ports.");
    }
    }
    else
    {
    KSLOG_DEBUG("This is the secondary exception thread.");// Restoring original exception ports.");
    // restoreExceptionPorts();
    }

    // Fill out crash information
    KSLOG_DEBUG("Fetching machine state.");
    KSMC_NEW_CONTEXT(machineContext);
    KSCrash_MonitorContext* crashContext = &g_monitorContext;
    crashContext->offendingMachineContext = machineContext;
    kssc_initCursor(&g_stackCursor, NULL, NULL);
    if(ksmc_getContextForThread(exceptionMessage.thread.name, machineContext, true))
    {
    kssc_initWithMachineContext(&g_stackCursor, KSSC_MAX_STACK_DEPTH, machineContext);
    KSLOG_TRACE("Fault address %p, instruction address %p",
    kscpu_faultAddress(machineContext), kscpu_instructionAddress(machineContext));
    if(exceptionMessage.exception == EXC_BAD_ACCESS)
    {
    crashContext->faultAddress = kscpu_faultAddress(machineContext);
    }
    else
    {
    crashContext->faultAddress = kscpu_instructionAddress(machineContext);
    }
    }

    KSLOG_DEBUG("Filling out context.");
    crashContext->crashType = KSCrashMonitorTypeMachException;
    crashContext->eventID = eventID;
    crashContext->registersAreValid = true;
    crashContext->mach.type = exceptionMessage.exception;
    crashContext->mach.code = exceptionMessage.code[0] & (int64_t)MACH_ERROR_CODE_MASK;
    crashContext->mach.subcode = exceptionMessage.code[1] & (int64_t)MACH_ERROR_CODE_MASK;
    if(crashContext->mach.code == KERN_PROTECTION_FAILURE && crashContext->isStackOverflow)
    {
    // A stack overflow should return KERN_INVALID_ADDRESS, but
    // when a stack blasts through the guard pages at the top of the stack,
    // it generates KERN_PROTECTION_FAILURE. Correct for this.
    crashContext->mach.code = KERN_INVALID_ADDRESS;
    }
    crashContext->signal.signum = signalForMachException(crashContext->mach.type, crashContext->mach.code);
    crashContext->stackCursor = &g_stackCursor;

    kscm_handleException(crashContext);

    KSLOG_DEBUG("Crash handling complete. Restoring original handlers.");
    g_isHandlingCrash = false;
    ksmc_resumeEnvironment(threads, numThreads);
    }

    KSLOG_DEBUG("Replying to mach exception message.");
    // Send a reply saying "I didn't handle this exception".
    replyMessage.header = exceptionMessage.header;
    replyMessage.NDR = exceptionMessage.NDR;
    replyMessage.returnCode = KERN_FAILURE;

    mach_msg(&replyMessage.header,
    MACH_SEND_MSG,
    sizeof(replyMessage),
    0,
    MACH_PORT_NULL,
    MACH_MSG_TIMEOUT_NONE,
    MACH_PORT_NULL);

    return NULL;
    }
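
The listing calls signalForMachException() without showing it. As a rough idea of what such a mapping looks like, here is a simplified sketch, not KSCrash's actual function: the DEMO_* constants are local copies of the numeric values from the Mach headers so that the example compiles without them, the function name is hypothetical, and cases such as EXC_SOFTWARE and EXC_EMULATION are deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Local mirrors of values from <mach/exception_types.h> and
 * <mach/kern_return.h>; defined here only to keep the sketch portable. */
enum {
    DEMO_EXC_BAD_ACCESS = 1,
    DEMO_EXC_BAD_INSTRUCTION = 2,
    DEMO_EXC_ARITHMETIC = 3,
    DEMO_EXC_BREAKPOINT = 6
};
enum { DEMO_KERN_INVALID_ADDRESS = 1 };

/* Returns the UNIX signal that corresponds to a Mach exception,
 * or 0 for exception types this simplified sketch does not cover. */
static int demo_signal_for_mach_exception(int type, int code)
{
    switch (type) {
    case DEMO_EXC_BAD_ACCESS:
        /* An unmapped address maps to SIGSEGV, other bad accesses to SIGBUS. */
        return code == DEMO_KERN_INVALID_ADDRESS ? SIGSEGV : SIGBUS;
    case DEMO_EXC_BAD_INSTRUCTION:
        return SIGILL;
    case DEMO_EXC_ARITHMETIC:
        return SIGFPE;
    case DEMO_EXC_BREAKPOINT:
        return SIGTRAP;
    default:
        return 0;
    }
}
```

This also explains why the handler rewrites KERN_PROTECTION_FAILURE to KERN_INVALID_ADDRESS for stack overflows before the lookup: the code decides between SIGSEGV and SIGBUS.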

LLDB Debugger

__unsafe_unretained NSObject *objc = [[NSObject alloc] init];
NSLog(@"✳️✳️✳️ objc: %@", objc);
Under ARC this raises EXC_BAD_ACCESS. Note that the exc_handler callback is only received when Xcode's Debug executable option is turned off.

Q&A

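  1. Q: When installing a custom Mach exception handler, how are the thread and the port bound together?
    A: The exception-handling thread waits for messages on the exception port.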
    KSLOG_DEBUG("Installing port as exception handler.");
    kr = task_set_exception_ports(thisTask,
    mask,
    g_exceptionPort,
    (int)(EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES),
    THREAD_STATE_NONE);

    KSLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
    &attr,
    &handleExceptions,
    (void*)kThreadPrimary);

References

mach_port_t and pthread_self() on Mac OS

iOS Mach exceptions and signals

Is there a better way to get the thread ID?