0%

与TLE斗争到底系列(一):堆的应用

对我来说,Google Kick Start 就是一条充满荆棘的山路,无处不在的WATLE整得人简直没脾气。接下来一段时间,我打算在这条路上好好走上一走,四处看看,与TLE斗争到底。一般来说,TLE发生的原因即在于时间复杂度过高,需要想办法优化代码,其中一个思路是借助堆的应用,之前的也有文章提到,本文从一道更难的题目入手,一起来看看如何借助堆优化时间复杂度。

H-index

It is important for researchers to write many high quality academic papers. Jorge has recently discovered a way to measure how impactful a researcher’s papers are: the H-index.

The H-index score of a researcher is the largest integer h such that the researcher has h papers with at least h citations each.

Jorge has written N papers in his lifetime. The i-th paper has Ai citations. The number of citations that each paper has will never change after it is written. Please help Jorge determine his H-index score after each paper he wrote.

Simple Test Cases

Input

The first line of the input gives the number of test cases, T. T test cases follow. Each test case begins with a line containing N, the number of papers Jorge wrote.

The second line contains N integers. The i-th integer is Ai, the number of citations the i-th paper has.

1
2
3
4
5
2
3
5 1 2
6
1 3 3 2 2 15

Output

For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is a space-separated list of integers. The i-th integer is the H-index score after Jorge wrote his i-th paper.

1
2
Case #1: 1 1 2
Case #2: 1 1 2 2 2 3
Limit

Time limit: 50 seconds per test set.
Memory limit: 1GB.

$1 \leq T \leq 100$

$1 \leq A_i \leq 105$

Test set 1 (Visible)

$1 \leq N \leq 1000$

Test set 2 (Hidden)

$1 \leq N \leq 10^5$


首先,需要了解H-Index的概念:H-Index又称为H指数或H因子,是一种评价学术成就的新方法。H代表“高引用次数”(high citations),一名科研人员的h指数是指他至多有H篇论文分别被引用了至少H次。如Simple Test Case #1 所示,当科研人员发了第一篇论文,引用次数为5,那么其H因子为1,解释为其至多发表了1(paper #1)篇论文且至少被引用了1次;当发表了第二篇论文,引用次数为1,那么其H因子依旧为1,因为其至多发表了1篇论文(paper #1 or paper #2)且至少被引用了1次;当发表了第三篇论文,引用次数为2,那么其H因子为2,因为其至多发表了2篇论文(paper #1 and paper #2)且至少被引用了2次。

那么,从直觉上,我们可以借助暴力枚举的方式,当科研人员发表了第k篇论文,寻找:$h-index(k) = max_k min(f(k), k)$。如此,其时间复杂度为$O(n^2)$,当然,其对于简单测试集Test set 1是够用的,但对于复杂测试集Test set 2来说便会超时,因此必须换个思路。

回顾题意,对于第k篇文章,我们需要找到m篇文章,且其引用次数>=m,同时,需要注意的是,H-Index总体上与发表文章数量呈正比,因此,我们可以设计一个小根堆来存储发表文章的引用次数,当堆顶元素小于当前H-Index时,移除该堆顶元素;当小根堆的容量依旧大于当前H-Index时,则当前H-Index++。由此,我们仅仅扫描一遍文章引用次数数组,其时间复杂度为$O(n)$。代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#include <iostream>
#include <vector>
#include <queue>

using namespace std;

vector<int> compute(vector<int> &papers) {
vector<int> indice(papers.size(), 1);

priority_queue<int, vector<int>, greater<>> heap;
heap.push(papers[0]);

for (int i = 1; i < papers.size(); ++i) {
heap.push(papers[i]);

int current = indice[i - 1];
while (heap.top() <= current && !heap.empty()) {
heap.pop();
}

if (heap.size() > current) {
current = current + 1;
}


indice[i] = current;
}

return indice;
}

int main() {
int T;

cin >> T;
for (int t = 0; t < T; ++t) {
int N;
vector<int> papers, h_indice;

cin >> N;
for (int n = 0; n < N; ++n) {
int c;
cin >> c;

papers.push_back(c);
}

h_indice = compute(papers);

cout << "Case #" << t + 1 << ": ";
for (auto &h_index : h_indice) {
cout << h_index << " ";
}
cout << endl;
}
}

以上!