与TLE斗争到底系列（一）：堆的应用

对我来说，Google Kick Start 就是一条充满荆棘的山路，无处不在的WA和TLE整得人简直没脾气。接下来一段时间，我打算在这条路上好好走上一走，四处看看，与TLE斗争到底。一般来说，TLE发生的原因即在于时间复杂度过高，需要想办法优化代码，其中一个思路是借助堆的应用，之前的也有文章提到，本文从一道更难的题目入手，一起来看看如何借助堆优化时间复杂度。

H-index

It is important for researchers to write many high quality academic papers. Jorge has recently discovered a way to measure how impactful a researcher’s papers are: the H-index.

The H-index score of a researcher is the largest integer h such that the researcher has h papers with at least h citations each.

Jorge has written N papers in his lifetime. The i-th paper has Ai citations. The number of citations that each paper has will never change after it is written. Please help Jorge determine his H-index score after each paper he wrote.

Simple Test Cases

Input

The first line of the input gives the number of test cases, T. T test cases follow. Each test case begins with a line containing N, the number of papers Jorge wrote.

The second line contains N integers. The i-th integer is Ai, the number of citations the i-th paper has.
1
2
3
4
5
2
3
5 1 2
6
1 3 3 2 2 15
Output

For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is a space-separated list of integers. The i-th integer is the H-index score after Jorge wrote his i-th paper.
1
2
Case #1: 1 1 2
Case #2: 1 1 2 2 2 3

Limit

Time limit: 50 seconds per test set.
Memory limit: 1GB.

$1 \leq T \leq 100$

$1 \leq A_i \leq 105$

Test set 1 (Visible)

$1 \leq N \leq 1000$

Test set 2 (Hidden)

$1 \leq N \leq 10^5$

首先，需要了解H-Index的概念：H-Index又称为H指数或H因子，是一种评价学术成就的新方法。H代表“高引用次数”（high citations），一名科研人员的h指数是指他至多有H篇论文分别被引用了至少H次。如Simple Test Case #1 所示，当科研人员发了第一篇论文，引用次数为5，那么其H因子为1，解释为其至多发表了1（paper #1）篇论文且至少被引用了1次；当发表了第二篇论文，引用次数为1，那么其H因子依旧为1，因为其至多发表了1篇论文（paper #1 or paper #2）且至少被引用了1次；当发表了第三篇论文，引用次数为2，那么其H因子为2，因为其至多发表了2篇论文（paper #1 and paper #2）且至少被引用了2次。

那么，从直觉上，我们可以借助暴力枚举的方式，当科研人员发表了第k篇论文，寻找：$h-index(k) = max_k min(f(k), k)$。如此，其时间复杂度为$O(n^2)$，当然，其对于简单测试集Test set 1是够用的，但对于复杂测试集Test set 2来说便会超时，因此必须换个思路。

回顾题意，对于第k篇文章，我们需要找到m篇文章，且其引用次数>=m，同时，需要注意的是，H-Index总体上与发表文章数量呈正比，因此，我们可以设计一个小根堆来存储发表文章的引用次数，当堆顶元素小于当前H-Index时，移除该堆顶元素；当小根堆的容量依旧大于当前H-Index时，则当前H-Index++。由此，我们仅仅扫描一遍文章引用次数数组，其时间复杂度为$O(n)$。代码如下：

#include <iostream>
#include <vector>
#include <queue>

using namespace std;

vector<int> compute(vector<int> &papers) {
    vector<int> indice(papers.size(), 1);
    
    priority_queue<int, vector<int>, greater<>> heap;
    heap.push(papers[0]);
    
    for (int i = 1; i < papers.size(); ++i) {
        heap.push(papers[i]);
        
        int current = indice[i - 1];
        while (heap.top() <= current && !heap.empty()) {
            heap.pop();
        }
        
        if (heap.size() > current) {
            current = current + 1;
        }
        

        indice[i] = current;
    }

    return indice;
}

int main() {
    int T;

    cin >> T;
    for (int t = 0; t < T; ++t) {
        int N;
        vector<int> papers, h_indice;

        cin >> N;
        for (int n = 0; n < N; ++n) {
            int c;
            cin >> c;

            papers.push_back(c);
        }

        h_indice = compute(papers);

        cout << "Case #" << t + 1 << ": ";
        for (auto &h_index : h_indice) {
            cout << h_index << " ";
        }
        cout << endl;
    }
}

以上！