There are two tables in a one-to-many relationship.
CREATE TABLE main (
id INTEGER,
filter_c TEXT
);
INSERT INTO main (id, filter_c) VALUES
(1, 'data1'),
(2, 'data2');
CREATE TABLE feature (
main_id INTEGER,
mark_c TEXT
);
INSERT INTO feature (main_id, mark_c) VALUES
(1, 'mark1'),
(1, 'mark1'),
(1, 'mark2'),
(1, null);
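The necessary indexes have been created. The tables hold about 7k and 2M rows respectively, and are expected to grow 3-4 times. The heavy part is computing the statistics: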
SELECT main_id, sum(amount) as total, mark_c as best, max(marked) as goods, sum(marked) - max(marked) as bads
FROM (
SELECT main_id, mark_c,
count(*) as amount, -- including NULL
count(mark_c) as marked -- excluding NULL
FROM feature
GROUP BY main_id, mark_c
)
GROUP BY main_id
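Over the whole table this takes about 7 seconds to run, but the full result is almost never needed. The filter on main selects at most 10 rows, and statistics are needed only for those. The intended way to get them: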
SELECT id, filter_c,
(SELECT main_id, sum(amount) as total, mark_c as best, max(marked) as goods, sum(marked) - max(marked) as bads
FROM (
SELECT main_id, mark_c,
count(*) as amount, -- including NULL
count(mark_c) as marked -- excluding NULL
FROM feature
WHERE main_id = id -- PREFILTER
GROUP BY main_id, mark_c
)
GROUP BY main_id)
FROM main
WHERE filter_c = 'data1'
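But a subquery in this position cannot return more than one column. What other ways are there to collect the statistics only for the rows that are actually needed?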
You can try this.
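One possible shape for such a query, as a sketch rather than a definitive answer: apply the filter on main inside the aggregation with IN, then join the per-main_id statistics back to main so that every column can be returned. It assumes SQLite, because the best column relies on SQLite's rule that a bare column next to max() is taken from the row holding the maximum. The aliases m and s are illustrative.

```sql
SELECT m.id, m.filter_c, s.total, s.best, s.goods, s.bads
FROM main AS m
LEFT JOIN (
    SELECT main_id,
           sum(amount) AS total,
           mark_c AS best,                    -- SQLite-specific: value from the row with max(marked)
           max(marked) AS goods,
           sum(marked) - max(marked) AS bads
    FROM (
        SELECT main_id, mark_c,
               count(*) AS amount,            -- including NULL
               count(mark_c) AS marked        -- excluding NULL
        FROM feature
        -- prefilter: aggregate only the features of the selected main rows
        WHERE main_id IN (SELECT id FROM main WHERE filter_c = 'data1')
        GROUP BY main_id, mark_c
    )
    GROUP BY main_id
) AS s ON s.main_id = m.id
WHERE m.filter_c = 'data1';
```

On the sample data from the question this should return a single row: 1, data1, 4, mark1, 2, 1. The LEFT JOIN keeps main rows that have no features at all, with NULL statistics; use an inner JOIN if those rows should be dropped.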
If the feature table has an index on (main_id, mark_c), it should work fine.
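A sketch of that index, assuming SQLite; the index name idx_feature_main_mark is hypothetical. EXPLAIN QUERY PLAN can then be used to check whether the prefiltered aggregation searches the index instead of scanning the whole feature table.

```sql
-- Hypothetical index name; the column order (main_id, mark_c) matches the GROUP BY.
CREATE INDEX IF NOT EXISTS idx_feature_main_mark
    ON feature (main_id, mark_c);

-- Inspect the plan of the prefiltered aggregation.
EXPLAIN QUERY PLAN
SELECT main_id, mark_c,
       count(*) AS amount,
       count(mark_c) AS marked
FROM feature
WHERE main_id IN (SELECT id FROM main WHERE filter_c = 'data1')
GROUP BY main_id, mark_c;
```

Because the inner query reads only main_id and mark_c, an index on exactly these two columns can act as a covering index, so the table rows themselves may not need to be read.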