2015-10-29 36 views
1

SQL: how to query a set of data and count how many strings from a given list match

I have built a simple question-and-answer system.

In my database there are three tables:

CREATE TABLE answer (
    id     int PRIMARY KEY,
    answer varchar(500)
);

CREATE TABLE question (
    id        int PRIMARY KEY,
    question  varchar(200),
    answer_id int REFERENCES answer (id)       -- foreign key to answer.id
);

CREATE TABLE question_elements (
    id          int PRIMARY KEY,
    seq         int,                           -- position of the word within the question
    question_id int REFERENCES question (id),  -- foreign key to question.id
    vocabulary  varchar(40)
);

Now suppose I have this question:

What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ? 
It is stored in the question table, so the record is:

question { 
    id: 1, 
    question:"What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?", 
    answer_id:1 
} 

And in the question_elements table:

question_elements [ 
    { 
    id: 1, 
    seq: 1, 
    question_id: 1, 
    vocabulary: "what" 
    }, 
    { 
    id: 2, 
    seq: 2, 
    question_id: 1, 
    vocabulary: "approach" 
    }, 
    { 
    id: 3, 
    seq: 3, 
    question_id: 1, 
    vocabulary: "should" 
    }, 
    { 
    id: 4, 
    seq: 4, 
    question_id: 1, 
    vocabulary: "a" 
    }, 
    { 
    id: 5, 
    seq: 5, 
    question_id: 1, 
    vocabulary: "company" 
    }, 
    { 
    id: 6, 
    seq: 6, 
    question_id: 1, 
    vocabulary: "adopt" 
    }, 
    { 
    id: 7, 
    seq: 7, 
    question_id: 1, 
    vocabulary: "when" 
    }, 
    .... 
    .... 
    { 
    id: 19, 
    seq: 19, 
    question_id: 1, 
    vocabulary: "get" 
    }, 
    { 
    id: 20, 
    seq: 20, 
    question_id: 1, 
    vocabulary: "funding" 
    } 
] 

Now, when a user enters:

What action does a company should do when it wanna get more funding with high debt ratio 

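My idea is to split the sentence above into a list of strings, then run a SQL query that counts how many of those strings match entries in the question_elements table.

What would that SQL statement look like in PostgreSQL?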

+0

Are you using a JSON field, or is that just the way you are showing us the data?

+0

It looks like you have two questions: one is how to split the input on ' ', and the other is how to count the matches.

+0

Possible duplicate of [Split column into multiple rows in Postgres](http://stackoverflow.com/questions/29419993/split-column-into-multiple-rows-in-postgres)

Answers

0

If I understood correctly, you want something like this:

WITH answer AS (
    SELECT 
     'What action does a company should do when it wanna get more funding' AS a 
), 
question AS (
    SELECT 'what' AS q 
    UNION ALL SELECT 'should' 
    UNION ALL SELECT 'a' 
    UNION ALL SELECT 'company' 
    UNION ALL SELECT 'do' 
    UNION ALL SELECT 'when' 
) 
SELECT COUNT(result) 
FROM (
    SELECT unnest(string_to_array(CAST(a AS VARCHAR),' ')) AS result 
    FROM answer 
) AS tbaux 
WHERE result IN (select CAST(q AS VARCHAR) FROM question); 

The same query made case-insensitive, with some explanation:

-- answer and question are the CTEs defined in the query above
SELECT COUNT(result)            -- count the rows that survive the WHERE filter
FROM (
    -- split the user input into one word per row, using ' ' as the delimiter
    SELECT unnest(string_to_array(CAST(a AS VARCHAR), ' ')) AS result
    FROM answer
) AS tbaux                      -- alias for the subquery
-- upper() folds both sides to uppercase, so the comparison ignores case
WHERE upper(result) IN (SELECT upper(CAST(q AS VARCHAR)) FROM question);

This counts how many words from the user input appear in the question table (in your case, question_elements).

http://sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/4095/0
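
The same idea can be pointed directly at question_elements instead of a hand-written word list. What follows is only a sketch under a few assumptions: the inline VALUES list stands in for the real table and holds just the nine rows actually shown in the question (the rows elided there are left out), the user_input CTE name is hypothetical, and folding everything to lowercase and grouping per question_id are guesses about what is wanted.

```sql
-- Stand-in for the real question_elements table: only the rows listed in the
-- question are included; the rows elided there with "...." are omitted.
WITH question_elements (id, seq, question_id, vocabulary) AS (
    VALUES (1,  1,  1, 'what'),
           (2,  2,  1, 'approach'),
           (3,  3,  1, 'should'),
           (4,  4,  1, 'a'),
           (5,  5,  1, 'company'),
           (6,  6,  1, 'adopt'),
           (7,  7,  1, 'when'),
           (19, 19, 1, 'get'),
           (20, 20, 1, 'funding')
),
-- Hypothetical CTE holding the sentence typed by the user.
user_input (txt) AS (
    VALUES ('What action does a company should do when it wanna get more funding with high debt ratio')
)
SELECT qe.question_id,
       COUNT(DISTINCT lower(qe.vocabulary)) AS matches  -- each distinct word counts once
FROM question_elements AS qe
-- Lowercase the input, split it on single spaces, and keep only stored words
-- that occur in the resulting array.
WHERE lower(qe.vocabulary) = ANY (
          string_to_array(lower((SELECT txt FROM user_input)), ' ')
      )
GROUP BY qe.question_id
ORDER BY matches DESC;
```

With only these nine rows the query reports 7 matches for question_id 1, since "approach" and "adopt" do not occur in the input. Against the real table, the first CTE would be dropped and the sentence passed in as a query parameter.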

0

The question_elements table is not necessary.

with ui(ui) as (
    values ('What action does a company should do when it wanna get more funding with high debt ratio') 
) 
select id, count(*) as matches, question 
from 
    (
     select id, question, regexp_split_to_table(question, '\s+') as word 
     from question 
    ) q 
    inner join 
    regexp_split_to_table((select ui from ui), '\s+') ui(word) using (word) 
group by 1, 3 
order by matches desc 
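
The query above compares words exactly as typed, so a difference in case prevents a match, and a word repeated in a question is counted once per repetition. A possible variant, offered as a sketch rather than a definitive version, lowercases both sides and deduplicates the words first. The inline question CTE stands in for the real table and holds only the sample row from the post; the question_words and input_words names are hypothetical.

```sql
-- Stand-in for the real question table, holding only the sample row.
WITH question (id, question) AS (
    VALUES (1, 'What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?')
),
ui (ui) AS (
    VALUES ('What action does a company should do when it wanna get more funding with high debt ratio')
),
-- One row per distinct lowercase word of each stored question.
question_words AS (
    SELECT DISTINCT q.id, q.question, w.word
    FROM question AS q
    CROSS JOIN LATERAL regexp_split_to_table(lower(q.question), '\s+') AS w(word)
),
-- One row per distinct lowercase word of the user input.
input_words AS (
    SELECT DISTINCT w.word
    FROM ui
    CROSS JOIN LATERAL regexp_split_to_table(lower(ui.ui), '\s+') AS w(word)
)
SELECT qw.id, COUNT(*) AS matches, qw.question
FROM question_words AS qw
JOIN input_words AS iw USING (word)
GROUP BY qw.id, qw.question
ORDER BY matches DESC;
```

For the sample row this yields 10 matches (what, should, a, company, when, debt, ratio, wanna, get, funding). Punctuation is still treated as part of a word, so tokens such as "50%" or a trailing "?" only match if the user types them identically.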