2013-04-04 49 views
2

Remove spaces between the characters of words

I have a lot of strings in my database (PostgreSQL), for example:

with mystrings as (
    select 'H e l l o, how are you'::varchar string union all 
    select 'I am fine, t h a n k you'::varchar string union all 
    select 'This is s t r a n g e text'::varchar string union all 
    select 'With c r a z y space b e t w e e n characters'::varchar string 
) 
select * from mystrings 

Is there a way to remove the spaces between the characters of a word? For my example, the result should be:

Hello, how are you 
I am fine, thank you 
This is strange text 
With crazy space between characters 

I started with replace, but there are so many words with spaces between their characters that I cannot even find them all.

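Because it may be hard to join the characters meaningfully, it would be better to just get a list of join candidates. With the sample data, the result should be: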

H e l l o 
t h a n k 
s t r a n g e 
c r a z y 
b e t w e e n 

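Such a query should find and return every substring in which at least three individual characters are separated by two spaces (and keep going for as long as another [space] individual character follows):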

He l l o how are you --> llo 
H e l l o how are you --> Hello 
C r a z y space b e t w e e n --> {crazy, between} 
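
The rule above can be checked outside the database with a short Python sketch. It is only an illustration of the requirement, not part of the original question: the pattern and the name find_candidates are assumptions, and \w also accepts digits and underscores.

```python
import re

# Assumed pattern: one word character, followed by two or more
# repetitions of "single space + one word character".
CANDIDATE = re.compile(r"\b\w(?: \w){2,}\b")

def find_candidates(text):
    """Return every spaced-out run in text with its spaces removed."""
    return [m.group(0).replace(" ", "") for m in CANDIDATE.finditer(text)]

print(find_candidates("He l l o how are you"))           # ['llo']
print(find_candidates("C r a z y space b e t w e e n"))  # ['Crazy', 'between']
```

The word boundaries keep the run from swallowing the first letter of a following normal word, and single words such as "I" are ignored because they are not followed by two more single characters.
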
+0

Is it always a single space? Do you have a table of allowed words? – 2013-04-04 10:40:12

+0

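In the cases I have found, it is always a single space. PostgreSQL has full-text search support with an English dictionary. I don't know whether I can use it as a list of allowed words. – 2013-04-04 10:44:38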

+1

Even with a dictionary, this is undoubtedly ambiguous. Many words can be joined together. – 2013-04-04 11:54:56

Answers

1

Based on your edited question, the following gets all substrings that have at least three individual characters separated by two spaces:

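-- Assumes a table mystrings with a text column named data.
SELECT
    data || ' --> {' || replace_candidates || '}'
FROM (
    SELECT
        data,
        -- Correlated subquery: collect this row's candidates into one comma-separated string.
        (SELECT array_to_string(array_agg(candidate), ',')
         FROM (
             -- Split on runs of 2+ non-space characters, then drop the spaces from what is left.
             SELECT replace(piece, ' ', '') AS candidate
             FROM regexp_split_to_table(data, '\S{2,}') AS piece
         ) t
         WHERE length(candidate) > 2  -- keep only three or more joined characters
        ) AS replace_candidates
    FROM
        mystrings
) T
WHERE
    replace_candidates IS NOT NULL  -- rows without any candidate are skipped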

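How it works

Possible candidates

Start by looking at the innermost query (the one with regexp_split_to_table):

  1. The regex \S{2,} matches every run of two or more characters in a sequence (not separated by spaces).
  2. regexp_split_to_table returns the inverse of those matches, that is, the text between them.
  3. Spaces are replaced with the empty string, and only results with a length greater than 2 are kept.

The remaining filters and functions take care of the formatting you asked for.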
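
The three steps can also be traced outside the database. The following Python sketch mirrors them with the same \S{2,} pattern. The function name split_candidates is made up for illustration, and this is a rough equivalent of the query's logic, not the query itself.

```python
import re

def split_candidates(text):
    # Steps 1 and 2: split on runs of two or more non-space characters,
    # which leaves only the text between those runs.
    pieces = re.split(r"\S{2,}", text)
    # Step 3: remove the spaces and keep results longer than 2 characters.
    joined = (p.replace(" ", "") for p in pieces)
    return [j for j in joined if len(j) > 2]

print(split_candidates("With c r a z y space b e t w e e n characters"))
# ['crazy', 'between']
```

One thing the sketch makes visible: punctuation glued to a letter counts as a two-character run, so "H e l l o, how are you" yields "Hell" because "o," is treated as a separator. Stripping punctuation first would avoid that.
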

Result

H e l l o how are you --> {Hello} 
I am fine, t h a n k you --> {thank} 
This is s t r a n g e text --> {strange} 
With c r a z y space b e t w e e n characters --> {crazy,between} 
SOME MORE TEST T E X T --> {TEXT} 

SQLFIDDLE

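Note: it treats anything that falls into the [space][char][space] pattern as an individual character, but you can modify it to suit your needs, for example [space][space][char][space][space][char][special_char][space] ...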

Hope this helps ;p

0

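You can use a resource such as an online dictionary: if the word exists, you do not have to remove the spaces, otherwise remove them. Or you can use a table that holds all the valid strings and check against it. I hope you get my point.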
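
A minimal Python sketch of the lookup idea, applied to already-joined candidates. The word set below is a hypothetical stand-in for a real dictionary or lookup table, and confirm is an illustrative name.

```python
# Hypothetical stand-in for a real dictionary resource or lookup table.
KNOWN_WORDS = {"hello", "thank", "strange", "crazy", "between"}

def confirm(candidates, known_words=KNOWN_WORDS):
    """Keep only the candidates whose lowercase form is a known word."""
    return [c for c in candidates if c.lower() in known_words]

print(confirm(["Hello", "llo", "thank"]))  # ['Hello', 'thank']
```

A lookup like this can only confirm a candidate. It cannot decide where one spaced-out word ends and the next begins.
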

+0

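I'm not sure this will help: there are many words with only one or two characters. I think I would have to remove certain spaces first and then maybe match against a dictionary. – 2013-04-04 10:53:07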

+0

Right! Because 'a' can be a standalone article or part of a word. – 2013-04-04 11:07:48

0

The following finds the candidates that can be concatenated:

with mystrings as (
    select 'H e l l o, how are you'::varchar string union all 
    select 'I am fine, t h a n k you'::varchar string union all 
    select 'This is s t r a n g e text'::varchar string union all 
    select 'With c r a z y space b e t w e e n characters'::varchar string 
) 

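, u as (
    -- One row per space-separated token; rn is the token's position in the string.
    select string, strpart[rn] as strpart, rn
    from (
        select *, generate_subscripts(strpart, 1) as rn
        from (
            select string, string_to_array(replace(string, ',', ''), ' ') as strpart
            from mystrings
        ) x
    ) y
)

, w as (
    -- Flag single-character tokens and mark where a run of them starts and ends.
    -- The windows are partitioned and ordered so lag/lead see the neighbouring token.
    select
        string, strpart, rn,
        case when length(strpart) = 1 then 1 else 0 end as indchar,
        case when coalesce(length(lag(strpart) over (partition by string order by rn)), 0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart,
        case when coalesce(length(lead(strpart) over (partition by string order by rn)), 0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend
    from u
)

, x as (
    -- Drop isolated single characters (such as "I"); a running sum of run starts gives each run an id.
    select
        string, rn, strpart, indchar, strstart,
        sum(strstart) over (order by string, rn) as strid
    from w
    where indchar = 1 and not (strstart = 1 and strend = 1)
)

-- Join the characters of each run in their original order.
select string, array_to_string(array_agg(strpart order by rn), '') as candidate
from x
group by string, strid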
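
The grouping idea behind this query (split into tokens, then join consecutive single-character tokens) can be sketched in Python as well. This is a rough equivalent for experimenting with the logic, not a translation of the SQL, and run_candidates is an illustrative name.

```python
from itertools import groupby

def run_candidates(text):
    # Like the query: drop commas, then split on single spaces.
    tokens = text.replace(",", "").split(" ")
    out = []
    # Group consecutive tokens by whether they are a single character.
    for is_single, run in groupby(tokens, key=lambda t: len(t) == 1):
        run = list(run)
        # Skip multi-character tokens and isolated single characters.
        if is_single and len(run) > 1:
            out.append("".join(run))
    return out

print(run_candidates("H e l l o, how are you"))  # ['Hello']
```

Like the query, this keeps runs of only two single characters, so a minimum run length of three would need an extra check.
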