Removing spaces between characters in text

I have many strings in my database (PostgreSQL), for example:

with mystrings as (
    select 'H e l l o, how are you'::varchar string union all 
    select 'I am fine, t h a n k you'::varchar string union all 
    select 'This is s t r a n g e text'::varchar string union all 
    select 'With c r a z y space b e t w e e n characters'::varchar string 
) 
select * from mystrings

Is there a way to remove the spaces between the characters of a word? For my example, the result should be:

Hello, how are you 
I am fine, thank you 
This is strange text 
With crazy space between characters

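I started with replace, but there are so many words with spaces between their characters that I cannot even find them all.

Because it may be hard to join the characters in a meaningful way, it would be enough to get just a list of concatenation candidates. With the sample data, the result should be: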

H e l l o 
t h a n k 
s t r a n g e 
c r a z y 
b e t w e e n

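Such a query should find and return every substring in which at least three individual characters are separated by two spaces (and keep going for as long as the pattern [space] individual character occurs):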

He l l o how are you --> llo 
H e l l o how are you --> Hello 
C r a z y space b e t w e e n --> {crazy, between}

2013-04-04 Tomas Greif

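Is it always a single space? Do you have a table of allowed words? – 2013-04-04 10:40:12

In the cases I have found, it is always a single space. PostgreSQL has full-text search support with an English dictionary. I don't know whether I could use that as a list of allowed words. – 2013-04-04 10:44:38

Even with a dictionary, this is undoubtedly ambiguous. Many words can be joined together. – 2013-04-04 11:54:56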

Based on your edited question, the following returns all candidates that have at least three individual characters separated by two spaces:

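-- Assumes a table mystrings with a text column named data.
SELECT
    data || ' --> {' || replace_candidates || '}'
FROM (
    SELECT
        data,
        (SELECT
            array_to_string(array_agg(data), ',')
         FROM (
            -- Strip the spaces from each piece left over after the split.
            SELECT
                replace(data, ' ', '') AS data
            FROM
                -- Split on every run of two or more non-space characters.
                regexp_split_to_table(data, '\S{2,}') AS data
         ) t
         -- Keep only pieces built from at least three single characters.
         WHERE length(data) > 2
        ) AS replace_candidates
    FROM
        mystrings
) T
WHERE
    replace_candidates IS NOT NULL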

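How it works

Possible candidates

Start with the innermost query (the one with regexp_split_to_table).

The regex \S{2,} matches every sequence of 2 or more characters that are not separated by a space.
regexp_split_to_table returns the inverse of those matches, that is, the text between them.
The spaces in each piece are then replaced by an empty string, and only pieces with a length greater than 2 are kept.

The remaining parts are the aggregate functions that take care of formatting the output as you requested.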
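To make the steps above easier to follow, here is a minimal sketch that runs only the split and the clean-up on one of the sample strings, so the intermediate pieces are visible. The column aliases piece, cleaned and cleaned_length are illustrative names, not part of the answer's query.

```sql
-- Illustration only: show each piece produced by the split,
-- the same piece with its spaces removed, and the resulting length.
SELECT
    piece,
    replace(piece, ' ', '')         AS cleaned,
    length(replace(piece, ' ', '')) AS cleaned_length
FROM regexp_split_to_table(
         'With c r a z y space b e t w e e n characters',
         '\S{2,}'
     ) AS piece;
```

Only the pieces that clean up to crazy and between have a length greater than 2, so those are the ones the outer query keeps.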

Result

H e l l o how are you --> {Hello} 
I am fine, t h a n k you --> {thank} 
This is s t r a n g e text --> {strange} 
With c r a z y space b e t w e e n characters --> {crazy,between} 
SOME MORE TEST T E X T --> {TEXT}

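Note: it only treats characters that appear as [space][char][space] as individual characters, but you can modify it to suit your needs, e.g. [space][space][char][space] or [space][char][special_char][space] ...

Hope this helps ;p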
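As one possible adaptation (a sketch under assumptions, not the query from this answer), the runs can also be matched directly instead of splitting around them. With the split pattern, a punctuation mark attached to the last letter, as in 'H e l l o,', forms a two-character non-space sequence and cuts the run short. The pattern below uses PostgreSQL's word-boundary escapes \m and \M so that trailing punctuation is ignored. It assumes PostgreSQL 9.3 or later for LATERAL, and the inline mystrings(data) list only mirrors the question's sample data.

```sql
-- Sketch: extract runs of at least three single word characters
-- separated by single spaces, then strip the spaces from each run.
WITH mystrings(data) AS (
    VALUES ('H e l l o, how are you'),
           ('I am fine, t h a n k you'),
           ('This is s t r a n g e text'),
           ('With c r a z y space b e t w e e n characters')
)
SELECT
    data,
    replace(m[1], ' ', '') AS candidate
FROM mystrings
-- \m and \M anchor the match at the start and end of a word;
-- (?: \w){2,} requires at least two further single characters.
CROSS JOIN LATERAL regexp_matches(data, '\m\w(?: \w){2,}\M', 'g') AS m;
```

For the sample strings this should return Hello, thank, strange, crazy and between, one row per candidate.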

2013-04-05 18:38:56 Akash

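You can use a resource such as an online dictionary: if the word exists, you do not have to remove the spaces; otherwise, remove them. Alternatively, you can use a table that holds all the valid strings and check against it. I hope you understand my point.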
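A minimal sketch of the table-based check might look like the following. Both allowed_words and candidates are hypothetical inline lists made up for illustration; in practice the word list would be a real table and the candidates would come from one of the candidate queries in this thread.

```sql
-- Sketch: flag which joined candidates appear in a (hypothetical) word list.
WITH allowed_words(word) AS (
    VALUES ('hello'), ('thank'), ('strange'), ('crazy'), ('between')
),
candidates(candidate) AS (
    VALUES ('Hello'), ('llo'), ('thank')
)
SELECT
    c.candidate,
    (w.word IS NOT NULL) AS is_known_word
FROM candidates c
-- Compare case-insensitively by lowercasing the candidate.
LEFT JOIN allowed_words w ON w.word = lower(c.candidate);
```

Here Hello and thank are flagged as known words and llo is not, which shows how such a lookup could separate real words from partial runs.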

2013-04-04 10:44:26

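I'm not sure this will help - there are many words with one or two characters. I think I would have to remove certain spaces first and then possibly match against a dictionary. – 2013-04-04 10:53:07

Right! Because 'a' can be a standalone article or part of a word. – 2013-04-04 11:07:48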

The following finds the concatenation candidates:

with mystrings as (
    select 'H e l l o, how are you'::varchar string union all 
    select 'I am fine, t h a n k you'::varchar string union all 
    select 'This is s t r a n g e text'::varchar string union all 
    select 'With c r a z y space b e t w e e n characters'::varchar string 
) 

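-- Split each string into space-separated parts (commas removed) and number them.
, u as (
select string, strpart[rn] as strpart, rn 
from (
    select *, generate_subscripts(strpart, 1) as rn 
    from (
     select string, string_to_array(replace(string,',',''), ' ') as strpart 
     from mystrings 
    ) x 
    ) y 
) 

-- Flag single characters and mark where a run of single characters starts and ends.
-- The window is partitioned by string and ordered by position, so lag/lead always
-- look at the neighbouring part of the same string.
,w as (
select 
    string,strpart,rn, 
    case when length(strpart) = 1 then 1 else 0 end as indchar , 
    case when coalesce(length(lag(strpart) over pos),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart, 
    case when coalesce(length(lead(strpart) over pos),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend 
from u 
window pos as (partition by string order by rn) 
) 

-- Drop isolated single characters (such as 'I') and give each run its own id.
,x as (
    select 
     string,rn,strpart,indchar,strstart, 
     sum(strstart) over (order by string, rn) as strid 
    from w 
    where indchar = 1 and not (strstart = 1 and strend = 1) 
    ) 

-- Join the characters of each run in their original order.
select string, array_to_string(array_agg(strpart order by rn),'') as candidate from x group by string, strid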

2013-04-04 13:21:09
