2013-05-11 35 views
1

比方说,我有一个Strings表示基于拼字游戏板一招:"AARDV RK"搜索拼字游戏字典一个字一个空白区块(”“)字符

我有一个HashSet<String>称为dict包含整个拼字游戏字典(约180,000字!)。

如何才能使用正则表达式搜索dicts,但空白字符代表任何大写字母?

+0

到目前为止你提出了什么样的表情? – Jerry 2013-05-11 20:26:33

+1

否。正则表达式不适用于HashSet。那么,或者你必须遍历整个字典并使用正则表达式来检查每个字符串。这是非常低效的。 – nhahtdh 2013-05-11 20:33:05

回答

0

事情是这样的:

^AARDV[A-Z]RK$ 
0

由于dictionaly不包含任何东西,但大写字母,最简单的选项将是最好的:

final Pattern p = Pattern.compile("AARDV.RK")` 
for (String entry : dict) 
    if (p.matcher(entry).matches()) return entry; 
return null; 

通配符.将匹配任何字符在那个位置,这会为你节省对这个角色进行任何冗余检查的轻微惩罚。还要注意,先编译正则表达式非常重要,并且不要为每个条目重新编译它。

0

说明

而不是寻找一个随机的信,你可以只是看其是否适合玩家的瓷砖当前纸盒的所有单词。

考虑下面的正则表达式和逻辑的PowerShell示例。在这里,我使用逻辑来构建基于玩家当前拥有的切片的正则表达式。由此产生的正则表达式有两个不同的部分,匹配1和玩家托盘中每个字母的总数,第二部分匹配玩家托盘中每个字母的0和+1。所以玩家在他们的托盘中有2 A,正则表达式会尝试匹配0和3之间的A,而所有其他字母的每个字母的总数仍然需要1。这个过程对每个字母进行迭代。

所以给你的例子。如果玩家在他们的托盘中有aarvfrk,则正则表达式会查找所有字母,例如aardvark,但aardvarks也会匹配,但我们会根据字长where {$_.Length -le $($PlayerTiles.Length + 1)}过滤掉匹配。所以一个单词不能多于2个额外的瓦片存在于玩家托盘中。

接下来,我构造了一个正则表达式,用于查找玩家目前没有在托盘中找到的字词。

当然,这种特殊的逻辑可能会失败,例如如果玩家缺少拼写单词的两个字母。知道这些词可能有助于极少数情况下给定的电路板布局可能有你正在寻找的字母。您可以通过评估电路板布局来解决此问题,并将电路板上的所有单个字母包括在内,就好像它们是玩家托盘的一部分一样。此评估需要足够聪明,以识别由于电路板布局而无法使用的字母。它也可以做得足够聪明,从电路板布局中识别出多个合法使用的字符串。但所有这些都超出了你原来的问题范围。

根据您所选择的语言,你可能需要的东西,如{0,100},以取代在任何lookbehinds的*?。这是由于语言[如java]实现它的向后看,其中搜索字符串可能是未确定的大小。

源代码

$Matches = @() 
    [array]$Dictionary = @() 

    $Dictionary += 'AARDVARK' 
    $Dictionary += 'AARDVRKS' 
    $Dictionary += 'AARDVARKS' 
    $Dictionary += 'ANTHILL' 
    $Dictionary += 'JUMPING' 
    $Dictionary += 'HILLSIDE' 
    $Dictionary += 'KITTENS' 
    $Dictionary += 'LOVER' 
    $Dictionary += 'LOVE' 
    $Dictionary += 'LOVES' 
    $Dictionary += 'LOVELY' 
    $Dictionary += 'OLIVE' 
    $Dictionary += 'VOTE' 


    $PlayerTiles = "aardvrk" 

Function funBuildRegexForPlayerTiles ([string]$GivenTiles) { 

    # split the GivenTiles so each letter is seperate, and store these in a hashtable so the letter is the keyname and the number times it's seen is the value, This deduplicates each letter 
    [hashtable]$SearchForTiles = @{} 
    foreach ($Letter in $GivenTiles[0..$($GivenTiles.Length - 1)]) { 
     $SearchForTiles[$Letter] += 1 
     } # next letter 

    # build regex for tiles to match just the tiles we have 
    [string]$SameNumberRegex = "" 
    foreach ($Letter in $SearchForTiles.Keys) { 
     $SameNumberRegex += "(?=^([^$Letter]*?$Letter){1,$($SearchForTiles[$Letter])}(?![^$Letter]*?$Letter))" 
     } # next letter 


    # add to the regex to include one extra letter of each type. 
    [array]$ExtraLetterRegex = @() 
    foreach ($MasterLetter in $SearchForTiles.Keys) { 
     [string]$TempRegex = "" 
     foreach ($Letter in $SearchForTiles.Keys) { 
      if ($MasterLetter -ieq $Letter) { 
       # this forces each letter to allow zero to one extra of itself in the dictionary string. This allows us to match words which would have all the other letters and none of this letter 
       $TempRegex += "(?=^([^$Letter]*?$Letter){0,$($SearchForTiles[$Letter] + 1)}(?![^$Letter]*?$Letter))" 

       } else { 
       # All the rest of these tiles on this regex section will need to have just the number of tiles the player has 
       $TempRegex += "(?=^([^$Letter]*?$Letter){1,$($SearchForTiles[$Letter])}(?![^$Letter]*?$Letter))" 
       } # end if 
      } # next letter 
     $ExtraLetterRegex += $TempRegex 

     Write-Host "To match an extra '$MasterLetter': " $TempRegex 
     } # next MasterLetter 

    # put it all together 
    [array]$AllRegexs = @() 
    $AllRegexs += $SameNumberRegex 
    $AllRegexs += $ExtraLetterRegex 


    # stitch all the regexs together to make a massive regex 
    [string]$Output = $AllRegexs -join "|" 

    return $Output 
    } # end function funBuildRegexForPlayerTiles   


Function funBuildMissingLetterRegex ([string]$GivenTiles) { 
    # split the GivenTiles so each letter is seperate, and store these in a hashtable so the letter is the keyname and the number times it's seen is the value, This deduplicates each letter 
    [hashtable]$SearchForTiles = @{} 
    foreach ($Letter in $GivenTiles[0..$($GivenTiles.Length - 1)]) { 
     $SearchForTiles[$Letter] += 1 
     } # next letter 

    [array]$MissingLetterRegex = @() 
    # include any letters which do not match the current tiles 
    $MissingLetterRegex += "(?i)([^$($SearchForTiles.Keys -join '')])" 

    # build the regex to find the missing tiles 
    foreach ($Letter in $SearchForTiles.Keys) { 
     $MissingLetterRegex += "(?i)(?<=($Letter[^$Letter]*?){$($SearchForTiles[$Letter])})($Letter)" 
     } # next letter 

    [string]$Output = $MissingLetterRegex -join "|" 
    return $Output 
    } # end function 


    [string]$Regex = funBuildRegexForPlayerTiles -GivenTiles $PlayerTiles 
    Write-Host "Player tiles '$PlayerTiles'" 
    Write-Host "Regex = '$Regex'" 
    Write-Host "Matching words = " 
    $MatchedWords = $Dictionary -imatch $Regex | where {$_.Length -le $($PlayerTiles.Length + 1)} 

    [string]$MissingLetterRegex = funBuildMissingLetterRegex $PlayerTiles 
    foreach ($Word in $MatchedWords) { 
     Write-Host $Word -NoNewline 
     # find all the letters for which the player doesn't have a matching tile 
     [array]$MissingTiles = ([regex]"$MissingLetterRegex").matches($Word) | foreach { 
      Write-Output $_.Groups[0].Value 
      } # next match 
     Write-Host "`tLetters you are missing to spell this work '$($MissingTiles -join '')'" 
     } # next word 

    Write-Host ------------------------------- 

    $PlayerTiles = "OLLVE" 
    [hashtable]$SearchForTiles = @{} 

    # build regex for tiles 
    [string]$Regex = funBuildRegexForPlayerTiles -GivenTiles $PlayerTiles 


    Write-Host "Player tiles '$PlayerTiles'" 
    Write-Host "Regex = '$Regex'" 
    Write-Host 
    Write-Host "Matching words = " 
    $MatchedWords = $Dictionary -imatch $Regex | where {$_.Length -le $($PlayerTiles.Length + 1)} 

    [string]$MissingLetterRegex = funBuildMissingLetterRegex $PlayerTiles 
    foreach ($Word in $MatchedWords) { 
     Write-Host $Word -NoNewline 
     # find all the letters for which the player doesn't have a matching tile 
     [array]$MissingTiles = ([regex]"$MissingLetterRegex").matches($Word) | foreach { 
      Write-Output $_.Groups[0].Value 
      } # next match 
     Write-Host "`tLetters you are missing to spell this work '$($MissingTiles -join '')'" 
     } # next word 

息率

To match an extra 'r': (?=^([^r]*?r){0,3}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d)) 
To match an extra 'v': (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){0,2}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d)) 
To match an extra 'a': (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){0,3}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d)) 
To match an extra 'k': (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){0,2}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d)) 
To match an extra 'd': (?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){0,2}(?![^d]*?d)) 
Player tiles 'aardvrk' 
Regex = '(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){0,3}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){0,2}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){0,3}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){0,2}(?![^k]*?k))(?=^([^d]*?d){1,1}(?![^d]*?d))|(?=^([^r]*?r){1,2}(?![^r]*?r))(?=^([^v]*?v){1,1}(?![^v]*?v))(?=^([^a]*?a){1,2}(?![^a]*?a))(?=^([^k]*?k){1,1}(?![^k]*?k))(?=^([^d]*?d){0,2}(?![^d]*?d))' 
Matching words = 
AARDVARK Letters you are missing to spell this work 'A' 
AARDVRKS Letters you are missing to spell this work 'S' 
------------------------------- 
To match an extra 'O': (?=^([^O]*?O){0,2}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V)) 
To match an extra 'E': (?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){0,2}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V)) 
To match an extra 'L': (?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){0,3}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V)) 
To match an extra 'V': (?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){0,2}(?![^V]*?V)) 
Player tiles 'OLLVE' 
Regex = '(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){0,2}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){0,2}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){0,3}(?![^L]*?L))(?=^([^V]*?V){1,1}(?![^V]*?V))|(?=^([^O]*?O){1,1}(?![^O]*?O))(?=^([^E]*?E){1,1}(?![^E]*?E))(?=^([^L]*?L){1,2}(?![^L]*?L))(?=^([^V]*?V){0,2}(?![^V]*?V))' 

Matching words = 
LOVER Letters you are missing to spell this work 'R' 
LOVE Letters you are missing to spell this work '' 
LOVES Letters you are missing to spell this work 'S' 
LOVELY Letters you are missing to spell this work 'Y' 
OLIVE Letters you are missing to spell this work 'I' 
VOTE Letters you are missing to spell this work 'T' 

摘要

,我们正在寻找匹配我使用的是由这些块正则表达式的话,第一部分:(?=^([^$Letter]*?$Letter){1,$($SearchForTiles[$Letter])}(?![^$Letter]*?$Letter))。所有这些块由|或声明分开。

  • (?=开始零宽度断言
    • ^比赛开始字符串
    • (的创建一组要求的字符序列
    • [^$Letter]*?匹配任何字符,但我们正在寻找的零个或多个字母次非贪婪
    • $Letter匹配的字母
    • )关闭组
  • {力集团发生
    • 1至少一次
    • ,
    • $($SearchForTiles[$Letter])高达这种瓷砖的总数球员有
    • }末数量检查
  • (?!使用环顾四周,以防止任何
    • [^$Letter]*?不属于这封信
    • $Letter其次是这封信围绕这个
  • )结束的样子
  • )结束任意数量的字符零宽度断言本质上是寻找这个字母的末尾

当在一个单词中搜索丢失的字母时,我正在使用(?i)([^$($SearchForTiles.Keys -join '')]),随后是这些块的每个字母(?i)(?<=($Letter[^$Letter]*?){$($SearchForTiles[$Letter])})($Letter)。所有这些吸盘通过|或声明

分离
  • (?i)执行情况不敏感的
    • (启动组检查
    • [^不包括这些字符
    • $($SearchForTiles.Keys -join '')把每消除重复字母的玩家瓷砖组并将它们连接在一起
    • ]字符集的末尾
    • )基本上返回所有这一切都没有在玩家的托盘
  • |的或声明
  • 其次是一组像这样在播放器的托盘每个字母
  • (?i)力情况下不敏感的
  • 字母
  • (?<=开始看后面
  • (开始组的字符必须按此顺序
    • $Letter看看这封信
    • [^$Letter]后跟任何字符是不是这封信
    • *?零次或多次
    • )关闭组字符必须是按以下顺序
    • {$($SearchForTiles[$Letter])}该组在我们开始匹配缺少的字母之前,必须至少存在一次对于玩家的每一个瓷砖
    • )关闭lookbehind
  • ($Letter)匹配我们要找的字母,如果匹配,那么玩家在他们的托盘中丢失了这封信,所以这封信将被退回。
+0

我可能会通过增加过滤器来处理通配符,该过滤器将每个匹配单词的长度限制在播放器托盘中每个通配符的+1字符。而不是试图在正则表达式中包含通配符。当然,逻辑的第二部分需要更新以允许通配符。 – 2013-05-12 04:58:16