2016-07-08 130 views
0

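Regex: match all occurrences between specified strings

I'm working with a bunch of text files that reference image filenames. The filenames have been sanitized (lowercased, with spaces replaced by hyphens), but the text that references them has not.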

The strings I need to convert look like this:

(image: uploaded IMAGE.jpg caption: this is my caption) 
(image: uploaded IMAGE copy.jpeg caption: this is my caption) 
(image: IMG_6087.png caption: this is my caption) 
(image: IMG_6087 copy.gif) 
(image: IMG_9999_copy.jpg) 
(image: somehow, a comma.jpg) 
(image: other ridic'ulous characters!.jpg) 

to:

(image: uploaded-image.jpg caption: this is my caption) 
(image: uploaded-image-copy.jpeg caption: this is my caption) 
(image: img_6087.png caption: this is my caption) 
(image: img_6087-copy.gif) 
(image: img_9999_copy.jpg) 
(image: somehow-a-comma.jpg) 
(image: other-ridiculous-characters.jpg) 

These strings are part of larger blocks of text, but each one sits on its own line, like this:

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: manhattan photo.jpg) 

Drive till sunset and say goodbye to your body, because this is not a photograph. I saw sixteen americans, raised by wolves, probably lost in paradise city. I found your head — Do you still want it? 

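I'm using Sublime Text and plan to run several Replace All passes:

  1. Replace spaces with hyphens
  2. Strip characters that are not alphanumeric, _ or -
  3. Convert to lowercase

But I can't manage to capture every instance of something between the two delimiters.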

(?<=^\(image:)[what do I do here??](?=\.jpe?g|png|gif)

Answers

0

You can use a non-greedy match-all: .*?

So ^\(image: (.*?\.(?:jpe?g|png|gif)) captures the filename, including the extension.
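Outside Sublime Text, the same lazy capture can be tried in a few lines of Python. This is only a sketch: `PATTERN`, `find_filenames` and `sample` are names invented for the example, the group is written in its non-capturing form `(?:...)`, and `re.MULTILINE` is assumed so that `^` matches at the start of every line.

```python
import re

# Lazily capture everything after "(image: " up to the first known extension.
PATTERN = re.compile(r"^\(image: (.*?\.(?:jpe?g|png|gif))", re.MULTILINE)

def find_filenames(text):
    """Return each image filename (with extension) that starts a line."""
    return PATTERN.findall(text)

# Hypothetical sample text, modeled on the question's examples.
sample = (
    "Some prose.\n\n"
    "(image: uploaded IMAGE.jpg caption: this is my caption)\n"
    "(image: IMG_6087 copy.gif)\n"
)
print(find_filenames(sample))
```

Because the pattern has a single capturing group, `findall` returns just the filenames rather than the whole match.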

+0

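This captures the whole filename, so I can lowercase it by searching for '(?<=^\(image:)(.*?)(?=\.jpe?g|png|gif)' and replacing with '\L$1', which takes care of step 3. But how do I find all the spaces and replace them with hyphens?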

-1

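You could try JetBrains WebStorm, a front-end IDE. It offers many features that let you carry out regex operations in a readable way. Select the text you want to split and check for the delimiter or any whitespace.

You get a 30-day trial version. I will also share a regex for your query soon.

Also check out http://myregexp.com/ or a plugin for testing regex queries.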

Online Regex editor

0

You can grab the filename with:

(?<=image:\s)([^.]++)(?=\.jpe?g|\.png|\.gif) 

After that, the conversion depends on the language you're working in. Add file extensions as needed; right now it supports jpg, jpeg, png and gif.
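If that language happened to be Python, for instance, the conversion might look roughly like the sketch below. It is not this answer's own code: `IMAGE_RE`, `sanitize` and `sanitize_references` are made-up names, the lookbehind is anchored on the literal "(image: ", and the possessive `[^.]++` is swapped for a plain `[^.\n]+` because Python versions before 3.11 do not accept possessive quantifiers.

```python
import re

# Filename stem after "(image: ", followed by one of the supported extensions.
IMAGE_RE = re.compile(r"(?<=\(image: )([^.\n]+)\.(jpe?g|png|gif)")

def sanitize(match):
    """Rebuild one filename from a match: lowercase, hyphenate, strip the rest."""
    stem = match.group(1).lower()             # lowercase the stem
    stem = re.sub(r"\s+", "-", stem)          # whitespace becomes a hyphen
    stem = re.sub(r"[^a-z0-9_-]", "", stem)   # drop anything else that is not allowed
    return f"{stem}.{match.group(2)}"

def sanitize_references(text):
    """Sanitize every image filename in text, leaving the rest untouched."""
    return IMAGE_RE.sub(sanitize, text)

print(sanitize_references("(image: somehow, a comma.jpg)"))
print(sanitize_references("(image: uploaded IMAGE copy.jpeg caption: this is my caption)"))
```

Passing a function as the replacement is what lets all three steps run on the filename alone, which a single Replace All pass cannot easily do.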

0

Here is a working way to do it in PHP:

<?php 
$string = 
"This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded IMAGE.jpg caption: this is my caption) 
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded IMAGE copy.jpeg caption: this is my caption) 
(image: IMG_6087.png caption: this is my caption) 
(image: IMG_6087 copy.gif) blah blah 
(image: IMG_9999_copy.jpg) 
(image: somehow, a comma.jpg) 
(image: other ridic'ulous characters!.jpg)"; 

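// Lowercase each filename and replace every non-word character with a hyphen.
// The lookbehind includes the space after "image:" so that space is not hyphenated.
echo preg_replace_callback('~(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)~', function($matches) 
{ 
    return preg_replace('~\W~', '-', stripslashes(strtolower($matches[1]))) . ".$matches[2]"; 
}, $string); 

?> 

[Edit] Adding an explanation of the regex:

  • (?<=\(image: ): a positive lookbehind. It checks that "(image: " is present without making it part of the match.
  • (.*?): lazily captures everything before the image extension, matching as little text as possible.
  • \.(jpg|jpeg|png|gif): matches a literal . followed by one of the given extensions, and captures the extension for reuse.
  • ~: the delimiter, chosen simply because it rarely appears in strings and avoids having to escape /.
  • \W: the opposite of \w; it matches any non-word character (anything other than a letter, digit or underscore).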
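The difference between the lazy `.*?` and a greedy `.*` is easiest to see on a line that mentions two extensions. The following Python sketch uses a hypothetical line (not taken from the question) purely to illustrate that point.

```python
import re

# Hypothetical line whose caption happens to mention a second filename.
line = "(image: first.jpg caption: see also second.jpg)"

# Lazy: stops at the first extension it can reach.
lazy = re.search(r"\(image: (.*?)\.(jpg|jpeg|png|gif)", line)

# Greedy: runs to the end of the line, then backtracks to the last extension.
greedy = re.search(r"\(image: (.*)\.(jpg|jpeg|png|gif)", line)

print(lazy.group(1))    # first
print(greedy.group(1))  # first.jpg caption: see also second
```

With the greedy form, the caption text would be swept into the filename and sanitized along with it.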

This will output (when viewing the page source):

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded-image.jpg caption: this is my caption) 
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded-image-copy.jpeg caption: this is my caption) 
(image: img_6087.png caption: this is my caption) 
(image: img_6087-copy.gif) blah blah 
(image: img_9999_copy.jpg) 
(image: somehow--a-comma.jpg) 
(image: other-ridic-ulous-characters-.jpg) 

You can then fine-tune, inside the callback, which characters get turned into what, using str_replace() for example.

Hope it helps! ;)
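As one possible direction for that fine-tuning, here is a hedged Python sketch of a callback body that avoids doubled and trailing hyphens. The function name `slugify` is invented for the example, and dropping apostrophes outright is an assumption about the desired result; `str.replace` plays the role that str_replace() would play in PHP.

```python
import re

def slugify(stem):
    """Turn a filename stem into lowercase words joined by single hyphens."""
    stem = stem.lower()
    stem = stem.replace("'", "")       # assumption: apostrophes are removed, not hyphenated
    stem = re.sub(r"\W+", "-", stem)   # one hyphen per run of non-word characters
    return stem.strip("-")             # no hyphen at the start or the end

print(slugify("somehow, a comma"))
print(slugify("other ridic'ulous characters!"))
```

Replacing whole runs (`\W+`) rather than single characters is what prevents results such as a double hyphen after a comma followed by a space.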
