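Your question is poorly written; please improve it. In its current form it is likely to be closed as "too vague".
Do you want to filter e-mails or websites? Your example is about websites, but your text talks about e-mails. Since I don't know which, and I want to help as best I can, I decided to do both.
Here is the code: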
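```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The members below go inside a class of your own.

// (?:...) is a non-capturing group. The separators inside the repeated groups are mandatory,
// which avoids the nested-quantifier shape that can backtrack exponentially on long inputs.
private static final Pattern EMAIL_REGEX =
        Pattern.compile("[A-Za-z0-9]+(?:[_.-][A-Za-z0-9]+)*@[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*\\.[A-Za-z]{2,}");
private static final Pattern WEBSITE_REGEX =
        Pattern.compile("https?://[_#\\.\\-/\\?&=a-zA-Z0-9]*");

public static String readFileAsString(String fileName) throws IOException {
    File f = new File(fileName);
    byte[] b = new byte[(int) f.length()];
    DataInputStream is = new DataInputStream(new FileInputStream(f));
    try {
        // A single read() may return fewer bytes than requested;
        // readFully() keeps reading until the whole buffer is filled.
        is.readFully(b);
    } finally {
        is.close();
    }
    return new String(b, "UTF-8");
}

// Returns every e-mail address found in the text, in order of appearance.
public static List<String> filterEmails(String everything) {
    List<String> list = new ArrayList<String>();
    Matcher m = EMAIL_REGEX.matcher(everything);
    while (m.find()) {
        list.add(m.group());
    }
    return list;
}

// Returns every http/https URL found in the text, in order of appearance.
public static List<String> filterWebsites(String everything) {
    List<String> list = new ArrayList<String>();
    Matcher m = WEBSITE_REGEX.matcher(everything);
    while (m.find()) {
        list.add(m.group());
    }
    return list;
}
```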
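In a Java regex, `(?:...)` is a non-capturing group, while `(:?...)` is an ordinary capturing group whose first element is an optional colon; the two are easy to mix up. This small sketch (the class and method names are made up for illustration) shows that the `(:?s?)` spelling also accepts malformed prefixes such as `http:://`:

```java
import java.util.regex.Pattern;

public class GroupSyntaxDemo {
    // Intended: "http", an optional "s", then "://".
    static final Pattern INTENDED = Pattern.compile("https?://");
    // Typo: "(:?s?)" is a capturing group matching an optional ':' followed by an optional 's'.
    static final Pattern COLON_TYPO = Pattern.compile("http(:?s?)://");

    public static boolean matchesIntended(String s) {
        return INTENDED.matcher(s).matches();
    }

    public static boolean matchesTypo(String s) {
        return COLON_TYPO.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String in : new String[] {"http://", "https://", "http:://", "http:s://"}) {
            System.out.println(in + "  intended=" + matchesIntended(in) + "  typo=" + matchesTypo(in));
        }
    }
}
```

Only `http://` and `https://` are accepted by the intended pattern; the typo version accepts all four inputs.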
To make sure it works, let's first test the `filterEmails` and `filterWebsites` methods:

```java
public static void main(String[] args) {
    System.out.println(filterEmails("Orange, pizza whatever else [email protected] a lot of text here. Blahblah blah with Luke Skywalker ([email protected]) hfkjdsh fhdsjf jdhf Paulo <[email protected]>"));
    System.out.println(filterWebsites("Orange, pizza whatever else [email protected] a lot of text here. Blahblah blah with Luke Skywalker (http://luke.starwars.com/force) hfkjdsh fhdsjf jdhf Paulo <https://darth.vader/blackside?sith=true&midclorians> And the http://www.somewhere.com as x."));
}
```

It outputs:

```
[[email protected], [email protected], [email protected]]
[http://luke.starwars.com/force, https://darth.vader/blackside?sith=true&midclorians, http://www.somewhere.com]
```
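For a test whose input and expected output can be checked directly, here is a self-contained sketch of the e-mail filter run on hypothetical `example.*` addresses (invented for illustration, not from the original test). It uses an equivalent rewrite of the intended pattern with non-capturing `(?:...)` groups and mandatory separators:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFilterSketch {
    // Local part and domain are runs of letters/digits joined by single separators.
    private static final Pattern EMAIL_REGEX = Pattern.compile(
            "[A-Za-z0-9]+(?:[_.-][A-Za-z0-9]+)*@[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*\\.[A-Za-z]{2,}");

    public static List<String> filterEmails(String everything) {
        List<String> list = new ArrayList<String>();
        Matcher m = EMAIL_REGEX.matcher(everything);
        while (m.find()) {
            list.add(m.group());
        }
        return list;
    }

    public static void main(String[] args) {
        // Hypothetical addresses, for illustration only.
        String text = "Orange, pizza alice@example.com text (luke.skywalker@rebels.example.org) <paulo_99@mail-box.example.net>";
        // prints [alice@example.com, luke.skywalker@rebels.example.org, paulo_99@mail-box.example.net]
        System.out.println(filterEmails(text));
    }
}
```

Surrounding punctuation such as `(`, `)`, `<` and `>` is not part of the character classes, so it is left out of each match.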
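To test the `readFileAsString` method:

```java
// readFileAsString throws IOException, so main must declare (or catch) it.
public static void main(String[] args) throws IOException {
    System.out.println(readFileAsString("C:\\The_Path_To_Your_File\\SomeFile.txt"));
}
```

If the file exists, its contents are printed.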
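On Java 7 or later, `java.nio.file.Files.readAllBytes` reads a whole file in one call and closes it for you. A minimal alternative sketch, assuming Java 7+ and UTF-8 file content (the class name is hypothetical):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFileSketch {
    // Files.readAllBytes reads the entire file and closes it, even on error.
    public static String readFileAsString(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the path is passed as the first command-line argument.
        System.out.println(readFileAsString(args[0]));
    }
}
```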
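If you don't like the fact that it returns a `List<String>` instead of a single `String` with the items separated by spaces, that is easy to fix:

```java
public static String collapse(List<String> list) {
    StringBuilder sb = new StringBuilder(50 * list.size());
    for (String s : list) {
        sb.append(" ").append(s);
    }
    // Drop the leading space; on an empty list this deletes nothing.
    sb.delete(0, 1);
    return sb.toString();
}
```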
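On Java 8 or later, `String.join` produces the same space-separated result without having to delete a leading space afterwards. A sketch, assuming Java 8+ (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class CollapseSketch {
    // String.join puts the separator only between elements,
    // so there is no leading space to remove afterwards.
    public static String collapse(List<String> list) {
        return String.join(" ", list);
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("http://a.example", "https://b.example/x");
        System.out.println(collapse(sites)); // prints http://a.example https://b.example/x
    }
}
```

Like the `StringBuilder` version, it returns an empty string for an empty list.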
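Putting it all together:

```java
String fileName = ...;
// Read the file once and run both filters on the same text.
String everything = readFileAsString(fileName);
String webSites = collapse(filterWebsites(everything));
String emails = collapse(filterEmails(everything));
```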
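Reading the file once and running both filters on the same string avoids reading and decoding it twice. A self-contained sketch, assuming Java 8+ and that the file name arrives as the first command-line argument (the class and helper names are illustrative, not part of the answer above):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractLinks {
    private static final Pattern EMAIL_REGEX = Pattern.compile(
            "[A-Za-z0-9]+(?:[_.-][A-Za-z0-9]+)*@[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*\\.[A-Za-z]{2,}");
    private static final Pattern WEBSITE_REGEX =
            Pattern.compile("https?://[_#\\.\\-/\\?&=a-zA-Z0-9]*");

    // Collects every match of the pattern in the text, in order of appearance.
    public static List<String> findAll(Pattern p, String text) {
        List<String> list = new ArrayList<String>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            list.add(m.group());
        }
        return list;
    }

    // Returns {websites, emails}, each as one space-separated string.
    public static String[] extract(String text) {
        return new String[] {
            String.join(" ", findAll(WEBSITE_REGEX, text)),
            String.join(" ", findAll(EMAIL_REGEX, text))
        };
    }

    public static void main(String[] args) throws IOException {
        // The file is read and decoded exactly once.
        String everything = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String[] result = extract(everything);
        System.out.println("websites: " + result[0]);
        System.out.println("emails:   " + result[1]);
    }
}
```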
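**Why** a single string, rather than a fixed-size `String[]` with one link per index, or a dynamic `java.util.List`? – jlordo
You don't need to escape *forward* slashes; only backslashes need escaping. – dasblinkenlight
What do you mean by "escape" here? Do you mean prefixing it with the protocol (i.e. adding `"http://"` to `"website"`)? (As @dasblinkenlight said, if you already have `"http://website"`, it doesn't need any escaping, such as inserting an escape character like `\`.) – Amadan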
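To illustrate the escaping point from the comments: a forward slash is an ordinary character in Java string literals and in regexes, while a backslash must be doubled in a string literal, and doubled again to be matched literally by a regex. A small sketch with made-up values:

```java
import java.util.regex.Pattern;

public class EscapingSketch {
    // Forward slashes need no escaping, neither in the string literal nor in the regex.
    public static final String URL = "http://website";
    public static final Pattern URL_PATTERN = Pattern.compile("http://website");

    // The literal "C:\\dir" is the six characters C:\dir at runtime.
    public static final String PATH = "C:\\dir";
    // To match that backslash literally the regex must be C:\\dir,
    // which as a Java string literal is written "C:\\\\dir".
    public static final Pattern PATH_PATTERN = Pattern.compile("C:\\\\dir");

    public static void main(String[] args) {
        System.out.println(URL.length());                         // prints 14
        System.out.println(PATH.length());                        // prints 6
        System.out.println(URL_PATTERN.matcher(URL).matches());   // prints true
        System.out.println(PATH_PATTERN.matcher(PATH).matches()); // prints true
    }
}
```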