Remove everything before the first slash in a URL?

Using a regular expression, how can I remove everything before the first path / in a URL?

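Example URL: https://www.example.com/some/page?user=1&[email protected]

From this, I just want /some/page?user=1&[email protected]

If it's just the root domain (i.e. https://www.example.com/), then I just want / returned.

The domain may or may not have a subdomain, and it may or may not use a secure protocol. Ultimately I just want to strip out anything before the first path slash.

If it matters, I'm running Ruby 1.9.3.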

2013-07-18 Shpigford

**Regular expressions are not a magic wand that you wave at every problem that happens to involve strings.** You probably want to use existing code that has already been written, tested, and debugged. In PHP, use the [`parse_url`](http://php.net/manual/en/function.parse-url.php) function. Perl: [`URI` module](http://search.cpan.org/dist/URI/). Ruby: [`URI` module](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html). .NET: [`Uri` class](http://msdn.microsoft.com/en-us/library/txt7706a.aspx).

Don't use a regular expression for this. Use the URI class. You can write:

require 'uri' 

u = URI.parse('https://www.example.com/some/page?user=1&[email protected]') 
u.path #=> "/some/page" 
u.query #=> "user=1&[email protected]" 

# All together - this will only return path if query is empty (no ?) 
u.request_uri #=> "/some/page?user=1&[email protected]"
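
To tie this back to the cases in the question (root domain, subdomain or not, http or https), here is a minimal sketch that wraps `request_uri` in a helper. The name `path_and_query` and the shortened example URLs are my own, not part of the answer above. It assumes the input is an absolute http or https URL, since `request_uri` is only defined for those URI types, and note that it drops any `#fragment`.

```ruby
require 'uri'

# Hypothetical helper (name is illustrative): returns the path plus the
# query string of an absolute http/https URL.
def path_and_query(url)
  # request_uri joins path and query, and yields "/" when the path is empty.
  URI.parse(url).request_uri
end

puts path_and_query('https://www.example.com/some/page?user=1') # prints /some/page?user=1
puts path_and_query('http://sub.example.com/')                  # prints /
puts path_and_query('http://example.com')                       # prints /
```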

2013-07-18 21:29:46

+1 you beat me by 3 minutes :) – Tilo

require 'uri' 

uri = URI.parse("https://www.example.com/some/page?user=1&[email protected]") 

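# uri.query is nil when the URL has no query string, so this concatenation
# raises TypeError in that case; uri.request_uri covers both cases.
> uri.path + '?' + uri.query
    => "/some/page?user=1&[email protected]"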

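As Gavin also mentioned, using a regular expression for this is not a good idea, tempting as it is. Your URLs may contain special characters, or even Unicode characters, that you did not anticipate when writing the regexp. This is especially likely in the query string. Using the URI library is the safer approach.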
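
If you do want to assemble the result from `path` and `query` yourself, one thing to watch is that `query` is `nil` when the URL has no query string. A minimal sketch of a nil-safe join, assuming an absolute URL that `URI.parse` accepts (the helper name `path_with_optional_query` and the example URLs are hypothetical):

```ruby
require 'uri'

# Hypothetical helper: joins path and query, leaving out the "?" when the
# URL has no query string (uri.query is nil in that case).
def path_with_optional_query(url)
  uri = URI.parse(url)
  # compact drops the nil query, so join only adds "?" when both parts exist.
  [uri.path, uri.query].compact.join('?')
end

puts path_with_optional_query('https://www.example.com/some/page?user=1') # prints /some/page?user=1
puts path_with_optional_query('https://www.example.com/some/page')        # prints /some/page
puts path_with_optional_query('https://www.example.com/')                 # prints /
```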

2013-07-18 21:32:48 Tilo

The same can be done using String#index:

index(substring [, offset])

str = "https://www.example.com/some/page?user=1&[email protected]" 
offset = str.index("//") # => 6 
str[str.index('/',offset + 2)..-1] 
# => "/some/page?user=1&[email protected]"
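
The snippet above raises an error if the URL has no `//` or no slash after the host, because `index` returns `nil` in those cases. Here is a guarded sketch of the same idea; the helper name `strip_before_path` and the example URLs are my own, and it assumes that `//`, when present, belongs to the scheme separator rather than to the path or query.

```ruby
# Hypothetical helper: String#index-based version that tolerates a missing
# scheme and a missing path. Assumes any "//" in the string is the scheme
# separator, not part of the path or query.
def strip_before_path(url)
  scheme_end = url.index('//')
  # Start searching after "//" if there is one, otherwise from the beginning.
  start = scheme_end ? scheme_end + 2 : 0
  slash = url.index('/', start)
  # No slash after the host means a bare domain, so return "/".
  slash ? url[slash..-1] : '/'
end

puts strip_before_path('https://www.example.com/some/page?user=1') # prints /some/page?user=1
puts strip_before_path('http://test.com')                          # prints /
puts strip_before_path('www.example.com/a')                        # prints /a
```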

2013-07-18 22:04:18

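I strongly agree with the advice to use the URI module in this case, and I don't consider myself good at regular expressions. Still, it seems worthwhile to demonstrate one possible way to do what you asked.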

test_url1 = 'https://www.example.com/some/page?user=1&[email protected]' 
test_url2 = 'http://test.com/' 
test_url3 = 'http://test.com' 

regex = /^https?:\/\/[^\/]+(.*)/ 

regex.match(test_url1)[1] 
# => "/some/page?user=1&[email protected]" 

regex.match(test_url2)[1] 
# => "/" 

regex.match(test_url3)[1] 
# => ""

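Note that in the last case, the URL has no trailing '/', so the result is an empty string.

The regular expression (/^https?:\/\/[^\/]+(.*)/) says that the string starts (^) with http (http), optionally followed by s (s?), followed by :// (:\/\/), followed by at least one non-slash character ([^\/]+), followed by zero or more characters that we want to capture ((.*)).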
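
The question also asks for "/" when only the root domain is given, and the pattern above returns an empty string for a URL without a trailing slash. A possible variation, as a sketch only: the constant `PATH_RE`, the helper `path_via_regex`, and the shortened example URLs are hypothetical, and making the scheme optional is my own assumption about what inputs should be accepted.

```ruby
# Hypothetical variation on the pattern above: the scheme is optional, and
# an empty capture is turned into "/".
PATH_RE = %r{\A(?:https?://)?[^/]+(.*)\z}

def path_via_regex(url)
  m = PATH_RE.match(url)
  return nil unless m          # input did not look like host[/path]
  m[1].empty? ? '/' : m[1]
end

puts path_via_regex('https://www.example.com/some/page?user=1') # prints /some/page?user=1
puts path_via_regex('http://test.com')                          # prints /
puts path_via_regex('test.com/x')                               # prints /x
```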

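I hope you find the example and explanation educational, but I again recommend against actually using a regular expression in this case. The URI module is simpler to use and far more robust.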

2013-07-19 06:17:14
