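How do I cut the file name out of a URL?

I have a lot of links like http://example.com/2013/1520/i2013i1520p100049.html or http://example.com/2013/89/i2013i89p60003.html. How can I cut the file name out of the URL?

I need to save the HTML files into separate folders: i2013i1520p100049.html into the folder 1520, and i2013i89p60003.html into the folder 89.

I can slice the string at fixed positions, but other URLs have different lengths.

P.S. I'm using Python.

Source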

2013-07-26 Andrew Tsaryov

With a standardized format like this, the fastest approach is rfind plus slicing :). A regular expression isn't worth it here.

For example:

>>> a = "http://example.com/2013/1520/i2013i1520p100049.html or http://example.com/2013/89/i2013i89p60003.html" 
>>> lastindex = a.rfind('/') 
>>> a[lastindex+1:] 
'i2013i89p60003.html' 
>>> a[a.rfind('/',0,lastindex)+1:lastindex] 
'89'
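
The same rfind-and-slice idea can be wrapped in a small helper and applied to each URL separately. This is only a sketch, written for Python 3; the function name folder_and_filename is made up for illustration and does not come from the answer above.

```python
def folder_and_filename(url):
    """Return (folder, filename) using only rfind and slicing."""
    last = url.rfind('/')            # position of the slash before the file name
    prev = url.rfind('/', 0, last)   # position of the slash before the folder name
    return url[prev + 1:last], url[last + 1:]


urls = [
    "http://example.com/2013/1520/i2013i1520p100049.html",
    "http://example.com/2013/89/i2013i89p60003.html",
]
for url in urls:
    print(folder_and_filename(url))
# ('1520', 'i2013i1520p100049.html')
# ('89', 'i2013i89p60003.html')
```

Because only the last two slashes matter, the length of the folder or file name makes no difference.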

split vs. rfind on a huge URL (such URLs do exist, though they are usually not this large):

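>>> import time
>>> from random import randint
>>> a = range(10000)  # Python 2: range() returns a list
>>> for x in range(100): a.insert(randint(0, 10000), '/')  # scatter 100 slashes
...
>>> a = str(a)
>>> b = time.time(); a.rfind('/'); time.time()-b  # index of the last '/', then elapsed seconds
58493
1.8835067749023438e-05
>>> b = time.time(); d=a.split('/'); time.time()-b  # elapsed seconds for split
0.00012683868408203125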

More importantly, rfind avoids allocating and copying a whole list of pieces, which is no fun when you have thousands of URLs.
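
For anyone who wants to repeat the comparison on Python 3, a rough sketch with timeit follows. The long URL is synthetic and the function names are illustrative; the timings depend on the machine, so none are shown here.

```python
import timeit

# A deliberately oversized, made-up URL with about 10,000 path segments.
url = "http://example.com/" + "/".join(str(n) for n in range(10000)) + "/page.html"


def with_rfind():
    # Scans backwards for one slash and slices once.
    return url[url.rfind('/') + 1:]


def with_split():
    # Builds a list of every segment, then keeps only the last one.
    return url.split('/')[-1]


# Both approaches must agree before their speed is worth comparing.
assert with_rfind() == with_split()

print("rfind:", timeit.timeit(with_rfind, number=1000))
print("split:", timeit.timeit(with_split, number=1000))
```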

Source

2013-07-26 20:51:29

You can use something like the following (useful if you want to do more complex work on the URL):

s = 'http://example.com/2013/1520/i2013i1520p100049.html' 

from operator import itemgetter 
from urlparse import urlsplit 

split_url = urlsplit(s) 
path, fname = itemgetter(2, -1)(split_url.path.split('/')) 
print path, fname 
# 1520 i2013i1520p100049.html

Otherwise:

path, fname = s.rsplit('/', 2)[1:]
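
One case where parsing the URL first seems to pay off is a URL that carries a query string or fragment. The sketch below assumes Python 3, where urlsplit lives in urllib.parse; the "?ref=home#top" suffix and the name split_tail are invented for the example.

```python
from urllib.parse import urlsplit


def split_tail(url):
    """Return (folder, filename) taken from the path component only."""
    parts = urlsplit(url).path.split('/')
    return parts[-2], parts[-1]


# Hypothetical URL with a query string and fragment appended.
url = "http://example.com/2013/1520/i2013i1520p100049.html?ref=home#top"

print(split_tail(url))
# ('1520', 'i2013i1520p100049.html')

# Plain rsplit on the raw string keeps the query string in the file name.
print(url.rsplit('/', 2)[1:])
# ['1520', 'i2013i1520p100049.html?ref=home#top']
```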

Source

2013-07-26 20:52:14

Use split():

url = 'http://example.com/2013/1520/i2013i1520p100049.html' 
parts = url.split('/') 

fn = parts[-1] 
dir = parts[-2]

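Then fetch the page and save its source:

import os
import urllib2

fp = urllib2.urlopen(url).read()

# Create the target folder first; open() fails if it does not exist.
if not os.path.isdir(dir):
    os.makedirs(dir)

fullpath_fn = dir + '/' + fn
with open(fullpath_fn, 'w') as htmlfile:
    htmlfile.write(fp)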
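
A Python 3 sketch of the saving step is shown below. To keep it runnable without network access it takes the HTML text as an argument and writes into a temporary directory; the function name save_html and the root parameter are assumptions made for this example.

```python
import os
import tempfile


def save_html(url, html, root="."):
    """Write html to <root>/<folder>/<filename>, both derived from the URL."""
    parts = url.split('/')
    folder, filename = parts[-2], parts[-1]
    target_dir = os.path.join(root, folder)
    os.makedirs(target_dir, exist_ok=True)   # create the folder if it is missing
    target = os.path.join(target_dir, filename)
    with open(target, "w", encoding="utf-8") as htmlfile:
        htmlfile.write(html)
    return target


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    saved = save_html("http://example.com/2013/89/i2013i89p60003.html",
                      "<html></html>", root)
    relative = os.path.relpath(saved, root)
    with open(saved, encoding="utf-8") as f:
        content = f.read()

print(relative)
```

In real use the HTML would come from the download step, for instance urllib.request.urlopen(url).read() decoded with the page's encoding.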

Source

2013-07-26 20:52:56 That1Guy

>>> 'http://example.com/2013/1520/i2013i1520p100049.html'.split('/')[-1] 
'i2013i1520p100049.html'

Source

2013-07-26 20:53:43

You can use the split() method:

url = 'http://example.com/2013/1520/i2013i1520p100049.html'
parts = url.split('/')
file = parts[-1]
folder = parts[-2]

Source

2013-07-26 20:54:12 amatellanes

You can use urlparse.urlsplit and os.path.split:

import os 
import urlparse 
s = 'http://example.com/2013/1520/i2013i1520p100049.html' 

path = urlparse.urlsplit(s).path 
print(path) 
# /2013/1520/i2013i1520p100049.html 

dirname, basename = os.path.split(path) 
dirname, basedir = os.path.split(dirname) 
print(basedir) 
# 1520 
print(basename) 
# i2013i1520p100049.html
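
On Python 3 the same two-step split can be sketched with urllib.parse and pathlib. PurePosixPath is used here on the assumption that URL paths should always be treated as forward-slash paths, whatever the operating system; the function name folder_and_name is illustrative.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def folder_and_name(url):
    """Return (parent folder name, file name) from the URL's path."""
    path = PurePosixPath(urlsplit(url).path)
    return path.parent.name, path.name


print(folder_and_name("http://example.com/2013/1520/i2013i1520p100049.html"))
# ('1520', 'i2013i1520p100049.html')
```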

Source

2013-07-26 20:56:58 unutbu

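Just for the sake of it, a regex-based answer:

import re

string = 'http://example.com/2013/1520/i2013i1520p100049.html'

# Group 1: the numeric folder; group 2: the .html file name at the end.
match = re.search(r'([0-9]+)/([a-z0-9]+\.html)$', string)
if match:
    folder = match.group(1)
    file = match.group(2)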
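
When many URLs are processed, the pattern could be compiled once and given named groups. This is a sketch for Python 3 that reuses the pattern from the answer; the group names and the TAIL constant are assumptions made for readability.

```python
import re

# Same pattern as above, compiled once, with named groups.
TAIL = re.compile(r'(?P<folder>[0-9]+)/(?P<file>[a-z0-9]+\.html)$')

urls = [
    "http://example.com/2013/1520/i2013i1520p100049.html",
    "http://example.com/2013/89/i2013i89p60003.html",
]

results = []
for url in urls:
    match = TAIL.search(url)
    if match:  # skip URLs that do not end in <digits>/<name>.html
        results.append((match.group('folder'), match.group('file')))

print(results)
# [('1520', 'i2013i1520p100049.html'), ('89', 'i2013i89p60003.html')]
```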

Source

2013-07-26 23:10:45 adbar
