Shell script using Wget - if else nested in a for loop

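I'm trying to write a shell script that reads a list of download URLs and checks whether they are still active. I'm not sure what's wrong with my current script (I'm new to this), so any pointers would be a huge help!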

user@pc:~/test# cat sites.list

http://www.google.com/images/srpr/logo3w.png 
http://www.google.com/doesnt.exist 
notasite

Script:

#!/bin/bash 
for i in `cat sites.list` 
do 
wget --spider $i -b 
if grep --quiet "200 OK" wget-log; then 
echo $i >> ok.txt 
else 
echo $i >> notok.txt 
fi 
rm wget-log 
done

As it stands, the script writes everything to notok.txt (the first Google URL should go to ok.txt). But if I run:

wget --spider http://www.google.com/images/srpr/logo3w.png -b

and then run:

grep "200 OK" wget-log

grep finds the string without any problem. What noob mistake did I make with the syntax? Thanks m8s!

Source

2012-10-24 el-noobador

The -b option sends wget to the background, so you are running grep before wget has finished.
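
To make the timing problem concrete, here is a small simulation that needs no network. It is only a sketch: `slow_writer` is a made-up stand-in for a backgrounded wget, and the log line is an invented example. Note that the real `wget -b` detaches itself, so the `wait` used here is not available for it; that is why dropping -b is the practical fix.

```shell
#!/bin/bash
# Simulate a backgrounded job that takes a moment to write its log,
# the way "wget -b" keeps writing wget-log after control returns.
log=$(mktemp)

# Hypothetical stand-in for wget: writes the log line after a delay.
slow_writer() {
    sleep 1
    echo "HTTP request sent, awaiting response... 200 OK" > "$log"
}

slow_writer &            # runs in the background
writer_pid=$!

# Checking immediately: the log is still empty, so grep finds nothing.
if grep --quiet "200 OK" "$log"; then
    early="found"
else
    early="missing"
fi

wait "$writer_pid"       # block until the background job has finished

# Checking after the job is done: the line is there.
if grep --quiet "200 OK" "$log"; then
    late="found"
else
    late="missing"
fi

echo "before wait: $early, after wait: $late"
rm -f "$log"
```

The first check corresponds to what the loop in the question does on every iteration.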

Try it without the -b option:

if wget --spider $i 2>&1 | grep --quiet "200 OK" ; then
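
A possible variation, offered as a sketch rather than a tested replacement: instead of grepping the output, rely on wget's exit status, which is 0 on success and non-zero otherwise (recent wget versions document 8 for a server error response). The function names `url_is_live` and `check_list` are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: true when wget's spider request for the URL succeeds.
url_is_live() {
    # --quiet suppresses the log; "--" stops option parsing before the URL
    wget --spider --quiet -- "$1"
}

# Sort each line of a list file into ok.txt or notok.txt by exit status.
check_list() {
    local url
    while read -r url; do
        [[ -z "$url" ]] && continue
        if url_is_live "$url"; then
            echo "$url" >> ok.txt
        else
            echo "$url" >> notok.txt
        fi
    done < "$1"
}

# Only run when the list from the question is present.
if [[ -f sites.list ]]; then
    check_list sites.list
fi
```

This avoids depending on the exact wording of wget's log output, at the cost of treating every non-zero exit (DNS failure, timeout, 404) the same way.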

Source

2012-10-24 02:47:52

Good catch! +1 – Graham

Indeed. +1 from me too. :) – ghoti

Works! Thanks! –

There are a few problems with what you're doing.

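Your for i in will have problems with lines that contain whitespace. It's better to use while read to read the individual lines of a file.
You're not quoting your variables. What if a line in the file (or a word in a line) starts with a hyphen? Then wget will interpret it as an option. That is a potential security risk as well as a bug.
Creating and removing files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do that without temp files and the extra code to remove them.
wget isn't necessarily the best tool for this. I'd suggest using curl instead.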
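
The first two points can be seen with a small offline experiment. This is a sketch using a throwaway list file whose contents are invented for the demonstration: one line with a space in it and one line that starts with a hyphen.

```shell
#!/bin/bash
# Throwaway list: one entry contains a space, one starts with a hyphen.
list=$(mktemp)
printf '%s\n' 'http://example.com/a file.png' '-b' > "$list"

# for + unquoted command substitution splits on every run of whitespace.
for_count=0
for i in $(cat "$list"); do
    for_count=$((for_count + 1))
done

# while read sees exactly one line per iteration.
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done < "$list"

echo "for saw $for_count items, while read saw $read_count lines"

# "--" marks the end of options, so '-b' is treated as data (here, a
# grep pattern) rather than as a flag. The same convention applies when
# passing an untrusted line to wget or curl.
dash_lines=$(grep -c -- '-b' "$list")
echo "lines containing -b: $dash_lines"

rm -f "$list"
```

The two-line file yields three loop iterations with for and two with while read, which is the mismatch the first point warns about.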

So here's a better way to handle this...

#!/bin/bash 

sitelist="sites.list" 
curl="/usr/bin/curl" 

# Some errors, for good measure... 
if [[ ! -f "$sitelist" ]]; then 
    echo "ERROR: Sitelist is missing." >&2 
    exit 1 
elif [[ ! -s "$sitelist" ]]; then 
    echo "ERROR: Sitelist is empty." >&2 
    exit 1 
elif [[ ! -x "$curl" ]]; then 
    echo "ERROR: I can't work under these conditions." >&2 
    exit 1 
fi 

# Allow more advanced pattern matching (for case..esac below) 
shopt -s extglob

while read -r url; do

    # remove comments 
    url=${url%%#*} 

    # skip empty lines 
    if [[ -z "$url" ]]; then
        continue
    fi

    # Handle just ftp, http and https. 
    # We could do full URL pattern matching, but meh. 
    case "$url" in
        @(f|ht)tp?(s)://*)
            # Get just the numeric HTTP response code
            http_code=$("$curl" -sL -w '%{http_code}' "$url" -o /dev/null)
            case "$http_code" in
                200|226)
                    # You'll get a 226 in ${http_code} from a valid FTP URL.
                    # If all you really care about is that the response is in the 200's,
                    # you could match against "2??" instead.
                    echo "$url" >> ok.txt
                    ;;
                *)
                    # You might want different handling for redirects (301/302).
                    echo "$url" >> notok.txt
                    ;;
            esac
            ;;
        *)
            # If we're here, we didn't get a URL we could read.
            echo "WARNING: invalid url: $url" >&2
            ;;
    esac

done < "$sitelist"

This is untested. For educational purposes only. May contain nuts.
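
The comments in the script mention matching against "2??" and handling redirects separately. One way that could look is sketched below; `classify_code` and its bucket names are invented for illustration, and the sample codes are fed in by hand rather than fetched. The 000 entry is what curl's `%{http_code}` reports when no HTTP response was received at all.

```shell
#!/bin/bash
# Hypothetical helper: map a numeric response code to a bucket name.
classify_code() {
    case "$1" in
        2??) echo ok ;;        # any 2xx response, including FTP's 226
        3??) echo redirect ;;  # only seen if the redirect was not followed
        *)   echo notok ;;     # 4xx, 5xx, and 000 (no response)
    esac
}

# Hand-picked sample codes, not real measurements.
for code in 200 226 301 404 000; do
    printf '%s -> %s\n' "$code" "$(classify_code "$code")"
done
```

Since the script calls curl with -L, redirects are normally followed and the final code is what gets classified, so the redirect bucket matters mainly if -L is dropped.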

Source

2012-10-24 03:03:16 ghoti

+1 nice teaching effort –

Awesome, this is really helpful! Thanks ghoti. –
