2014-09-04 90 views
1

I'm using Google's YouTube v3 API: get all videos of a YouTube channel (some videos are missing)

$url = 'https://www.googleapis.com/youtube/v3/search?part=id&channelId=' . $channelID . '&maxResults=50&order=date&key=' . $API_key; 

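I have set up a script that should give me all videos for a given channel ID. For some channels I get all videos. For others, some videos are missing (compared with the number of videos shown directly on YouTube). For bigger channels I get a maximum of 488 results, although the channel has more videos.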

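The pageToken behaves very strangely. For example, a channel has 955 videos. I get 18 pages with 50 items each (that would be 900 videos). Some of these are playlists, but if I subtract the 23 playlists I still have 877 videos. If I remove duplicates, I have only 488 results! totalResults in the JSON output shows 975 results!?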
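To make the counting above concrete, here is a minimal sketch of reducing search results to unique video IDs. It assumes the items were decoded with json_decode($json, true) and that each item carries id.kind and id.videoId, as search results requested with part=id do. The helper name uniqueVideoIds and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: reduce search.list items to unique video IDs.
// Assumes each item is an associative array from json_decode($json, true).
function uniqueVideoIds(array $items) {
    $ids = array();
    foreach ($items as $item) {
        // Search results can also be playlists or channels; keep videos only.
        if (isset($item['id']['kind'], $item['id']['videoId'])
            && $item['id']['kind'] === 'youtube#video') {
            // Using the ID as the array key drops duplicates.
            $ids[$item['id']['videoId']] = true;
        }
    }
    // PHP turns purely numeric string keys into integers, so cast back.
    return array_map('strval', array_keys($ids));
}

// Made-up sample data: one video returned twice, plus one playlist.
$items = array(
    array('id' => array('kind' => 'youtube#video', 'videoId' => 'aaa')),
    array('id' => array('kind' => 'youtube#playlist', 'playlistId' => 'PL1')),
    array('id' => array('kind' => 'youtube#video', 'videoId' => 'aaa')),
    array('id' => array('kind' => 'youtube#video', 'videoId' => 'bbb')),
);
print_r(uniqueVideoIds($items));
```

Comparing count($items) with count(uniqueVideoIds($items)) shows how many of the returned items were non-videos or repeats.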

This is my recursive function:

function fetchAllVideos($parsed_json){
    $foundIds = array();
    if($parsed_json != ''){
        $foundIds = getVideoIds($parsed_json);
        $nextPageToken = getNextPageToken($parsed_json);
        $prevPageToken = getPrevPageToken($parsed_json);

        if($nextPageToken != ''){
            $new_parsed_json = getNextPage($nextPageToken);
            $foundIds = array_merge($foundIds, fetchAllVideos($new_parsed_json));
        }
        if($prevPageToken != ''){
            $new_parsed_json = getNextPage($prevPageToken);
            $foundIds = array_merge($foundIds, fetchAllVideos($new_parsed_json));
        }
    }

    return $foundIds;
}

I call it with $videoIds = fetchAllVideos($parsed_json); where $parsed_json is the result of the first URL I fetch. Can you see a mistake here?
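One thing that stands out in the function is the prevPageToken branch: from page 2 it fetches page 1 again, so IDs get collected more than once. For comparison, here is a minimal sketch that only ever moves forward through nextPageToken. The names fetchAllVideoIds, $fetchPage and $maxPages are assumptions for illustration, and the HTTP request is replaced by a fake so the example runs offline.

```php
<?php
// Sketch: collect video IDs by following nextPageToken only.
// $fetchPage is a hypothetical callable: it receives a page token (null for
// the first page) and returns the decoded response as an array, or false.
function fetchAllVideoIds(callable $fetchPage, $maxPages = 100) {
    $ids = array();
    $token = null;
    // $maxPages is a safety cap so a misbehaving token cannot loop forever.
    for ($page = 0; $page < $maxPages; $page++) {
        $response = $fetchPage($token);
        if ($response === false || !isset($response['items'])) {
            break;
        }
        foreach ($response['items'] as $item) {
            if (isset($item['id']['videoId'])) {
                $ids[] = $item['id']['videoId'];
            }
        }
        // No nextPageToken means this was the last page.
        if (empty($response['nextPageToken'])) {
            break;
        }
        $token = $response['nextPageToken'];
    }
    return $ids;
}

// Stand-in for the real request: three fake pages keyed by token.
$pages = array(
    ''   => array('items' => array(array('id' => array('videoId' => 'v1'))),
                  'nextPageToken' => 'p2'),
    'p2' => array('items' => array(array('id' => array('videoId' => 'v2'))),
                  'nextPageToken' => 'p3', 'prevPageToken' => 'p1'),
    'p3' => array('items' => array(array('id' => array('videoId' => 'v3'))),
                  'prevPageToken' => 'p2'),
);
$fakeFetch = function ($token) use ($pages) {
    $key = ($token === null) ? '' : $token;
    return isset($pages[$key]) ? $pages[$key] : false;
};
print_r(fetchAllVideoIds($fakeFetch));
```

Against the real API, $fetchPage would presumably append &pageToken= and the token to the search URL and decode the body with json_decode($json, true). This only removes the duplicates caused by walking backwards; it does not by itself explain the gap between totalResults and the number of items returned.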

Does anyone know how the video count shown directly on YouTube is calculated? Has anyone managed to get a complete list that matches the number shown on YouTube?

Answers

2

https://gdata.youtube.com/feeds/api/users/USERNAME_HERE/uploads?max-results=50&alt=json&start-index=1

There is no way around it. This is a JSON feed, and you have to loop through it until a page returns fewer than 50 results.

Edit:

This should be the script I used:

ini_set('max_execution_time', 900); 

// Collect the IDs of all uploads of a channel by paging through the feed.
function getVideos($channel){
    $ids = array();
    $start_index = 1; // start-index is 1-based
    $page_size = 50;

    if($channel == ""){
        return false;
    }

    while(true){
        $url = 'https://gdata.youtube.com/feeds/api/users/' . rawurlencode($channel) . '/uploads?max-results=' . $page_size . '&alt=json&start-index=' . $start_index;
        $json = file_get_contents($url);
        if($json === false){
            break; // request failed, return what was collected so far
        }
        $obj = json_decode($json);
        // "entry" may be absent when a page is empty
        $entries = isset($obj->feed->entry) ? $obj->feed->entry : array();

        foreach($entries as $video){
            // the entry id is a URL; the video ID is its last path segment
            $video_url = $video->id->{'$t'};
            $ids[] = substr($video_url, strrpos($video_url, '/') + 1);
        }

        $start_index += count($entries);
        // a short page means this was the last one, so no further request is needed
        if(count($entries) < $page_size){
            break;
        }
    }

    return $ids;
}

$videoIds = getVideos('youtube'); 
echo '<pre>'; 
print_r($videoIds); 
echo '</pre>'; 

I have now run a test, and I did not collect 100% of the videos. Still, this is the best option I have come up with.

+0

I'm giving you an upvote, but you should post the final version anyway. You never know when it will be useful to someone. – Random 2015-02-24 20:56:31

+0

@Random: I have now added the script I used. – testing 2015-02-24 22:10:43

1

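This script selects 60 days at a time, retrieves the results for that period, and adds them to the existing data array. Done this way, there is no limit on how many videos can be retrieved, but it may take some time to trawl through larger YouTube channels with several thousand videos. Make sure you set the API_KEY, the timezone, the username, the start date (which should be before the first video on the channel) and the period. The period defaults to 60 * 60 * 24 * 60, which is 60 days in seconds (5184000 seconds). It needs to be lowered if the channel publishes more than about 50 videos within 60 days.

*All of this is commented in the script.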

date_default_timezone_set("TIMEZONE"); 

//youtube api key 
$API_KEY = "YOUR API KEY"; 

function search($searchTerm,$url){ 
    $url = $url . urlencode($searchTerm); 

    $result = file_get_contents($url); 

    if($result !== false){ 
     return json_decode($result, true); 
    } 

    return false; 
} 

function get_user_channel_id($user){ 
    global $API_KEY; 
    $url = 'https://www.googleapis.com/youtube/v3/channels?key=' . $API_KEY . '&part=id&forUsername='; 
    return search($user,$url)['items'][0]['id']; 
} 

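function push_data($searchResults){
    global $data;
    //make sure the global result array exists before the first window
    if(!is_array($data)){
        $data = array();
    }
    //skip failed requests (search() returns false) and responses without items
    if($searchResults !== false && isset($searchResults['items'])){
        foreach($searchResults['items'] as $item){
            $data[] = $item;
        }
    }
    return $data;
}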

function get_url_for_utc_period($channelId, $utc){ 
    //get the API_KEY 
    global $API_KEY; 
    //youtube specifies the DateTime to be formatted as RFC 3339 formatted date-time value (1970-01-01T00:00:00Z) 
    //date() expects an integer timestamp, so cast instead of converting to a string
    $publishedAfter = date("Y-m-d\TH:i:sP", (int)$utc); 
    //within a 60 day period 
    $publishedBefore_ = $utc + (60 * 60 * 24 * 60); 
    $publishedBefore = date("Y-m-d\TH:i:sP",$publishedBefore_); 
    //develop the URL with the API_KEY, channelId, and the time period specified by publishedBefore & publishedAfter 
    $url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&type=video&key=' . $API_KEY . '&maxResults=50&channelId=' . $channelId . '&publishedAfter=' . urlencode($publishedAfter) . '&publishedBefore=' . urlencode($publishedBefore); 

    return array("url"=>$url,"utc"=>$publishedBefore_); 
} 
//the date that the loop will begin with, have this just before the first videos on the channel. 
//this is just an example date 
$start_date = "2013-1-1"; 
$utc = strtotime($start_date); 
$username = "CHANNEL USERNAME NOT CHANNEL ID"; 
//get the channel id for the username 
$channelId = get_user_channel_id($username); 

while($utc < time()){ 
    $url_utc = get_url_for_utc_period($channelId, $utc); 
    $searchResults = search("", $url_utc['url']); 
    $data = push_data($searchResults); 
    $utc += 60 * 60 * 24 * 60; 
} 
print "<pre>"; 
print_r($data); 
print "</pre>"; 

//check that all of the videos have been accounted for (cross-reference this with what it says on their youtube channel) 
print count($data);
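
The script above writes the 60-day period out twice, once when building the URL and once when advancing the loop. As a rough sketch, the windows could instead be generated from a single period value, which makes it easier to lower the period for busy channels. The function name timeWindows and the example dates are assumptions for illustration, and UTC timestamps ending in "Z" are used here instead of the script's local-timezone offset.

```php
<?php
// Sketch: split [$start, $end) into consecutive windows of $period seconds,
// formatted as RFC 3339 timestamps in UTC.
function timeWindows($start, $end, $period) {
    $windows = array();
    for ($from = $start; $from < $end; $from += $period) {
        // The last window is clipped so it never runs past $end.
        $to = min($from + $period, $end);
        $windows[] = array(
            'publishedAfter'  => gmdate('Y-m-d\TH:i:s\Z', $from),
            'publishedBefore' => gmdate('Y-m-d\TH:i:s\Z', $to),
        );
    }
    return $windows;
}

// Example: 150 days starting at 2013-01-01 UTC, cut into 60-day windows.
$start = gmmktime(0, 0, 0, 1, 1, 2013);
$windows = timeWindows($start, $start + 150 * 86400, 60 * 86400);
print_r($windows);
```

Each pair would then be URL-encoded into the publishedAfter and publishedBefore parameters of the search URL. A window that still holds more than 50 videos would additionally need its nextPageToken followed, which this sketch does not do.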