Counting the number of characters in a string that mixes ASCII and Unicode

strlen($username);

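A username can contain ASCII, Unicode, or both.

Examples:

Jam123 (ASCII) - 6 characters
ابت (Unicode) - 3 characters, but strlen returns 6 bytes, since each of these Unicode characters takes 2 bytes.
Jamت (Unicode and ASCII) - strlen returns 5 (3 for the ASCII characters and 2 for the Unicode one, even though there is only one Unicode character)

In all cases a username must not exceed 25 characters and must not be shorter than 4 characters.

My main problem is the case where Unicode and ASCII are mixed: how can I keep count so that a conditional statement can decide whether the username is no longer than 25 and no shorter than 4 characters?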

if(strlen($username) <= 25 && !(strlen($username) < 4))

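Three Unicode characters are counted as 6 bytes, which causes trouble: the check lets a user register a username of only 3 Unicode characters when the minimum should be 4.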

Numbers, at least, will always be in ASCII.

Source

2011-09-03 user311509

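All ASCII is Unicode. Not all Unicode is ASCII. – tchrist

@tchrist All ASCII is **UTF-8**. Not all **UTF-8** is ASCII. Unicode is neither. – deceze

@user This may be a good read for you: [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – deceze

Use mb_strlen(). It counts characters rather than bytes, so it handles Unicode characters correctly.

Example: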

mb_strlen("Jamت", "UTF-8"); // 4
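The length check from the question can be sketched on top of mb_strlen as follows. This is only a minimal sketch: it assumes the username arrives as valid UTF-8 and that the mbstring extension is loaded, and the helper name isValidUsernameLength and its default limits are hypothetical, chosen here for illustration.

```php
<?php
// Hypothetical helper: the name and defaults are illustrative, not from the original post.
// Assumes $username is valid UTF-8 and the mbstring extension is available.
function isValidUsernameLength(string $username, int $min = 4, int $max = 25): bool
{
    // With an explicit encoding, mb_strlen counts characters (code points), not bytes.
    $length = mb_strlen($username, 'UTF-8');
    return $length >= $min && $length <= $max;
}

var_dump(isValidUsernameLength('Jam123')); // bool(true): 6 characters
var_dump(isValidUsernameLength('ابت'));    // bool(false): 3 characters, below the minimum
var_dump(isValidUsernameLength('Jamت'));   // bool(true): 4 characters
```

Passing the encoding explicitly avoids depending on the server's mbstring default encoding.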

Source

2011-09-03 21:41:12 arnaud576875

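Surprisingly, it works! I tried this solution before but it did not work... I guess I had a typo or something... anyway, mb_strlen() fixed it... thanks – user311509

You can use mb_strlen with the encoding of your choice.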

http://sandbox.phpcode.eu/g/3a144/1

<?php
// 'UTF-8' is the canonical encoding name.
echo mb_strlen('ابت', 'UTF-8'); // prints 3
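To see why strlen and mb_strlen disagree, the two can be compared side by side on the usernames from the question. A small sketch, assuming the source file and its string literals are saved as UTF-8; the variable name $samples is only illustrative:

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is available.
$samples = ['Jam123', 'ابت', 'Jamت'];

foreach ($samples as $s) {
    // strlen counts bytes; mb_strlen with an encoding counts characters (code points).
    printf("%d bytes, %d characters\n", strlen($s), mb_strlen($s, 'UTF-8'));
}
// Prints:
// 6 bytes, 6 characters
// 6 bytes, 3 characters
// 5 bytes, 4 characters
```

Each Arabic letter here takes 2 bytes in UTF-8, which is where the byte counts of 6 and 5 come from.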

Source

2011-09-03 21:42:26 genesis

A function to count the words in a Unicode sentence/string:

function mb_count_words($string)
{
    // Match runs of letters (\p{L}), digits (\p{N}), and dash punctuation (\p{Pd}).
    preg_match_all('/[\p{L}\p{N}\p{Pd}]+/u', $string, $matches);
    return count($matches[0]);
}

or

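function mb_count_words($string, $format = 0, $charlist = '[]') {
    // Note: $charlist is accepted but not used by this implementation.
    $string = trim($string);
    // Compare against '' instead of using empty(), which would treat "0" as empty.
    if ($string === '') {
        $words = array();
    } else {
        // Split on runs of anything that is not a letter, digit, or apostrophe;
        // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing punctuation.
        $words = preg_split('~[^\p{L}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }
    // Format 0 returns the word count; any other format returns the list of words.
    if ($format == 0) {
        return count($words);
    }
    return $words;
}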

Then run:

echo mb_count_words("chào buổi sáng");

Source

2013-12-24 15:20:58
