2013-02-28

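Characters written to a char* array are mutating

This should be a simple function (it counts the number of unique characters in a string), but I'm running into a strange problem. Note that my code expects only the ASCII letters a-z and A-Z.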

#include <stdio.h>
#include <string.h>

int unique_chars(char* my_str) {
    //printf("starting unique_chars\n");
    char seen_buffer[52]; // max 52 letters a-z & A-Z
    int seen_count = 1; // not ever expecting my_str to be NULL
    int i, j;
    char next;
    //printf("first char is %c\n", my_str[0]);
    seen_buffer[0] = my_str[0]; // first char must be unique

    for (i=1; i<strlen(my_str); i++) { // walk along the rest of my_str
        next = my_str[i];

        if (next >= 97) {
            next = next - 32; // the next char will always be capital, for convenience
        }

        for (j=0; j<seen_count; j++) { // compare next to all the unique chars seen before
            //printf("current char is %c, checking against %c\n", next, seen_buffer[j]);
            if ((next==seen_buffer[j]) || (next+32==seen_buffer[j])) {
                //printf("breaking\n");
                break; // jump to the next char in my_str if we find a match
            }
            if (j==seen_count-1) { // at this point, we're sure that next hasn't been seen yet
                //printf("new unique char is %c\n", next);
                seen_count++;
                seen_buffer[seen_count] = next;
                //printf("new char val is %c, should be %c\n", seen_buffer[seen_count], next);
                break;
            }
        }
    }
    return seen_count;
}

int main(int argc, char* argv[]){
    char* to_encode = argv[1];
    printf("unique chars: %d\n", unique_chars(to_encode));
}

When I call it with certain strings, I get incorrect results. For example, try:

./a.out gghhiijj 

This produces (with the printfs uncommented):

starting unique_chars 
first char is g 
current char is G, checking against g 
breaking 
current char is H, checking against g 
new unique char is H 
new char val is H, should be H 
current char is H, checking against g 
current char is H, checking against 
new unique char is H 
new char val is H, should be H 
current char is I, checking against g 
current char is I, checking against 
current char is I, checking against H 
new unique char is I 
new char val is I, should be I 
current char is I, checking against g 
current char is I, checking against 
current char is I, checking against H 
current char is I, checking against H 
new unique char is I 
new char val is I, should be I 
current char is J, checking against g 
current char is J, checking against 
current char is J, checking against H 
current char is J, checking against H 
current char is J, checking against I 
new unique char is J 
new char val is J, should be J 
current char is J, checking against g 
current char is J, checking against 
current char is J, checking against H 
current char is J, checking against H 
current char is J, checking against I 
current char is J, checking against I 
new unique char is J 
new char val is J, should be J 

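So I keep getting duplicates in my seen_buffer, because some blank character gets stored instead of the letter that should be there! Yet when I check right after writing to seen_buffer (the "new char val is %c, should be %c\n" printf), the correct character is shown!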

Any help is appreciated!

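'if (next >= 97) {' // What is the value of 'a' in the EBCDIC character set? Look it up. What's the point of C code if it isn't portable? Study the history of C. Why not use 'a' instead of 97? – Sebivor 2013-02-28 05:11:34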

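Suppose you want to check whether a character is lowercase: 'if (islower((unsigned char)next)) {...}'. Now suppose you want to convert that lowercase char to uppercase: 'next = islower((unsigned char)next) ? toupper((unsigned char)next) : next;'. Let your compiler do the optimizing for you; it's smart enough to perform dead-code elimination and tail-call optimization. – Sebivor 2013-02-28 05:15:31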

Thanks for the suggestions! Didn't know islower existed - very helpful! – David 2013-02-28 05:21:03

Answers

Here you have an off-by-one error:

seen_count++;
seen_buffer[seen_count] = next;

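The first char goes into seen_buffer[0] and seen_count is set to 1, which means the next char goes into seen_buffer[2], because seen_count is incremented to 2 first. Nothing ever goes into seen_buffer[1] (that's the blank character you keep seeing in your printfs), and when checking a char against seen_buffer, you never check the last character you just stored, since the inner loop only scans indices below seen_count.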

Swap those two lines (store into seen_buffer[seen_count] first, then increment seen_count) and it should work.

Problem solved; thanks a lot! – David 2013-02-28 04:45:31

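You aren't checking whether the input values are between a and z or between A and Z. Also, your current code can overflow the buffer in char seen_buffer[52], so add bounds-checking code.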

I think your code could be simpler.

Here is a simple algorithm for you:

unsigned int returnUniqueChar(const char *input)
{
    int count[52] = {0}; // initialize all the memory with zero
    unsigned int unique = 0;

    while (*input != '\0')
    {
        if (*input >= 'A' && *input <= 'Z')
        {
            count[*input - 'A']++;       // slots 0..25 hold A-Z
        }
        else if (*input >= 'a' && *input <= 'z')
        {
            count[*input - 'a' + 26]++;  // slots 26..51 hold a-z
        }
        input++;                         // any other character is ignored
    }

    for (int i = 0; i < 52; i++)         // 52 slots, so stop at 52
        if (count[i] > 0)                // letter appeared at least once
            unique++;

    return unique;
}

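Thanks! At the top of my question I did mention that I only expect input containing the characters a-z and A-Z (it would take a while to explain why, but that's the only use case I need to worry about), so I left out validation. For the same reason, buffer overflow shouldn't be a concern: since there are 52 characters in a,...,z,A,...,Z, any string over that alphabet produces at most 52 entries in seen_buffer (which are guaranteed to fit in the buffer). I do like your algorithm, but I'm still curious about my problem. Thanks for the workaround! – David 2013-02-28 04:10:52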

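Well, I always think an API should be hard to break. In your case, people might not respect the restriction and end up passing invalid characters. A simple example is passing input read from standard input: it will contain \n or \r, and that will break your code. – Pradheep 2013-02-28 04:17:43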

Could you add comments to your code explaining what each line is trying to do? It isn't straightforward to me. – Pradheep 2013-02-28 04:18:31