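I am writing a method that builds a fixed-length message for an interface between two other systems. Is there a better way to cut a string that contains 2-byte characters in Java?
Each item in the message must be sent at an agreed length in bytes. If an item is longer than the agreed length, it must be truncated to that length.
The message contains 2-byte characters, so if it is cut in the middle of a character, that character ends up broken.
To count the bytes correctly, the method scans from the beginning of the string up to the cut position, so performance is likely to be poor when the message is long.
I could not find a better way, so I am asking for help here. I apologize that the code is complex and redundant. The whole project is available here.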
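To make the problem concrete before the full code, here is a minimal illustration (not the asker's code). It assumes the target encoding is EUC-KR, which is the charset used in the later attempt. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class ByteLengthDemo {
    // Assumption: the interface uses EUC-KR, as in the question's later attempt.
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    /** Number of bytes the string occupies once encoded as EUC-KR. */
    static int byteLength(String s) {
        return s.getBytes(EUC_KR).length;
    }

    /** Naive cut: encode, keep the first maxBytes bytes, decode again. */
    static String naiveTruncate(String s, int maxBytes) {
        byte[] all = s.getBytes(EUC_KR);
        byte[] cut = Arrays.copyOf(all, Math.min(maxBytes, all.length));
        return new String(cut, EUC_KR);
    }

    public static void main(String[] args) {
        String s = "ab한글";
        System.out.println(s.length());     // 4: length() counts UTF-16 code units
        System.out.println(byteLength(s));  // 6: a and b take 1 byte each, 한 and 글 take 2 each
        // Cutting at 4 bytes lands on a character boundary and keeps 한 intact.
        System.out.println(naiveTruncate(s, 4));
        // Cutting at 3 bytes splits 한 (c7 d1), so the last character cannot be decoded.
        System.out.println(naiveTruncate(s, 3));
    }
}
```

The character count and the encoded byte count diverge as soon as the string contains a 2-byte character, which is why a byte limit cannot be applied with `String.substring` alone.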
package thecodinglog.string;

public class StringHelper {
    public static String substrb2(String str, Number beginByte) {
        return substrb2(str, beginByte, null, null, null);
    }

    public static String substrb2(String str, Number beginByte, Number byteLength) {
        return substrb2(str, beginByte, byteLength, null, null);
    }

    /**
     * Returns the substring of the String.
     * It returns a string as specified length and byte position.
     * You can pad characters left or right when there is a specified length.
     * It distinguishes between 1 byte character and 2 byte character and returns it exactly as specified byte length.
     * If the start position or the specified length causes a 2-byte character to be truncated in the middle,
     * it will be converted to Space.
     * You can specify either left or right padding.
     *
     * If beginByte is 0, it is changed to 1 and processed.
     * If beginByte is less than 0, the string is searched for from right to left.
     * If beginByte or byteLength is a real number, the decimal point is discarded.
     * If you do not specify a length, returns everything from the starting position to the right-end string.
     *
     * Examples:
     * <blockquote><pre>
     * StringHelper.substrb2("a好호b", 1, 10, null, "|") returns "a好호b||||"
     * StringHelper.substrb2("ab한글", 4, 2) returns "  "
     * StringHelper.substrb2("한a글", -3, 2) returns "a "
     * StringHelper.substrb2("abcde한글이han gul다ykd", 7) returns " 글이han gul다ykd"
     * </pre></blockquote>
     *
     * @param str a string to substring
     * @param beginByte the beginning byte
     * @param byteLength length of bytes
     * @param leftPadding a character for padding. It must be 1 byte character.
     * @param rightPadding a character for padding. It must be 1 byte character.
     * @return a substring
     */
    public static String substrb2(String str, Number beginByte, Number byteLength, String leftPadding, String rightPadding) {
        if (str == null || str.equals("")) {
            throw new IllegalArgumentException("The source string can not be an empty string or null.");
        }
        if (leftPadding != null && rightPadding != null) {
            throw new IllegalArgumentException("Left padding, right padding Either of two must be null.");
        }
        if (leftPadding != null) {
            if (leftPadding.length() != 1) {
                throw new IllegalArgumentException("The length of the padding string must be one.");
            }
            if (getByteLengthOfChar(leftPadding.charAt(0)) != 1) {
                throw new IllegalArgumentException("The padding string must be 1 Byte character.");
            }
        }
        if (rightPadding != null) {
            if (rightPadding.length() != 1) {
                throw new IllegalArgumentException("The length of the padding string must be one.");
            }
            if (getByteLengthOfChar(rightPadding.charAt(0)) != 1) {
                throw new IllegalArgumentException("The padding string must be 1 Byte character.");
            }
        }
        int beginPosition = beginByte.intValue();
        if (beginPosition == 0) beginPosition = 1;
        int length;
        if (byteLength != null) {
            length = byteLength.intValue();
            if (length < 0) {
                return null;
            }
        } else {
            length = -1;
        }
        if (length == 0)
            return null;
        boolean beginHalf = false;
        int accByte = 0;
        int startIndex = -1;
        if (beginPosition >= 0) {
            for (int i = 0; i < str.length(); i++) {
                if (beginPosition - 1 == accByte) {
                    startIndex = i;
                    accByte = accByte + getByteLengthOfChar(str.charAt(i));
                    break;
                } else if (beginPosition == accByte) {
                    beginHalf = true;
                    startIndex = i;
                    accByte = accByte + getByteLengthOfChar(str.charAt(i));
                    break;
                } else if (accByte + 2 == beginPosition && i == str.length() - 1) {
                    beginHalf = true;
                    accByte = accByte + getByteLengthOfChar(str.charAt(i));
                    break;
                }
                accByte = accByte + getByteLengthOfChar(str.charAt(i));
            }
        } else {
            beginPosition = beginPosition * -1;
            if (length > beginPosition) {
                length = beginPosition;
            }
            for (int i = str.length() - 1; i >= 0; i--) {
                accByte = accByte + getByteLengthOfChar(str.charAt(i));
                if (i == str.length() - 1) {
                    if (getByteLengthOfChar(str.charAt(i)) == 1) {
                        if (beginPosition == accByte) {
                            startIndex = i;
                            break;
                        }
                    } else {
                        if (beginPosition == accByte) {
                            if (length > 1) {
                                startIndex = i;
                                break;
                            } else {
                                beginHalf = true;
                                break;
                            }
                        } else if (beginPosition == accByte - 1) {
                            if (length == 1) {
                                beginHalf = true;
                                break;
                            }
                        }
                    }
                } else {
                    if (getByteLengthOfChar(str.charAt(i)) == 1) {
                        if (beginPosition == accByte) {
                            startIndex = i;
                            break;
                        }
                    } else {
                        if (beginPosition == accByte) {
                            if (length > 1) {
                                startIndex = i;
                                break;
                            } else {
                                beginHalf = true;
                                break;
                            }
                        } else if (beginPosition == accByte - 1) {
                            if (length > 1) {
                                startIndex = i + 1;
                            }
                            beginHalf = true;
                            break;
                        }
                    }
                }
            }
        }
        if (accByte < beginPosition) {
            throw new IndexOutOfBoundsException("The start position is larger than the length of the original string.");
        }
        StringBuilder stringBuilder = new StringBuilder();
        int accSubstrLength = 0;
        if (beginHalf) {
            stringBuilder.append(" ");
            accSubstrLength++;
        }
        if (byteLength == null) {
            stringBuilder.append(str.substring(startIndex));
            return new String(stringBuilder);
        }
        for (int i = startIndex; i < str.length() && startIndex >= 0; i++) {
            accSubstrLength = accSubstrLength + getByteLengthOfChar(str.charAt(i));
            if (accSubstrLength == length) {
                stringBuilder.append(str.charAt(i));
                break;
            } else if (accSubstrLength - 1 == length) {
                stringBuilder.append(" ");
                break;
            } else if (accSubstrLength - 1 > length) {
                break;
            }
            stringBuilder.append(str.charAt(i));
        }
        if (leftPadding != null) {
            int diffLength = byteLength.intValue() - accSubstrLength;
            StringBuilder padding = new StringBuilder();
            for (int i = 0; i < diffLength; i++) {
                padding.append(leftPadding);
            }
            stringBuilder.insert(0, padding);
        }
        if (rightPadding != null) {
            int diffLength = byteLength.intValue() - accSubstrLength;
            StringBuilder padding = new StringBuilder();
            for (int i = 0; i < diffLength; i++) {
                padding.append(rightPadding);
            }
            stringBuilder.append(padding);
        }
        return new String(stringBuilder);
    }

    private static int getByteLengthOfChar(char c) {
        if ((int) c < 128) {
            return 1;
        } else {
            return 2;
        }
    }
}
New attempt:
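import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

String testData = "한글이가득";
Charset charset = Charset.forName("EUC-KR");
ByteBuffer byteBuffer = charset.encode(testData);
// Take 4 bytes starting at byte index 1, i.e. from the middle of the first character.
byte[] newone = Arrays.copyOfRange(byteBuffer.array(), 1, 5);
CharsetDecoder charsetDecoder = charset.newDecoder()
        .replaceWith(" ")
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
// decode() declares the checked CharacterCodingException.
CharBuffer charBuffer = charsetDecoder.decode(ByteBuffer.wrap(newone));
System.out.println(charBuffer.toString());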
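I expected "글", but got "畸邦" instead. I think the start index has to fall on a valid decoding position, but I do not see how to tell the method what I want.
Added an example of the failure: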
index | 0  1 | 2  3 | 4  5 | 6  7 | 8  9
char  |  한  |  글  |  이  |  가  |  득
----  | ---- | ---- | ---- | ---- | ----
hex   | c7d1 | b1db | c0cc | b0a1 | b5e6
----  | ---- | ---- | ---- | ---- | ----
Assuming a start index of 1 and a length of 4 bytes, the extracted bytes in hex are:
index | 0  1 | 2  3 | 4  5 | 6  7 | 8  9
char  |  한  |  글  |  이  |  가  |  득
----  | ---- | ---- | ---- | ---- | ----
hex   | c7d1 | b1db | c0cc | b0a1 | b5e6
----  | ---- | ---- | ---- | ---- | ----
sub   |   d1 | b1db | c0
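When the decoder decodes d1b1dbc0, it treats d1b1 as one character and dbc0 as another. This may vary by charset, but that is what happens in this case. Unless the decoder knows where the original characters begin and end, it decodes the wrong characters, because the bytes alone do not reveal their starting point.
I think the key to this approach is how to let the decoder know the starting position, in bytes, of each original character.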
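One possible way to supply those boundaries is to find them in the encoded bytes before decoding. The sketch below assumes EUC-KR, where a byte below 0x80 is a complete character and any other byte starts a 2-byte pair. The class name `EucKrByteSlice` and the method `sliceBytes` are hypothetical, the offset is 0-based as in the table above, and a half character at either edge is replaced by a space. It still walks from the start of the array, because a trail byte in EUC-KR cannot be told apart from a lead byte without context.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class EucKrByteSlice {
    // Assumption: EUC-KR output contains only 1-byte (< 0x80) and 2-byte characters.
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    /**
     * Decodes bytes [begin, begin + length) of the EUC-KR form of s.
     * A 2-byte character split by either edge is replaced with a space.
     */
    static String sliceBytes(String s, int begin, int length) {
        byte[] all = s.getBytes(EUC_KR);
        if (begin < 0 || length < 0 || begin > all.length) {
            throw new IndexOutOfBoundsException("begin=" + begin + ", length=" + length);
        }
        int end = Math.min(begin + length, all.length);
        if (begin >= end) {
            return "";
        }
        byte[] out = Arrays.copyOfRange(all, begin, end);
        // Walk character by character so every pos is a true character boundary.
        int pos = 0;
        while (pos < end) {
            int width = (all[pos] & 0x80) == 0 ? 1 : 2;
            int next = pos + width;
            if (width == 2) {
                if (pos < begin && next > begin) {
                    out[0] = ' ';               // slice starts on a trail byte
                }
                if (next > end) {
                    out[end - 1 - begin] = ' '; // slice ends on a lead byte
                }
            }
            pos = next;
        }
        return new String(out, EUC_KR);
    }

    public static void main(String[] args) {
        // Bytes d1 | b1db | c0 from the table: both edges are half characters.
        System.out.println("[" + sliceBytes("한글이가득", 1, 4) + "]"); // prints "[ 글 ]"
    }
}
```

Because the half bytes are overwritten before decoding, the decoder only ever sees whole characters and never pairs d1 with b1.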
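You do know that a char is two bytes in Java? – Rodney
This is a lot of code to ask people to go through... Please see how to create a [mcve] –
Could your whole question be rephrased as "find the longest truncation of a string whose representation in a given charset fits within a given length"? If so, I would use a 'CharsetEncoder', append 'char' by 'char', and wait until the result overflows (or better, see the 'encodeLoop' method) – GPI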
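A minimal sketch of the idea in the last comment, under the assumption that only a prefix truncation is needed. Instead of appending one `char` at a time, it lets `CharsetEncoder.encode` write into a `ByteBuffer` whose capacity is the byte limit. The encoder reports overflow when the next character does not fit, and the input buffer's position then marks how many chars were consumed. The class name `EncoderTruncate` and the method `truncateToBytes` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

final class EncoderTruncate {
    /** Longest prefix of s whose encoding in cs fits in maxBytes. */
    static String truncateToBytes(String s, Charset cs, int maxBytes) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // Encoding stops with an overflow result once the next character no longer
        // fits in the remaining space, so no partial character is written.
        encoder.encode(in, out, true);
        // in.position() is the number of chars that were fully encoded.
        return s.substring(0, in.position());
    }

    public static void main(String[] args) {
        Charset eucKr = Charset.forName("EUC-KR");
        System.out.println(truncateToBytes("ab한글", eucKr, 3)); // prints "ab"
        System.out.println(truncateToBytes("ab한글", eucKr, 4)); // prints "ab한"
    }
}
```

This handles the right-hand cut only. A slice that starts at an arbitrary byte offset still needs the start boundary located separately.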