2012-10-19 43 views
-1

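Stackoverflow on uppercase regex

I have a big problem. One of my validations is running into catastrophic backtracking (http://www.regular-expressions.info/catastrophic.html), but I am having a hard time figuring out why. Maybe someone has an idea? Apart from this, the regex works well in all my use cases.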

regex: "^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$" 

The problematic input:

"Disposable 
BHT, 
Tocopheryl AcetateHydrating Shave Gel 
Aqua, 
Glycerin, 
Palmitic Acid, 
Triethanolamine, 
Isopentane, 
Glyceryl Oleate, 
Stearic Acid, 
Isobutane, 
Sorbitol, 
Parfum, 
Hydroxyethylcellulose, 
Myristic Acid, 
PEG-90M, 
Butyrospermum Parkii Butter Extract, 
Lauric Acid, 
PTFE, 
PEG-23M, 
Propylene Glycol, 
Glyceryl Acrylate/Acrylic Acid Copolymer, 
PVM/MA Copolymer, 
Silica, 
Methylparaben, 
Propylparaben, 
BHT, 
Limonene, 
Benzyl Salicylate, 
Linalool, 
CI 42053, 
CI 42090 
Series Thermal Face Scrub 
PEG-4, 
Magnesium Sulfate, 
PEG/PPG-300/55 Copolymer, 
Polyethylene, 
Polypropylene, 
Laureth-23, 
Stearyl Alcohol, 
Dioleoylethyl Hydroxyethylmonium Methosulfate, 
Cetyl Alcohol, 
Behentrimonium Chloride, 
Distearyldimonium Chloride, 
Hydroxypropylcellulose, 
Parfum, 
Methylparaben, 
Propylparaben, 
Niacinamide, 
Alcohol Denat, 
Hexylene Glycol, 
Benzyl Salicylate, 
AquaClassic Clean Shampoo 
Aqua, 
Sodium Lauryl Sulfate, 
Sodium Laureth Sulfate, 
Glycol Distearate, 
Zinc Carbonate, 
Sodium Chloride, 
Sodium Xylenesulfonate, 
Zinc Pyrithione, 
Cocamidopropyl Betaine, 
Dimethicone, 
Sodium Benzoate, 
Guar Hydroxypropyltrimonium Chloride, 
Hydrochloric Acid, 
Hexyl Cinnamal, 
Linalool, 
Butylphenyl Methylpropional, 
Magnesium Carbonate Hydroxide, 
Ammonium Laureth Sulfate, 
Magnesium Nitrate, 
Sodium Polynaphthalenesulfonate, 
Methylchloroisothiazolinone, 
Magnesium Chloride, 
CI 42090, 
Citric Acid, 
Methylisothiazolinone, 
Tetrasodium EDTA, 
CI 17200, 
DMDM Hydantoin Perspirant Deodorant Spray Sport Protect 48H 
Butane, 
Isobutane, 
Cyclopentasiloxane, 
Aluminum Chlorohydrate, 
Cyclodextrin, 
Disteardimonium Hectorite, 
Dimethicone, 
Aqua, 
Triethyl Citrate, 
Alpha-Isomethyl Ionone, 
Butylphenyl Methylpropional, 
Citral, 
Citronellol, 
Coumarin, 
Geraniol, 
Limonene, 
Linalool 
Pillite Series Instant Hydration Moisturiser +SPF 15 
Aqua, 
Glycerin, 
Ethylhexyl Salicylate, 
Niacinamide, 
Butyl Methoxydibenzoylmethane, 
Dimethicone, 
Polyethylene, 
Octocrylene, 
Isopropyl Palmitate, 
Phenylbenzimidazole Sulfonic Acid, 
Sorbitan Stearate, 
Triethanolamine, 
Cetyl Alcohol, 
Sodium Acrylates Copolymer, 
Aluminum Starch Octenylsuccinate, 
Stearyl Alcohol, 
Caprylic/Capric Triglyceride, 
Panthenol, 
Benzyl Alcohol, 
Dimethiconol, 
Fragrance, 
Ethylparaben, 
Cetearyl Glucoside, 
Cetearyl Alcohol, 
PEG 100 Stearate, 
Propylparaben, 
Disodium EDTA, 
C12-13 Pareth-3, 
Palmitic Acid, 
Stearic Acid, 
Benzyl Salicylate, 
Laureth-7, 
Linalool, 
Butylphenyl Methylpropional, 
Myristic Acid, 
Coumarin, 
Heptadecanoic Acid, 
Benzyl Benzoate" 

Thanks!

+0

The input does not include the quotes, of course! – user1635689

+1

How are you using the regex? What is the expected result? – Keppil

+4

Can you explain what you are trying to do with this regex? – ninaj

Answers

3

The problem is that you have a clause of the form

(something*)* 

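This works fine when the regex matches, but things go catastrophically wrong if one of your lines is malformed. The cause is backtracking: before giving up, the regex engine tries every possible combination of how the inner and outer repetitions could divide the text between them.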
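To make "every possible combination" concrete, here is a minimal sketch. It uses a generic nested-quantifier pattern, `^([A-Za-z]+)*$`, chosen only for illustration (it is not the regex from the question), and a hypothetical helper `count_splits` that counts how many ways a run of n letters can be cut into consecutive non-empty groups. That count doubles with every extra letter, which is roughly the search space a backtracking engine may walk through when the overall match fails.

```python
import re
from functools import lru_cache

# Generic nested-quantifier pattern for illustration only; this is NOT the
# regex from the question.
NESTED = re.compile(r"^([A-Za-z]+)*$")


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Count the ways to cut a run of n letters into consecutive non-empty groups."""
    if n == 0:
        return 1  # nothing left to cut: exactly one (empty) way
    # Choose the length of the first group, then split whatever remains.
    return sum(count_splits(n - first) for first in range(1, n + 1))


if __name__ == "__main__":
    # The number of splits doubles with each additional letter.
    for n in (1, 2, 4, 8, 16):
        print(n, count_splits(n))
    # A well-formed subject matches on the first try.
    print(NESTED.match("abcdef") is not None)
    # A short malformed subject still fails quickly; longer runs slow down fast.
    print(NESTED.match("a" * 12 + "!") is None)
```

Keep the malformed test strings short when experimenting: each additional letter roughly doubles the work for this kind of pattern.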

Take your longest line:

Gillette Series Instant Hydration Moisturiser +SPF 15

If that line does not match your regex, the regex engine will need 2,251,799,813,685,248 (2^51) attempts before it realises that the line does not match.
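To get a feel for that number, here is a back-of-the-envelope calculation. The rate of one billion attempts per second is purely an assumption for illustration; real engines and machines will differ.

```python
# Number of attempts quoted above.
ATTEMPTS = 2 ** 51

# Hypothetical throughput, assumed only for this estimate.
ASSUMED_ATTEMPTS_PER_SECOND = 1_000_000_000

seconds = ATTEMPTS / ASSUMED_ATTEMPTS_PER_SECOND
days = seconds / 86_400  # 86,400 seconds in a day

if __name__ == "__main__":
    print(f"{ATTEMPTS:,} attempts")
    print(f"about {days:.1f} days at the assumed rate")
```

Even under that generous assumption, a single failing line would keep the engine busy for weeks.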

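The fix is on the page you linked to. Since you are looking for an alternating sequence of words and non-words, backtracking is useless to you (a word cannot be split into a word/non-word/word sequence). You can prevent backtracking by using possessive quantifiers: once the regex has matched a word or a non-word, it never gives that match back.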

To make a quantifier possessive, just append a plus to it. Do that for every quantifier, so

(something*)* becomes (something*+)*+
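Applied to the regex from the question, that rewrite might look like the sketch below. The question does not say which language or engine is in use, so Python is an assumption here; its `re` module accepts possessive quantifiers from version 3.11 onward, and the sketch falls back to the original pattern on older interpreters. The names `ORIGINAL`, `POSSESSIVE` and `is_capitalized` are made up for this example.

```python
import re
import sys

# Python's re module supports possessive quantifiers starting with 3.11.
POSSESSIVE_SUPPORTED = sys.version_info >= (3, 11)

# The regex from the question, unchanged.
ORIGINAL = r"^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$"

# The same regex with a plus appended to every quantifier.
POSSESSIVE = r"^((^|[^A-Za-z]++)[A-Z][A-Za-z]*+)*+[^A-Za-z]*+$"


def is_capitalized(text: str) -> bool:
    """Return True if every word in text starts with an uppercase letter."""
    pattern = POSSESSIVE if POSSESSIVE_SUPPORTED else ORIGINAL
    return re.match(pattern, text) is not None


if __name__ == "__main__":
    print(is_capitalized("Palmitic Acid, "))
    print(is_capitalized("Aqua, glycerin"))
```

With the possessive version, a word that fails the uppercase check makes the whole match fail right away, because none of the earlier words or separators can be given back and retried.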

+0

+1, nice explanation. – dan1111

+0

Awesome, you rock. +1 for the explanation, and the proposed change really does the trick. – user1635689