2013-10-01 183 views
0

Complex LINQ query

I have a database table with 2 fields: index (INT) and email (VARCHAR(100)).

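I need to do the following:

  1. Group all emails by domain name (all emails are already lowercase).
  2. Select all emails from all groups, such that no email domain accounts for more than 20% of the total number of emails counted before step 1.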

A code example to sum it up:

    DataContext db = new DataContext(); 

    //Domains to group by 
    List<string> domains = new List<string>() { "gmail.com", "yahoo.com", "hotmail.com" }; 

    Dictionary<string, List<string>> emailGroups = new Dictionary<string, List<string>>(); 

    //Init dictionary 
    foreach (string thisDomain in domains) 
    { 
     emailGroups.Add(thisDomain, new List<string>()); 
    } 

    //Get distinct emails 
    var emails = db.Clients.Select(x => x.Email).Distinct(); 

    //Total emails 
    int totalEmails = emails.Count(); 

    //One percent of total emails 
    int onePercent = totalEmails/100; 

    //Run on each email 
    foreach (var thisEmail in emails) 
    { 
     //Run on each domain 
     foreach (string thisDomain in emailGroups.Keys) 
     { 
      //If email from this domain 
      if (thisEmail.Contains(thisDomain)) 
      { 
       //Add to dictionary 
       emailGroups[thisDomain].Add(thisEmail); 
      } 
     } 
    } 

    //Will store the final result 
    List<string> finalEmails = new List<string>(); 

    //Run on each domain 
    foreach (string thisDomain in emailGroups.Keys) 
    { 
     //Get percent of emails in group 
     int thisDomainPercents = emailGroups[thisDomain].Count/onePercent; 

     //More than 20% 
     if (thisDomainPercents > 20) 
     { 
      //Take only 20% and join to the final result 
      finalEmails = finalEmails.Union(emailGroups[thisDomain].Take(20 * onePercent)).ToList(); 
     } 
     else 
     { 
      //Join all to the final result 
      finalEmails = finalEmails.Union(emailGroups[thisDomain]).ToList(); 
     } 
    } 

Does anyone know a better way to do this?

+1

LINQ to *what*? –

+0

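It looks like you just want to filter all the results in some way, and the grouping is only a stepping stone towards that? By the way, could you explain more clearly why it is '101,102' rather than '100,101', and likewise '104,105' rather than '103,104'? Are the items collected from the bottom up? –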

+0

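Can you confirm whether you want to exclude a domain entirely if its count exceeds the threshold, or whether you want to include all of its emails up to the threshold? – James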

Answers

2

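I can't think of a way to do this without hitting the DB at least twice, once for the grouping and once for the overall count. You could try something like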

var query = from u in db.Users 
      group u by u.Email.Split('@')[1] into g 
      select new 
      { 
       Domain = g.Key, 
       Users = g.ToList() 
      }; 

query = query.Where(x => x.Users.Count <= (db.Users.Count() * 0.2)); 
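
For comparison, here is a minimal in-memory sketch of the same idea (drop every domain that holds more than a given share of the emails). It assumes the emails have already been loaded as strings, since whether `Split` inside the grouping key translates to SQL depends on the LINQ provider. `DomainFilter` and `KeepSmallDomains` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainFilter
{
    // Hypothetical helper: returns the emails whose domain holds at most
    // `share` (for example 0.2) of all the emails passed in.
    public static List<string> KeepSmallDomains(IEnumerable<string> emails, double share)
    {
        var all = emails.ToList();             // materialize once, so the count is stable
        double threshold = all.Count * share;  // e.g. 20% of the total

        return all
            // Domain = text after the last '@' (the whole string if there is no '@').
            .GroupBy(e => e.Substring(e.LastIndexOf('@') + 1))
            // Keep only the groups that do not exceed the threshold.
            .Where(g => g.Count() <= threshold)
            // Flatten the surviving groups back into a list of emails.
            .SelectMany(g => g)
            .ToList();
    }
}
```

With four gmail.com addresses, one yahoo.com and one hotmail.com address, the threshold is 6 * 0.2 = 1.2, so only the yahoo.com and hotmail.com addresses are returned.
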
+0

Maybe by adding let userCount = db.Users.Count() * 0.2 you could do this without hitting the db twice –

+0

This would filter out '100,101,102', whereas the OP wants to keep '101,102'? –

+0

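@allo_man A 'let' would give me the threshold, but I'm not quite sure how I could apply it to the groups. KingKing, I'm not entirely clear on the requirements here, but from what I gather the OP wants a group returned only if the total number of emails for that particular *domain* is below 20% of the overall total. – James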
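
One way the `let` idea from the comments might be applied to the groups is sketched here in query syntax over an in-memory list. This only shows the shape of the query; how many database round trips the equivalent query would cause against a real `DataContext` depends on the provider and is not something this sketch can confirm. `LetThreshold` and `Filter` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetThreshold
{
    // Hypothetical helper: keeps the emails of every domain whose size is
    // at most 20% of the total, using a `let` clause for the threshold.
    public static List<string> Filter(List<string> all)
    {
        var query =
            from email in all
            group email by email.Substring(email.LastIndexOf('@') + 1) into g
            let threshold = all.Count * 0.2   // evaluated per group; cheap for a List
            where g.Count() <= threshold      // apply the threshold to the group
            from kept in g                    // flatten the group back into emails
            select kept;

        return query.ToList();
    }
}
```
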

0
var maxCount = (int)(db.Users.Count() * 0.2); // Take() expects an int, not a double
var query = (from u in db.Users 
     group u by u.Email.Split('@')[1] into g 
     select new 
     { 
      Domain = g.Key, 
      Users = g.Take(maxCount).ToList() 
     }) 
     .SelectMany(x => x.Users); 
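
The capping variant (keep at most 20% of the total from each domain instead of dropping the domain) can be sketched in memory as follows. This is an illustration under assumptions, not the answerer's exact query: it works on already-loaded strings, and `DomainCap` and `CapPerDomain` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainCap
{
    // Hypothetical helper: keeps at most `share` of the total number of
    // distinct emails from each domain.
    public static List<string> CapPerDomain(IEnumerable<string> emails, double share)
    {
        var all = emails.Distinct().ToList();
        // Take() needs an int; the cast truncates towards zero.
        int maxPerDomain = (int)(all.Count * share);

        return all
            .GroupBy(e => e.Substring(e.LastIndexOf('@') + 1)) // text after the last '@'
            .SelectMany(g => g.Take(maxPerDomain))             // cap each domain
            .ToList();
    }
}
```

With ten distinct emails (seven gmail.com, two yahoo.com, one hotmail.com) the cap is 2, so the result holds two gmail.com, two yahoo.com and one hotmail.com address. Note that with fewer than five emails the cap truncates to 0 and nothing is returned, so very small inputs may need special handling.
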
1

Assuming you want to take the last items of each group, in ascending order:

int m = (int) (input.Count() * 0.2); 
var result = input.GroupBy(x=>x.email.Split('@')[1], 
          (key,g)=>g.OrderByDescending(x=>x.index).Take(m) 
            .OrderBy(x=>x.index)) 
        .SelectMany(g=>g);//If you want to get the last result without grouping 

Or like this:

var result = input.GroupBy(x=>x.email.Split('@')[1], 
          (key,g)=>g.OrderBy(x=>x.index) 
            .Skip(g.Count()-m)) 
        .SelectMany(g=>g);//If you want to get the last result without grouping
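
To make the "last `m` items per domain" behaviour concrete, here is a small self-contained sketch of the first variant. The `Row` class, the `LastPerDomain.TakeLast` helper and the sample indexes are all hypothetical and only stand in for the table described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table row with the two fields from the question.
public sealed class Row
{
    public int Index { get; set; }
    public string Email { get; set; }
}

public static class LastPerDomain
{
    // Keeps the m rows with the highest Index in each domain,
    // returned in ascending Index order within the domain.
    public static List<Row> TakeLast(IEnumerable<Row> input, int m)
    {
        return input
            .GroupBy(r => r.Email.Substring(r.Email.LastIndexOf('@') + 1))
            .SelectMany(g => g.OrderByDescending(r => r.Index) // highest indexes first
                              .Take(m)                         // keep the last m rows
                              .OrderBy(r => r.Index))          // restore ascending order
            .ToList();
    }
}
```

For rows 100, 101, 102 at gmail.com and row 103 at yahoo.com with `m = 2`, this keeps 101, 102 and 103. The `Skip(g.Count() - m)` variant should give the same rows, because `Enumerable.Skip` treats a zero or negative count as "skip nothing" when a group has fewer than `m` rows.
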