Monday, March 12, 2012

Issue with SqlUserDefinedAggregate

I am using the code below but I am getting a "zero" result for
dbo.AggredIssue('Test') user defined aggregate everytime that the query
executes parallel processing and uses the "Merge" method. It seems that my
private variable "private List<string> myList" gets nullified everytime it
goes through the "Merge".
I saw other people reporting the same issue in other forums, but nobody was
able to provide a solution or explanation.
See below a simplified version of my code (posted just after the queries)
that replicates the issue.
The query below works because it does't process the query in parallel.
SELECT GroupID, dbo.AggregIssue('Test')
FROM MyTable
where fund = 2
group by GroupID
The query below doesn't work because it process the query in parallel.
SELECT GroupID, dbo.AggregIssue('Test')
FROM MyTable
where fund <= 20
group by GroupID
[Serializable]
[SqlUserDefinedAggregate(
Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
MaxByteSize = 1000)]
public class AggregIssue : IBinarySerialize {
private List<string> myList;
private int myResult;
public void Init() {
myList = new List<string>();
}
public void Accumulate(SqlString Value) {
if (Value.IsNull) { return; }
myList.Add(Value.ToString());
}
public void Merge(AggregIssue Other) {
if (Other.myList != null) {
if (myList == null) {
myList = Other.myList;
}
else {
myList.AddRange(Other.myList);
}
}
}
public SqlInt32 Terminate() {
return new SqlInt32(myResult);
}
public void Read(BinaryReader r) {
myResult = r.ReadInt32();
}
//The code below is simplified for posting in the forum.
//I do additional manipulation of the list and require
//the aggregation to be IBinarySerialize.
//But this code replicates the issue also
public void Write(BinaryWriter w) {
w.Write(myList.Count);
}
}"Fernando" <Fernando@.discussions.microsoft.com> wrote in message
news:60C66368-EFD5-4C6B-A9EF-FC363A91C1DB@.microsoft.com...
>I am using the code below but I am getting a "zero" result for
> dbo.AggredIssue('Test') user defined aggregate everytime that the query
> executes parallel processing and uses the "Merge" method. It seems that my
> private variable "private List<string> myList" gets nullified everytime it
> goes through the "Merge".
> I saw other people reporting the same issue in other forums, but nobody
> was
> able to provide a solution or explanation.
> See below a simplified version of my code (posted just after the queries)
> that replicates the issue.
> The query below works because it does't process the query in parallel.
> SELECT GroupID, dbo.AggregIssue('Test')
> FROM MyTable
> where fund = 2
> group by GroupID
>
> The query below doesn't work because it process the query in parallel.
>
Yikes! How on earth do you test the merge method of a CLR Aggregate?
Perhaps you could cook up an appropriate plan guide?
David|||Hello Fernando,
F> I am using the code below but I am getting a "zero" result for
F> dbo.AggredIssue('Test') user defined aggregate everytime that the
F> query executes parallel processing and uses the "Merge" method. It
F> seems that my private variable "private List<string> myList" gets
F> nullified everytime it goes through the "Merge".
I don't believe BinaryRead and BinaryWrite method isn't preserving your List
<T>
as you expect it does. Here's example that serializes such a between calls
to merge.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections.Generic;
using System.Text;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined,MaxByteSize=8000)]
public class CSVBuilder : IBinarySerialize, INullable
{
List<string> _list = null;
public void Init()
{
_list = new List<string>();
}
public void Accumulate(SqlString Value)
{
_list.Add(Value.Value);
}
public void Merge(CSVBuilder Group)
{
_list.AddRange(Group._list);
}
public SqlString Terminate()
{
StringBuilder sb = new StringBuilder(8000);
foreach (string item in _list) {
sb.Append(", ");
sb.Append(item);
}
return new SqlString(sb.ToString().Substring(2));
}
void IBinarySerialize.Read(System.IO.BinaryReader r)
{
int size = r.ReadInt32();
BinaryFormatter f = new BinaryFormatter();
_list = (List<string> )(f.Deserialize(new MemoryStream(r.ReadBytes(size))));
}
void IBinarySerialize.Write(System.IO.BinaryWriter w)
{
BinaryFormatter f = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
f.Serialize(ms,_list);
Int32 size = (Int32)ms.Length;
w.Write(size);
w.Write(ms.ToArray(), 0, (int)size);
}
bool INullable.IsNull
{
get { return _list == null; }
}
}
It seems to work with this.
USE scratch
go
create table dbo.vs(v varchar(50));
insert into dbo.vs values ('Alpha')
insert into dbo.vs values ('Bravo')
insert into dbo.vs values ('Charlie')
insert into dbo.vs values ('Delta')
insert into dbo.vs values ('Echo')
insert into dbo.vs values ('Foxtrot')
insert into dbo.vs values ('Golf')
insert into dbo.vs values ('Hotel')
insert into dbo.vs values ('India')
insert into dbo.vs values ('Juliet')
insert into dbo.vs values ('Kilo')
insert into dbo.vs values ('Lima')
insert into dbo.vs values ('Mike')
insert into dbo.vs values ('November')
insert into dbo.vs values ('Oscar')
insert into dbo.vs values ('Papa')
insert into dbo.vs values ('Quebec')
insert into dbo.vs values ('Romeo')
insert into dbo.vs values ('Sierra')
insert into dbo.vs values ('Tango')
insert into dbo.vs values ('Uniform')
insert into dbo.vs values ('Victor')
insert into dbo.vs values ('Whiskey')
insert into dbo.vs values ('Yankee')
insert into dbo.vs values ('Zulu')
go
select dbo.csvbuilder(v) from dbo.vs
go
drop table dbo.vs
go
Returns:
Alpha, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliet,
Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Unifo
rm,
Victor, Whiskey, Yankee, Zulu
Thanks!
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/|||Hi Kent,
Thank you very much for your response. The reason I don't serialize the list
itself is because the list exceeds the limit of 8000 for maxbytesize.
I need the whole list to process the resolution of a non-linear equation, so
I cannot pre-aggregate values to save space before serializing. So what I am
trying to do is to solve the equation (using the required data from the list
)
just before serialization.
The weird thing is that it works if the code doesn't go through the merge
method.
If you can think of some other alternative...
Thank you very much,
Fernando|||"Fernando" <Fernando@.discussions.microsoft.com> wrote in message
news:06F1F640-4435-49F1-BB5E-B0A268648E7F@.microsoft.com...
> Hi Kent,
> Thank you very much for your response. The reason I don't serialize the
> list
> itself is because the list exceeds the limit of 8000 for maxbytesize.
> I need the whole list to process the resolution of a non-linear equation,
> so
> I cannot pre-aggregate values to save space before serializing. So what I
> am
> trying to do is to solve the equation (using the required data from the
> list)
> just before serialization.
> The weird thing is that it works if the code doesn't go through the merge
> method.
> If you can think of some other alternative...
>
As a workaround you can always prevent a parallel plan with a MAXDOP hint.
David|||David,
Thanks for your response. The option "MAXDOP 1" worked. It would be nicer if
parallelism is enabled, but for now it a good workaround.
Thanks again,
Fernando
"David Browne" wrote:

> "Fernando" <Fernando@.discussions.microsoft.com> wrote in message
> news:06F1F640-4435-49F1-BB5E-B0A268648E7F@.microsoft.com...
>
> As a workaround you can always prevent a parallel plan with a MAXDOP hint.
> David
>
>|||Hello Fernando,
F> The weird thing is that it works if the code doesn't go through the
F> merge method.
Right, I suspect the that the merge method if actually building instances
of the UDA on different CPUs and when has to marshal them together (merging
the threads if you will), that's when it calls the serialization stuff and
that's when you lose your values.
F> If you can think of some other alternative...
Don't use a UDA, use a procedure or function if possible.
Thanks!
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/|||Thanks Kent!

> Hello Fernando,
> F> The weird thing is that it works if the code doesn't go through the
> F> merge method.
> Right, I suspect the that the merge method if actually building instances
> of the UDA on different CPUs and when has to marshal them together (mergin
g
> the threads if you will), that's when it calls the serialization stuff and
> that's when you lose your values.
You are probably right, and that would be the reason why my code is failing.

> F> If you can think of some other alternative...
> Don't use a UDA, use a procedure or function if possible.
The solution is much simpler with UDA as the List used in the aggregation is
dynamically populated based on the items in the "Group By" of the query.
The workaround provided by David Browne (force non-parallelism) will work
for now.
Thank you very much for your help!
Fernando

>

No comments:

Post a Comment