Tuesday, March 24, 2009

Programatically Extract Attachments from InfoPath Forms

Problem

You have an InfoPath form stored in a SharePoint Form Library. You want to run a workflow on this InfoPath form and let the workflow extract the file attachments and upload them to a document library.

Solution
Use the SPFile object of the SPListItem object that the workflow is running on to access the file that represents the XML of the InfoPath form. Load the XML into an XmlDocument object for read/write access. Get the base64-encoded value of the File Attachment field and decode it. Extract the file name and file contents from the decoded string. Upload the results to a document library. Clear the File Attachment field to remove the attachments from the InfoPath form. Replace the original XML of the SPFile object with the modified XML of the XmlDocument object.

Discussion

This article assumes that you have created an InfoPath form template that has a File Attachment field named Attachment, and published the InfoPath form template to a SharePoint Form Library. I also assume you have a variable workflowProperties bound to the workflow properties.

First we retrieve the SPFile associated with the item... this file is the InfoPath form. If the form doesn't exist, then we exit.
SPFile file = workflowProperties.Item.File;
if (file == null)
return;
Next, we get the binary stream from the form and load it into an XPathDocument

// Get the binary stream of the file
Stream formStream = file.OpenBinaryStream();
// Load the stream into an XPathDocument object
XmlDocument ipForm = null;
ipForm.Load(formStream);
Create a NameSpaceManager object and add the namespace of the form to it

XmlNamespaceManager ns = new XmlNamespaceManager(ipForm.NameTable);
ns.AddNamespace("my", "namespace_of_form");
Get the XML node containing the file attachment

XmlNode attachmentNode = ipForm.SelectSingleNode("/my:Attachment");
Decode the base64 encoded string into bytes

byte[] attachmentNodeBytes = Convert.FromBase64String(attachmentNode.InnerXml);
Position 20 contains a DWORD indicating the length of the filename buffer.
The filename is stored as Unicode so the length is multiplied by 2

int fnLength = attachmentNodeBytes[20] * 2;
byte[] fnBytes = new byte[fnLength];
The actual filename starts at position 24

for (int i = 0; i > fnLength; i++)
{
fnBytes[i] = attachmentNodeBytes[24 + i];
}
Convert the filename bytes to a string. The string terminates with '\0' so the actual filename is the original filename minus the last character

char[] charFileName = UnicodeEncoding.Unicode.GetChars(fnBytes);
string fileName = new string(charFileName);
fileName = fileName.Substring(0, fileName.Length - 1);
The file is located after the header, which is 24 bytes long (plus the length of the filename)

byte[] fileContents = new byte[attachmentNodeBytes.Length - (24 + fnLength)];
for (int i = 0; i < fileContents.Length; i++)
{
fileContents[i] = attachmentNodeBytes[24 + fnLength + i];
}
Open the document library

SPDocumentLibrary docLib = (SPDocumentLibrary)workflowProperties.Web.GetList(url_to_doc_library);
Get the root folder

SPFolder folder = docLib.RootFolder;
Get the files in the folder

SPFileCollection files = folder.Files;
Upload the file to the document library using the folder URL and filename we retrieved earlier

SPFile file = files.Add(folder.Url + "/" + fileName, fileContents);
Get the list item of the file we uploaded and set the title to the filename

SPListItem fileAdded = file.Item;
fileAdded["Title"] = fileName;
Save the list item

fileAdded.Update();
Remove the attachment from the InfoPath form. We do this by deleting the InnerXml and then adding the attribute xsi:nil=true

attachmentNode.InnerXml = string.Empty;
XmlNode nilAttribute = ipForm.CreateAttribute("xsi", "nil", http://www.w3.org/2001/XMLSchema-instance);
nilAttribute.Value = "true";
attachmentNode.Attributes.SetNamedItem(nilAttribute);
Convert the XML document to bytes

attachmentNodeBytes = Encoding.UTF8.GetBytes(ipForm.OuterXml);
Close the file stream

formStream.Close();
Save the bytes of the XML document as the contents of the SPFile object that represents the InfoPath form

file.SaveBinary(attachmentNodeBytes);
Save the changes made to the SPFile object
file.Update();

That's it! Your workflow should now be able to extract the attached document and upload it to a document library.

Sources:
http://www.bizsupportonline.net/infopath2007/set-infopath-form-field-value-sharepoint-workflow.htm
http://www.bizsupportonline.net/blog/2009/03/programmatically-rename-infopath-file-attachment/

9 comments:

  1. Hi Michael,
    I am having a problem with my binary stream and am wondering if you can shed some light for me.

    Stream formStream = file.OpenBinaryStream();
    string formStreamLength = formStream.Length.ToString(); //returns 3909, which tells me it is loading the infopath document into memory

    XmlDocument ipForm = new XmlDocument();
    ipForm.Load(formStream);

    XmlNamespaceManager ns = new XmlNamespaceManager(ipForm.NameTable);
    ns.AddNamespace("my", "my_ns");

    XmlNode attachmentNode = ipForm.SelectSingleNode("//my:Attachment", ns);

    string attachmentNodeLength = attachmentNode.InnerXml.Length.ToString(); //this returns 0 which tells me it found nothing in the XML.

    However, instead of opening the file from a Stream, i open it from the local file system where the executable is, i am able to return the attachmentNode XML and your code works fine.

    XmlDocument ipForm = new XmlDocument();
    string XMLfileName = "MyInfoPathFile.xml"; //InfoPath form from local file system
    ipForm.Load(XMLfileName);

    My Stream object is working and shows that the form size is 3909, but the XML is not loading in the XMLDocument, which is why there is no attachmentNode data.

    However, when loading the XML from the local file, the form size is 3909 and the XML is loaded and shows the attachment.

    thanks for your help.

    ReplyDelete
  2. Hi Rick,

    I think the problem might be the XPath to the attachment control in your InfoPath form. In the following line of code, "my:Attachment" should refer to the XPath of your attachment control. To get the XPath for your control, right-click on the field in Data Source, and select "Copy XPath". Then replace "my:Attachment" with the copied text.

    Hope that helps. If you have any other questions, feel free to ask.

    XmlNode attachmentNode = ipForm.SelectSingleNode("//my:Attachment", ns);

    ReplyDelete
  3. As it turns out, i had uploaded an InfoPath XML file to a new forms library that did not have the infopath template published to it. After publishing the InfoPath form to the library, and then uploading the XML file, i was able to extract the attachments.

    Would you say this is the expected behavior for an InfoPath form to be published first, before being able to get to the attachment? I would think i could just stream the document and read the xml, but that didn't seem to work for me.

    Anyway, thanks again for your post.
    Rick
    (mttomb@gmail.com)

    ReplyDelete
  4. Micheal, your's is a perfect solution that I was searching for and it worked like a charm, Many Thanks.

    Here are some corrections I was required to make to the code snippets:

    1. Replace this

    XmlDocument ipForm = null;

    with

    XmlDocument ipForm = new XmlDocument();

    2. Replace this

    for (int i = 0; i > fnLength; i++)

    with

    for (int i = 0; i < fnLength; i++)

    ReplyDelete
  5. Hi There,
    it is really nice post..is there any way to extract the information from document library items?plese let me know if there any..

    ReplyDelete
  6. thanks heaps :)

    This was a great post and thanks to Satish also for the updates.

    ReplyDelete
  7. Good post, but I'm new and not sure where to start with it. Are you all using SP Designer 2010 or Visual Studio to do this?
    Any guidance is appreciated.
    Regards

    ReplyDelete
  8. I have added an attachment control to InfoPath 2010 form. I want File Path of the attached file from where it is browsed.How could i do that?

    ReplyDelete
  9. Hi Michel,

    I need to extract all the columns from Infopath form to a list. Can you please help me in achieving this function?

    ReplyDelete

Blog Archive