Download Large Files from SharePoint Online

August 15, 2016 | December 11, 2016 | Piyush K Singh | 23 Comments

In my previous post, I explained how to upload large files of up to 10 GB to SharePoint Online, in line with the revised upload limit. An upload operation is incomplete without a matching download. I mistakenly assumed that my earlier file download code would work for these larger files as well. But just as I had to rewrite the upload code to accommodate the new limit, the download code also needed a complete makeover. I initially tried the following approaches, with no success.

OpenBinaryDirect

FileInformation fileInformation = File.OpenBinaryDirect(clientContext, serverRelativeUrl);

Error: Stream was too long.

In this approach, I was trying to download a file of, say, 10 GB into a single .NET object. However, even for a 64-bit managed application on a 64-bit Windows operating system, a single object cannot, by default, be larger than 2 GB.

Refer: https://msdn.microsoft.com/en-us/library/ms241064(v=vs.100).aspx.

OpenBinaryStream


File oFile = web.GetFileByServerRelativeUrl(strServerRelativeURL);
clientContext.Load(oFile);
ClientResult<Stream> stream = oFile.OpenBinaryStream();
clientContext.ExecuteQuery();

Error: Invalid MIME content-length header encountered on read.

I could see in the Visual Studio Diagnostic Tools window that the former approach failed after downloading around 1.5 GB of data, whereas the OpenBinaryStream download failed after only 800-900 MB. This is because MemoryStream uses a byte[] internally. Whenever its internal buffer cannot hold the incoming data, it doubles the buffer size, so the error is thrown much earlier than with the OpenBinaryDirect approach.

Refer: http://stackoverflow.com/a/15597139.
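The buffer growth described above can be observed in isolation. The following is only a small sketch: the class and method names (MemoryStreamGrowthDemo, ObserveCapacities) are made up for illustration and are not part of the original code. It writes fixed-size chunks into a MemoryStream and records every distinct value of the Capacity property.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MemoryStreamGrowthDemo
{
    // Hypothetical helper: writes chunkCount chunks of chunkSize bytes and
    // returns each distinct MemoryStream.Capacity value seen along the way.
    public static long[] ObserveCapacities(int chunkSize, int chunkCount)
    {
        var seen = new List<long>();
        byte[] chunk = new byte[chunkSize];
        using (var ms = new MemoryStream())
        {
            for (int i = 0; i < chunkCount; i++)
            {
                ms.Write(chunk, 0, chunk.Length);
                // Record the capacity only when it has changed since the last write.
                if (seen.Count == 0 || seen[seen.Count - 1] != ms.Capacity)
                {
                    seen.Add(ms.Capacity);
                }
            }
        }
        return seen.ToArray();
    }

    public static void Main()
    {
        // Write 64 KB in 1 KB chunks and print every capacity the stream went through.
        foreach (long capacity in ObserveCapacities(1024, 64))
        {
            Console.WriteLine(capacity);
        }
    }
}
```

The capacity grows in jumps rather than chunk by chunk, so a stream holding N bytes may briefly need a much larger contiguous array while the old buffer is still being copied. That would be consistent with the earlier failure point seen with OpenBinaryStream.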

Correct Approach

After a lot of digging, I figured out that the only way I could download such a huge file is by using the Remote Procedure Call (RPC). Here’s the complete code for downloading a large file from SharePoint Online.

string fullFilePath = String.Empty;

Uri targetSite = new Uri(ctx.Web.Url);

SharePointOnlineCredentials spCredentials = (SharePointOnlineCredentials)ctx.Credentials;
string authCookieValue = spCredentials.GetAuthenticationCookie(targetSite);

string requestUrl = ctx.Url + "/_vti_bin/_vti_aut/author.dll";
string method = Utility.GetEncodedString("get document:15.0.0.4455");
serviceName = Utility.GetEncodedString(ctx.Web.ServerRelativeUrl);
if(documentName.StartsWith("/"))
{
	documentName = documentName.Substring(1);
}
documentName = Utility.GetEncodedString(documentName);
string oldThemeHtml = "false";
string force = "true";
string getOption = "none";
string docVersion = String.Empty; //directly passed as empty
string timeOut = "0";
string expandWebPartPages = "true";

string rpcCallString = String.Format("method={0}&service%5fname={1}&document%5fname={2}&old%5ftheme%5fhtml={3}&force={4}&get%5foption={5}&doc%5fversion=&timeout={6}&expandWebPartPages={7}",
	method, serviceName, documentName, oldThemeHtml, force, getOption, timeOut, expandWebPartPages);

HttpWebRequest wReq = WebRequest.Create(requestUrl) as HttpWebRequest;
wReq.Method = "POST";
wReq.ContentType = "application/x-vermeer-urlencoded";
wReq.Headers["X-Vermeer-Content-Type"] = "application/x-vermeer-urlencoded";
wReq.UserAgent = "MSFrontPage/15.0";
wReq.UseDefaultCredentials = false;
wReq.Accept = "auth/sicily";
wReq.Headers["MIME-Version"] = "1.0";
wReq.Headers["X-FORMS_BASED_AUTH_ACCEPTED"] = "T";
wReq.Headers["Accept-encoding"] = "gzip, deflate";
wReq.Headers["Cache-Control"] = "no-cache";

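wReq.CookieContainer = new CookieContainer();
// Remove the literal "SPOIDCRL=" prefix; TrimStart(char[]) would also strip
// any leading characters of the value that belong to that character set.
string cookiePrefix = "SPOIDCRL=";
string cookieValue = authCookieValue.StartsWith(cookiePrefix, StringComparison.Ordinal)
	? authCookieValue.Substring(cookiePrefix.Length)
	: authCookieValue;
wReq.CookieContainer.Add(
	new Cookie("SPOIDCRL",
		cookieValue,
		String.Empty,
		targetSite.Authority));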

wReq.KeepAlive = true;

//create unique dir for the download
DirectoryInfo tempFilePath = Directory.CreateDirectory(Path.Combine(tempFileLoc, Guid.NewGuid().ToString()));

using (Stream requestStream = wReq.GetRequestStream())
{
	byte[] rpcHeader = Encoding.UTF8.GetBytes(rpcCallString);

	requestStream.Write(rpcHeader, 0, rpcHeader.Length);
	requestStream.Close();

	fullFilePath = Path.Combine(tempFilePath.FullName, fileName);

	using (Stream strOut = File.OpenWrite(fullFilePath))
	{
		using (var sr = wReq.GetResponse().GetResponseStream())
		{
			byte[] buffer = new byte[16 * 1024];
			int read;
			bool isHtmlRemoved = false;
			while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
			{
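				if(!isHtmlRemoved)
				{
					// Decode only the bytes actually read. ASCII maps one byte to one char,
					// so the string index below is also a valid byte offset into the buffer.
					string result = Encoding.ASCII.GetString(buffer, 0, read);
					int startPos = result.IndexOf("</html>", StringComparison.Ordinal);
					if(startPos > -1)
					{
						// skip past '</html>' (7 bytes) and the newline that follows it
						startPos += 8;

						strOut.Write(buffer, startPos, read - startPos);

						isHtmlRemoved = true;
					}
				}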
				else
				{
					strOut.Write(buffer, 0, read);
				}
			}
		}
	}
}

Evaluation

Here I am using the method “get document”; “15.0.0.4455” is the server extension version number.
serviceName is the server-relative URL of your site.
documentName is the serverRelativeUrl (FileRef) of the file to be downloaded, minus the web's server-relative URL.
For authentication, I am using the CookieContainer of HttpWebRequest.
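The code above calls Utility.GetEncodedString, whose implementation is not shown in this post. The following is only a guess at what such a helper might look like, not the original: it assumes the helper URL-encodes the value and additionally escapes '.' and '_', which would match the %5f escapes visible in the RPC call string. The class name RpcUtility is hypothetical, and whether the server accepts exactly this encoding is an assumption to verify against your own tenant.

```csharp
using System;

public static class RpcUtility
{
    // Hypothetical stand-in for Utility.GetEncodedString (not the author's original).
    // Assumption: percent-encode the value, then also escape '.' and '_',
    // mirroring the "%5f" style used in the RPC call string.
    public static string GetEncodedString(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return source;
        }

        return Uri.EscapeDataString(source)
            .Replace(".", "%2e")
            .Replace("_", "%5f");
    }

    public static void Main()
    {
        Console.WriteLine(GetEncodedString("get document:15.0.0.4455"));
        Console.WriteLine(GetEncodedString("Doc lib/90 MB.docx"));
    }
}
```

Note that a plain HTML encoder such as HttpUtility.HtmlEncode is a different transformation and would not produce URL-style escapes.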

RPC

In case you’re not familiar with this format: RPC does not return just the file. It prefixes the file content with an HTML packet. To get the actual file, we need to strip this HTML from the response, which is exactly why I look up the index of ‘</html>’ and set startPos (the starting position for file writing) accordingly.
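One thing to watch with this kind of stripping is that a single Read call may return only part of the HTML packet, so the ‘</html>’ marker can straddle two reads. The sketch below shows one possible way to handle that by searching at the byte level and buffering the header until the marker is complete. The names RpcResponse, IndexOf and CopyPayload, and the 1 MB header cap, are assumptions made for illustration; the single newline after ‘</html>’ mirrors the offset of 8 used in the code above.

```csharp
using System;
using System.IO;
using System.Text;

public static class RpcResponse
{
    // End of the HTML packet: "</html>" plus one newline (assumed, as in the post's offset of 8).
    private static readonly byte[] Marker = Encoding.ASCII.GetBytes("</html>\n");

    // Returns the index of the first occurrence of needle in haystack[0..length), or -1.
    public static int IndexOf(byte[] haystack, int length, byte[] needle)
    {
        for (int i = 0; i <= length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Copies everything after the HTML packet from input to output.
    // Returns false if the marker is not found within maxHeaderBytes.
    public static bool CopyPayload(Stream input, Stream output, int maxHeaderBytes = 1024 * 1024)
    {
        byte[] buffer = new byte[16 * 1024];
        using (var header = new MemoryStream())
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Accumulate header bytes so a marker split across reads is still found.
                header.Write(buffer, 0, read);
                int pos = IndexOf(header.GetBuffer(), (int)header.Length, Marker);
                if (pos >= 0)
                {
                    int payloadStart = pos + Marker.Length;
                    output.Write(header.GetBuffer(), payloadStart, (int)header.Length - payloadStart);
                    // The rest of the response is pure file content; stream it through.
                    input.CopyTo(output, buffer.Length);
                    return true;
                }
                if (header.Length > maxHeaderBytes) return false;
            }
        }
        return false;
    }

    public static void Main()
    {
        byte[] packet = Encoding.ASCII.GetBytes("<html><body>meta</body>\n</html>\n");
        byte[] payload = { 1, 2, 3 };
        using (var input = new MemoryStream())
        using (var output = new MemoryStream())
        {
            input.Write(packet, 0, packet.Length);
            input.Write(payload, 0, payload.Length);
            input.Position = 0;
            bool found = CopyPayload(input, output);
            Console.WriteLine(found + " " + output.Length); // prints "True 3"
        }
    }
}
```

In the download code, input would be the response stream and output the file stream. Only the small HTML header is held in memory; the file content itself is still streamed in chunks.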

The following is a sample of the HTML sliced out of the download of a file named 90 MB.docx. As you can see, it only contains the file's meta info.


<html><head><title>vermeer RPC packet</title></head>
<body>


method=get document:15.0.0.4420


message=successfully retrieved document 'Doc lib/90 MB.docx' from 'Doc lib/90 MB.docx'


document=

<ul>

<li>document_name=Doc lib/90 MB.docx

<li>meta_info=

<ul>

<li>display_urn:schemas-microsoft-com:office:office#Editor

<li>SW|Piyush Singh

<li>vti_rtag

<li>SW|rt:A90CEB13-B279-480F-B07C-244670076247@00000000006

<li>vti_etag

<li>SW|"{A90CEB13-B279-480F-B07C-244670076247},6"

<li>vti_parserversion

<li>SR|16.0.0.5312

<li>vti_folderitemcount

<li>IR|0

<li>vti_timecreated

<li>TR|10 Jul 2014 20:55:16 -0000

<li>vti_sourcecontrolcheckincomment

<li>SR|File Restoration on Thursday, June 2, 2016

<li>vti_streamhash

<li>SR|0x02C8D921A84FE3E82F3C2A5866DA589513DA11314C

<li>vti_canmaybeedit

<li>BX|true

<li>vti_author

<li>SR|i:0#.f|membership|piyush@something.onmicrosoft.com

<li>vti_timelastwritten

<li>TR|02 Jun 2016 12:42:39 -0000

<li>vti_level

<li>IR|1

<li>vti_modifiedby

<li>SR|i:0#.f|membership|piyush.singh@something.onmicrosoft.com

<li>display_urn:schemas-microsoft-com:office:office#Author

<li>SW|Piyush Singh

<li>source_item_id_Col

<li>SW|10__1405054516000

<li>vti_foldersubfolderitemcount

<li>IR|0

<li>vti_filesize

<li>IR|94437376

<li>ContentTypeId

<li>SW|0x010100623C9C49E42A00419C619EE6EAF8D8C1

<li>vti_timelastmodified

<li>TR|10 Jul 2014 20:55:16 -0000

<li>vti_nexttolasttimemodified

<li>TR|02 Jun 2016 12:42:42 -0000

<li>vti_candeleteversion

<li>BR|true

<li>vti_sourcecontrolversion

<li>SR|V5.0

<li>vti_sourcecontrolcookie

<li>SR|fp_internal
</ul>

</ul>

</body>
</html>

Reference

Finally, this post from Steve Curran really helped me clear my doubts regarding RPC.

I have tested the above code and it has worked perfectly well for the download of files up to 9.7GB.

23 thoughts on “Download Large Files from SharePoint Online”

Value was either too large or too small for an Int32 | SharePoint Blog - Piyush says:

March 19, 2017 at 14:26

[…] one of my previous post I had demonstrated how to, Download Large Files from SharePoint Online. While the file download is working perfectly fine, but, I was constantly getting the following […]

Dharmesh Fichadiya says:

August 31, 2017 at 17:24

Hi Piyush,

Thanks for posting this article on downloading large files using RPC.
I simply pasted this code into a POC and tried to download a 2 GB file from SPO.

After 1.7 GB, I got the error “Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host”. Do you have any idea in which cases this error can be thrown? I am using a trial tenant.

Also, how much time does it usually take to download 2 GB of content?
If I keep the chunk size at 50 MB, will it download in 50 MB chunks?

I actually want to migrate large files from SPO to SPO in chunks.
I used File.OpenBinaryDirect, and when I used seek on it, it threw the error “Stream does not support seek operation”.

So I decided to download the large file locally using the above code and then migrate the data in chunks from the locally saved file.

- Piyush says:
  
  September 1, 2017 at 13:22
  
  This sort of error occurs due to a drop in the internet connection. Retrying the download will help.
  
Dharmesh Fichadiya says:

August 31, 2017 at 23:05

Can you please let me know what changes are required to get version contents?

- Piyush says:
  
  September 1, 2017 at 13:24
  
  Can you please elaborate on this? Do you want to download the contents of each version of the file?
  
Dharmesh Fichadiya says:

August 31, 2017 at 23:06

It worked for 2 GB. Checking with 9.5 GB. Thanks for article Piyush!!

- Piyush says:
  
  September 1, 2017 at 13:28
  
  Hi Dharmesh,
  
  Happy to help! 🙂
  
  And, thanks for sharing your result with a 2GB file. I am assuming that you were also able to successfully download the 9.5GB file as well!
  
  - Dharmesh Fichadiya says:
    
    September 1, 2017 at 20:23
    
    Hi Piyush,
    
    Thanks a lot for your reply.
    Yes I was able to download 9.5 GB file as well.
    Yes I want to download contents of file for each file version. Is it possible? If yes can you please share piece of code which i can use to download file version content?
    
    Please find my actual requirement below.
    I want to migrate large file (with-without versions) from SPO to SPO. So either i can
    
    1) Download large file to local file system and from that upload in chunks
    2) Get data in chunks using above code and upload in chunks
    
    With above approach I was able to download 9.5 GB file. Not checked more than that. Is there any limitation? Can I download up to 14 GB file as well because in SPO I can upload up to 15 GB.
    
    Also can i retrieve file contents in chunks using RPC? do you have any idea on this?
    
Dharmesh Fichadiya says:

September 4, 2017 at 16:07

Hi Piyush,

I am able to retrieve 14.50 GB contents as well using above approach. Also i am able to retrieve contents for each file version as well.

Do you have any idea whether we can retrieve file contents using RPC using in chunks?

I mean retrieve 2MB data using above code and write to file system
from next location, retrieve 2MB and write to file system and so on. i.e. return byte[] everytime. is it possible?

- Piyush says:
  
  September 4, 2017 at 20:58
  
  Glad to know that you were able to download files up to 15 GB. As for versions, the code above has an option to provide docVersion, which I purposely left blank so that the latest version of the file is downloaded automatically.
  
  As for your other query, we are already downloading the data in parts. GetResponseStream does not download the entire data; it only opens a Stream object, https://stackoverflow.com/a/21281089. You can verify this by placing a breakpoint on that line: even for a large file, it passes very quickly. The real download starts inside the while loop. Once execution is in there, check the path where the file is being written. The file size will increase gradually as the chunks are downloaded and appended to it.
  
Dharmesh Fichadiya says:

September 15, 2017 at 13:41

Hi Piyush,

Thanks for your response. This article really helped me.

- Piyush says:
  
  September 18, 2017 at 20:22
  
  🙂
  
Srinivasa Rao S says:

October 1, 2017 at 22:48

Hi Piyush,

Is there any way to use RPC with OAuth access token instead of Username and Password ?

–Srinu

- Piyush says:
  
  October 14, 2017 at 23:20
  
  No. Access Tokens work with SharePoint REST APIs only. For RPCs, just like SharePoint Web Services, SharePoint Claims Authentication is required. Which implies getting the authentication cookie first, using the username and password, and then, using the same in the actual RPC call.
  
jacky says:

October 12, 2017 at 08:01

Hi Piyush,
I want to download a video file such as mp4, How do I change the method parameters?
thanks

- Dharmesh Fichadiya says:
  
  December 12, 2017 at 15:23
  
  Hi Jacky,
  
  I am able to download an mp4 file using the same mechanism.
  Let me know here in case you are facing issues. I will share the code.
  
Dharmesh Fichadiya says:

December 12, 2017 at 15:27

Hello Piyush,

While downloading files, sometimes i am getting timeout error from HttpWebRequest.GetResponse()
Type: System.Net.WebException

By setting below properties, I am not getting timeout exception
httpWebRequest.ReadWriteTimeout = 400000;
httpWebRequest.Timeout = 200000;

Note : This is occurring sometimes only with small files e.g. 5-10 MB only…
So should i set ReadWriteTimeout and Timeout properties as well?

Frank says:

January 23, 2018 at 15:17

Hello Piyush: I was trying to use it in C# windows application. But unable to resolve the class name ‘Utility’. So instead of ‘Utility.GetEncodedString’ I am using ‘HttpUtility.HtmlEncode’. The code executed without any error but return only file metadata as vermeer RPC packet

method=get document:15.0.0.4455

status=

status=589830 osstatus=0 msg=There is no file with URL ‘mysp.sharepoint.com/sites/communicationsite1/Doc1.docx’; in this Web. osmsg=

. What I am doing wrong? If required I can share the sample code with you.

Michael Maillot says:

December 17, 2019 at 14:12

Hi ! Great article thank you !

I tried this approach and the download worked. But in my case, it was a huge PowerPoint File (~2Gb, no joking…), so when I tried to open it locally, PowerPoint told me that it was broken and asked me to repair the file to open it.

Anyone here met the same problem when downloading a huge Office file ? How to solve that “broken / repair” problem ?

Thx.

SUNIL TIWARI says:

June 25, 2020 at 18:59

Hi Piyush,
Nice article and seems to working for me but I am unable to locate the namespace for Utility class. Could you please help with the same.
Regards,
Sunil

Kushal Trivedi says:

June 27, 2020 at 13:45

Hi Piyush!

I tried this code but it seems that I cannot retrieve SharePointOnlineCredentials.GetAuthenticationCookie with the help of ClientContext (generated via TokenHelper) as ClientContext.Credentials is set as null.

Yury Euceda says:

December 6, 2020 at 05:07

Hi Piyush!

Your post was very helpful, thank you a lot.

I applied this method to download multiple files in parallel, but it only lets me download 2 files at the same time.
If you log in many times you can download many files in parallel, but logging in is time consuming, and I need to know if there is a way to authenticate permanently and download multiple files in parallel. Or is there a limit on the number of cookies generated?

KpR says:

January 10, 2023 at 00:10

Hi, thanks for this post.
Any method to pass an OAuth token ? With modern authentication, it looks like this code is not working anymore.

Thanks

