Hacker News: Acmeon's comments

In principle, I find it valuable to integrate tools. However, in this case I would be somewhat cautious, especially as "your chats, attachments, and workbook content — may be shared with OpenAI" (as per the Microsoft Marketplace description: https://marketplace.microsoft.com/en-us/product/WA200010215?...).

This seems like a security nightmare, which is especially relevant because sensitive data is often stored in Excel files.


Hi, engineer on this add-in. Fair concern, but we never train on business or enterprise user data, or on data from users who have opted out of training on their ChatGPT account.

Forgive my ignorance. How do you folks manage context retention? Say if someone had a sensitive excel document they wanted inference done over, how is that data actually sent to the model and then stored or deleted?

It seems one of the biggest barriers to people's adoption is concern over data leaving their ecosystem and then not being protected or being retained in some way.

Is this an SLA that a small or medium-sized company could get?


If you're concerned, you don't send it outside of the M365 boundary and presumably your admin has Purview Sensitivity Labels in place covering the document to prevent such activity.

Doesn't that mean you can't actually use it for those sensitive documents?

Correct.

{EDIT} English and/or the concept of the written word may be foreign to you. Thank you for your assistance.

Not sure why you'd state that. 'Correct' is a grammatically correct and complete answer to your question.

Yeah, I was expecting that you do not train on business or enterprise user data. However, I am not just worried about "training", but also about "sharing". Furthermore, I am worried about cases where an individual has chosen to integrate an add-in and then inadvertently leaks sensitive data.

However, it may be important to note that these security considerations are relevant for most Office Add-Ins (and not just the ChatGPT add-in).


That's the nature of these add-ins. Modern Add-ins are all little XML frames with some JS or whatever. All processing occurs server-side, hosted by the add-in publisher.

This is counter to the old (security nightmare) COM model where processing could be local.


To clarify: add-ins are essentially web pages. They can do some processing client side if they want, but yeah in the case of a ChatGPT add-in it's not like they're running the model in a web frame.

Exactly. My company's add-in is a React app, which can be fun to debug with the limited developer tools in Word/Excel's browser.

Do you mean that you worked on the Excel Add-Ins platform in Excel (and not on a specific Add-In)?

If you were working on the platform itself, then I would be interested in hearing your more detailed thoughts on the matters you mentioned (especially since I am developing an open source Excel Add-In Webcellar (https://github.com/Acmeon/Webcellar)).

What do you mean by an "OM" call? And why are they especially costly on Excel web (currently my add-in is only developed for desktop Excel, but I might consider adding support for Excel web in the future)?

In any case, `context.sync()` is much better than `context.executeAsync()`.


I worked on the Excel Add-Ins platform at Microsoft, yes. By OM call I mean "Object Model" call, basically interacting with the Excel document.

The reason those calls are expensive on Excel Web is that you're running your add-in in the browser, so every `.sync()` call has to go all the way to the server and back in order to see any changes. If you're doing those calls in a loop, you're looking at 500ms to 2-3s latency for every call (that was back then, it might be better now). On the desktop app it's not as bad since the add-in and the Excel process are on the same machine so what you're paying is mostly serialization costs.
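To make the batching point concrete, here's a toy model of the round-trip economics (not the real Office.js API, just the idea behind queuing OM calls and flushing them with one `sync()`):

```javascript
// Toy model of the Office.js batching idea: queued "OM calls" only cross
// the process/server boundary on sync(), so one sync per batch means one
// round trip instead of one per call.
class FakeContext {
  constructor() {
    this.queue = [];
    this.roundTrips = 0;
  }
  // Queue a read; nothing is sent to the host yet.
  load(cell) {
    this.queue.push(cell);
  }
  // Flush the whole queue in a single round trip.
  sync() {
    this.roundTrips += 1; // on Excel Web, each of these is a slow network hop
    this.queue.length = 0;
  }
}

// Anti-pattern: sync inside the loop, so N round trips.
const bad = new FakeContext();
for (let i = 0; i < 100; i++) {
  bad.load(`A${i + 1}`);
  bad.sync();
}

// Better: batch all loads, then sync once.
const good = new FakeContext();
for (let i = 0; i < 100; i++) {
  good.load(`A${i + 1}`);
}
good.sync();

console.log(bad.roundTrips, good.roundTrips); // prints 100 1
```

With 500ms-3s per round trip on the web client, the difference between 100 syncs and 1 sync is the difference between unusable and fine.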

Happy to answer more questions, though I left MSFT in 2017 so some things might have changed since.


Does Excel for Web still spin up an actual copy of Excel.exe on a machine somewhere? I heard that is how the initial version worked.

No, as the other comment mentioned. But I’ve heard of more than a few customers running their own “server excel workflow” where they have an instance of excel.exe running a VBA macro that talks to a web server (and does some processing).

Never did this. WAC was the original version (integrated with SharePoint Server). Everything was server-side.

While working at MS I remember someone on the Office team saying that the original version of Excel Online spun up the actual Excel backend and had it output HTML instead of the usual Win32 UI. Was I misinformed by chance?

Excel Online was a component of WAC. It was an ASP.NET (and C++???) web application that used OAuth between SharePoint Server and Exchange Server.

So I mean yes, you viewed Excel docs through a webpage just like you do today via ODSP or OneDrive consumer. The backend is completely different in the cloud service, though.


> WAC

Now that’s an acronym that I had forgotten about.


Yeah, that makes sense. For some reason, I was under the impression that all calculations run locally in the browser, which would have been comparable to how Excel desktop works (i.e., local calculations). Is there a reason why the Excel calculations run on the server (e.g., excessive workload of a browser implementation, proprietary code, difficulty of implementing it in JavaScript, cross-browser compatibility issues, etc.)? Furthermore, if the reason for this architecture is (or was) limitations in JavaScript or browsers, do you find it plausible that the Excel calculations will someday be implemented in WebAssembly?

Regardless, I have always preferred Excel desktop over Excel web (and other web based spreadsheet alternatives). This information makes me somewhat less interested in Excel web. Nonetheless, I find Excel Add-Ins useful, primarily because they bring the capabilities of JavaScript to Excel.


I don’t think Excel web will ever be running the calc engine browser side, no. The only way I could see this happen would be via compiling the core to wasm, which I don’t think is worth the engineering effort.

Excel has this legacy (but extremely powerful) core with very few people left who know all of it. It has legacy bugs preserved for compatibility reasons, as whole businesses are run on spreadsheets that would break if the bugs were fixed (I’m not exaggerating). The view code for xldesktop is not layered particularly well either, leading to a lot of dependencies on Win32 in xlshared (at least back then).

Is it doable? I’m sure. But the benefits are probably not worth the cost.


> Excel has this legacy (but extremely powerful) core with very few people left that knows all of it.

Would love to hear more about this. Especially history and comparison to Lotus etc.


So the first thing that's important to understand is that Excel is the product of another era. One where resources like memory were very constrained and compilers and optimizers weren't as good as they are today (so much so that the Excel team at one point wrote their own because MSVC sucked, but I digress)...

And so, a lot of the core code is built around that. Cell formatting data, for example, is super tightly packed in deeply nested unions to ensure that as little memory as possible is used to store that info. If something only needs 3 bits, it'll only use 3 bits. The calc engine compiles all of your formulas to its own (IIRC variable-instruction-width) bytecode to make sure that huge spreadsheets can still fit in memory and calc fast.
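To give a flavor of what "tightly packed" means (illustrative only, in JS rather than the C/C++ unions Excel actually uses, and with made-up field widths, not Excel's real layout):

```javascript
// Packing several small formatting fields into one integer: 1 bit for bold,
// 7 bits for font size (0-127), 6 bits for a color index (0-63).
// Hypothetical layout for illustration, not Excel's actual format record.
function packFormat(bold, fontSize, colorIndex) {
  return (bold << 13) | (fontSize << 6) | colorIndex;
}

function unpackFormat(packed) {
  return {
    bold: (packed >> 13) & 0x1,
    fontSize: (packed >> 6) & 0x7f,
    colorIndex: packed & 0x3f,
  };
}

const f = packFormat(1, 11, 42);
console.log(unpackFormat(f)); // { bold: 1, fontSize: 11, colorIndex: 42 }
```

Three fields in 14 bits instead of three machine words; the trade-off is that every read and write goes through shift-and-mask code that assumes you already know the layout.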

And a lot of it still carries the same coding and naming practices that it started with in the 80s: Hungarian notation, terse variable and file names, etc. Now, IMO, Hungarian notation by itself is pretty harmless (and maybe even useful in the absence of an IDE), but it seemed to encourage programmers to omit any useful information from variable names, requiring you to have more context to understand "why" something is happening. Like, cool, I have a pszxoper now (pointer to zero-terminated string of XOper), but why?

So the code is tight, has a lot of optimization baked in and assumes you know a lot about what's happening already.

But more importantly, a lot of the "why" information also just lives in people's heads. Yes, some teams would have documentation in an ungodly web of OneNote notebooks, or spread across SharePoint pages (which had the least useful search functionality I've ever witnessed), so finding anything you wanted was hard. But that didn't use to matter, since the core team had been there for a long time and you could ask them questions.

That being said, I joined MSFT in 2012 and started working on Excel closer to 2014. At that point, heavyweights like DuaneC (who wrote like 10% of Excel, and I don't think I'm exaggerating) had already retired, and while other people were very knowledgeable in some areas, nobody seemed to have a good cross-cutting view of the whole thing.

You have to understand that I was in the Office Extensibility team. We were building APIs for the whole thing. I had to touch the calc system, the cells and range, the formatting, tables, charts and images (the whole shared oart system was interesting), etc. Answering "How do I do X" was always a quest because you would usually:

- Find 3 different ways of achieving it

- One of them was definitely the wrong way and could make the app crash in some situations (or leak memory)

- All the people on the "blame" had left

- One of them was via the VBA layer which did some weird stuff (good ol' pbobj)

- Be grateful that this wasn't Word because their codebase was much worse

And so, a lot of the API implementation was trial and error and hunting down someone who understood the data structures. The fact that a full sync and rebuild took about 6 hours (you ran a command called `ohome` and then you went home) meant that experimenting was sometimes slow (at least incremental builds were reasonably fast). The only lifeline back then was this tool called ReSearch2 that allowed you to search the codebase efficiently.

But the thing is, once you got things to work, they worked really well. The core was solid and performant, just slightly inscrutable at times and not the kind of code you're used to reading outside of Excel.


Surely picking that apart is a good idea. Even if it costs a fortune, there have to be benefits.

Thanks for the interesting info! Yeah, maybe Excel web will someday support local calculations via wasm, but for now I think I will stick with Excel desktop with add-ins.

> though I left MSFT in 2017 so some things might have changed since.

Honestly, I struggle to think about what has actually changed between Office 2013 and Office 2024 (and their Office 365 equivalents); I know the LAMBDA function was a big deal, but they made the UI objectively worse by wasting screen-space with ever-increasingly phatter non-touch UI elements; and the Python announcement was huge... before deflating like a popped party balloon when we learned how horribly compromised it was.

...but other than that, Excel remains exactly as frustrating to use for even simple tasks - like parsing a date string - today just as it was 15 years ago[1].

[1]: https://stackoverflow.com/questions/4896116/parsing-an-iso86...


Looks quite cool! I have also considered similar ideas, but decided not to develop anything. However, there are a few important areas where we have different points of view. My use cases relate mainly to computationally demanding applications and, thus, runtime performance is quite important.

First, the claim that one gets "reactivity for free" is not entirely true (although the costs may be negligible for many apps). The stores are wrapped in proxies, which lowers the performance of, for example, property access. This is why I rejected the idea of proxies and instead considered generating getters and setters to handle reactivity. This could, in principle, enable zero-overhead reads (assuming that all the reactivity logic were handled in the setter). However, this approach fails to account for, for example, array mutations. Thus, I see the point of proxies, but their cost can be a significant performance hit (which may not matter for all applications).
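A minimal sketch of what I mean (my own illustration, not Gea's code): reads are plain property access, writes go through a setter, and in-place array mutations slip through unnoticed.

```javascript
// Getter/setter reactivity: zero-overhead reads, writes notify subscribers.
function makeReactive(target, key, onChange) {
  let value = target[key];
  Object.defineProperty(target, key, {
    get() { return value; },                       // plain read, no proxy trap
    set(next) { value = next; onChange(key, next); }, // notify on reassignment
  });
}

const store = { items: [1, 2] };
const changes = [];
makeReactive(store, "items", (key) => changes.push(key));

store.items = [1, 2, 3]; // reassignment: caught by the setter
store.items.push(4);     // in-place mutation: the setter never fires

console.log(changes.length); // prints 1; the push was invisible to reactivity
```

Proxies catch the `push` too (the trap sees the index assignment), which is exactly the trade-off: coverage of mutations vs per-access overhead.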

Second, not memoizing computed values (i.e., getters) can also have a significant negative impact on runtime performance, because expensive computations may have to be executed many times. I suppose that caching could be offloaded to the developer, but this could be laborious, at least if the developer would have to keep track of when to invalidate the computation. In, for example, Excel, computed values can be accessed fast, because they are stored in memory (although the added memory usage can become a problem).
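For example, offloading the caching to the developer could look roughly like this (a hypothetical helper, not part of any library discussed here), with the invalidation bookkeeping as the laborious part:

```javascript
// Hand-rolled memoized computed value: cache the result, and make the
// developer responsible for invalidating it when a dependency changes.
class Computed {
  constructor(fn) {
    this.fn = fn;
    this.dirty = true;
    this.cached = undefined;
    this.runs = 0; // instrumentation: how often the expensive fn actually ran
  }
  invalidate() { this.dirty = true; }
  get value() {
    if (this.dirty) {
      this.cached = this.fn();
      this.runs += 1;
      this.dirty = false;
    }
    return this.cached;
  }
}

const data = [1, 2, 3, 4];
const total = new Computed(() => data.reduce((a, b) => a + b, 0));

total.value;        // computes: 10
total.value;        // cache hit, fn not re-run
data.push(5);
total.invalidate(); // the developer must remember to do this by hand
console.log(total.value, total.runs); // prints 15 2
```

Forget the `invalidate()` call and you read stale data; that bookkeeping is precisely what framework-managed memoization takes off your hands.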

Third, you have not addressed async stores or async computed values (as far as I can tell). I will admit that considering async can be quite complicated. However, for better or worse, async is still a key part of JavaScript (e.g., sync alternatives do not exist for all APIs). Thus, the claims "JavaScript is enough", "Zero Concepts" and "You already know the entire API" are slightly imprecise, because async is not supported (although, this does not necessarily matter for many applications).

These three points were the main reasons why I chose not to pursue my similar ideas (especially in the context of computationally demanding applications). Still, I find the idea that JavaScript should be enough for reactivity to be compelling and worthwhile of further development.


Thanks for your insights. I was originally hesitant about the performance of proxies, too, but they turned out to be great. The benchmarks (https://geajs.com/benchmark-report.html) also show good results, both in terms of memory and CPU cycles; even though proxies obviously add an overhead, it's not night and day (https://jsben.ch/proxy-vs-object-performance-benchmark-dtxo6 is a good test for this). With a proxy, you can set a property 25 million times per second (on my M4 Max machine in Safari) with only a 4% perf loss vs defineProperty, and Chrome is about half the perf with a 20% loss vs defineProperty. So, still, 12.5 million sets per second is pretty good. Of course, if your use case demands more performance, nothing beats hand-optimized vanilla JS.

Since Gea doesn't rerender the template _at all (well, for the most part, at least)_, in theory we wouldn't really gain much from getter memoization, mainly because we create proxy observers for computed values that update the DOM in place only when the underlying value updates.

And since stores are plain old JS classes, there's no need for an "async store" concept. Just update a value in the store whenever you want, track its loading state however you want, either synchronously or asynchronously, and the observers will take care of them. If you refer to another pattern that I'm not aware of, please let me know.


On my computer, the linked benchmark gives the following results.

Object.defineProperty getter: 82M ops/sec, Proxy getter 32M ops/sec => proxy get is 2.56x slower.

Object.defineProperty setter: 14M ops/sec, Proxy setter: 12M ops/sec => proxy is 1.17x slower.

However, in my simple self-written benchmark that compares the time it takes to sum the property values (i.e., getter) of 100 million proxies vs plain objects, the result is that the proxies are 13x slower.

When benchmarking the setting of the property value of the 100 million proxies vs plain objects, the result is that the proxies are 35x slower.

My simple benchmark gives results that deviate significantly from the linked benchmark. Regardless, the performance implications of proxies should be evaluated on a case-by-case basis.
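For reference, my benchmark was along these lines (a reconstructed sketch with a smaller object count; absolute numbers and ratios vary a lot by engine, machine, and JIT warm-up):

```javascript
// Sum a property over N plain objects vs N proxied objects and compare times.
const N = 1_000_000;
const plain = [];
const proxied = [];
for (let i = 0; i < N; i++) {
  const obj = { x: i };
  plain.push(obj);
  proxied.push(new Proxy(obj, {})); // even an empty handler pays the trap cost
}

function sum(arr) {
  let s = 0;
  for (const o of arr) s += o.x; // every .x on a proxy goes through [[Get]]
  return s;
}

let t = performance.now();
const sumPlain = sum(plain);
const tPlain = performance.now() - t;

t = performance.now();
const sumProxy = sum(proxied);
const tProxy = performance.now() - t;

console.log({ sumPlain, sumProxy, ratio: tProxy / tPlain });
```

Note this measures property access in a hot loop over many distinct proxies, whereas the linked jsben.ch test hammers a single object, which is presumably why the results diverge so much.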

Regarding the memoization, I was primarily referring to accessing the getter multiple times (i.e., not necessarily in DOM rendering), which can cause unnecessary computation in Gea (as far as I can tell). In my envisaged use cases, this could often lead to problems (with, e.g., large data sets).

My issues with async mainly relate to developer convenience (i.e., managing the async parts can be cumbersome). For example, one can use await in a top-level module statement, but not in a class getter. There have been some relevant discussions about this in the context of Svelte; see, e.g., https://github.com/sveltejs/svelte/discussions/15845, https://www.youtube.com/watch?v=1dATE70wlHc and https://www.youtube.com/watch?v=e-1pVKUWlOQ.

Consider this conceptual example (i.e., async computed value):

  store0.url = "...";
  store1.res = await fetch(store0.url);
How would this be accomplished in Gea so that reactivity would be preserved (i.e., mutating store0.url would trigger a new fetch)? Is it possible to "listen" to changes to store0.url and handle the async code?


Very interesting benchmark results... well, I guess that's proxies for you.

Getters in Gea are designed to be used as computed properties by the components' rendering cycles, and in that, they don't need explicit memoization. I believe users can implement their own memoized computed variables if they require repetitive access—in which case each store's `observe` method could be utilized in updating the memoized value whenever a specific value changes in the store.

And for the async case, for now the compiler doesn't handle this scenario. It could be added, but as you expect, for now the same `observe` method could help here as well to listen to changes on store0.url and refetch accordingly.
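Something like this, conceptually (the `observe(key, callback)` signature is my assumption here, and a toy proxy-backed store plus a fake fetch stand in for the real things so the sketch is self-contained):

```javascript
// Toy stand-in for a proxy-backed store with an observe(key, callback) hook.
// Illustrative only; not Gea's actual store implementation.
class ToyStore {
  constructor(data) {
    this.listeners = {};
    return new Proxy(Object.assign(this, data), {
      set(target, key, value) {
        target[key] = value;
        (target.listeners[key] || []).forEach((cb) => cb(value));
        return true;
      },
    });
  }
  observe(key, cb) {
    (this.listeners[key] ||= []).push(cb);
  }
}

const store0 = new ToyStore({ url: "" });
const store1 = new ToyStore({ res: null });

// Fake fetch so the example runs without a network (assumption, not Gea API).
const fakeFetch = async (url) => ({ ok: true, url });

// Refetch whenever store0.url changes, and push the result into store1.
store0.observe("url", async (url) => {
  store1.res = await fakeFetch(url);
});

store0.url = "https://example.com/data.json"; // triggers the refetch
```

So the async work lives in an ordinary callback, and reactivity is preserved because every write to `store0.url` goes through the store's set trap.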

