Julia language: “Pkg.jl telemetry should be opt-in”



#1

The new “Pkg & Storage protocols” and an accompanying centralized service to host packages have been merged and are present in Julia v1.5.0-rc1. The new Pkg sends telemetry consisting of a user-specific UUID and other information to the server, where it is used to count the number of users and compute other stats. The goal is to answer the question “How many Julia users are there?” when fundraising. The current protocol is opt-out, meaning that these stats are collected unless a user changes a configuration file.

I would like to see one of two changes made:

  • Make the Pkg.jl telemetry opt-in by default for the Julia binaries. In February it was implied on GitHub that the opt-in nature of v1.4 would remain in v1.5. I do not think it is appropriate for the Julia open-source project to be collecting a user identifier along with information about that user’s packages. I believe that even this minimal data is a “toxic asset” and is more appropriate for a for-profit product such as JuliaPro. It feels odd that while Apple is taking steps to prevent user tracking, Julia is adopting it. The HyperLogLog technique seems more reasonable for opt-out tracking.

  • If we are to keep the opt-out behavior in v1.5, I would like to remove “anonymous” from the Pkg.jl warning or at least change it to “pseudonymous”. UUIDs (like a browser cookie) are only anonymous until they are not (say via a data leak or correlation with information from another source as done by browser trackers).
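The HyperLogLog alternative mentioned in the first point can be illustrated with a short sketch. This is a minimal Python version of the standard HyperLogLog distinct-count estimator (Flajolet et al., 2007), assuming a hypothetical protocol in which the client hashes a locally stored id and sends only a (register index, rank) pair, while the server keeps the maximum rank per register. It is not how Pkg.jl or the package server actually works; the class and method names are hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative only)."""

    def __init__(self, p: int = 10):
        # m = 2**p registers; relative standard error is roughly 1.04 / sqrt(m).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def client_token(client_id: str, p: int = 10) -> tuple:
        # Hypothetical client side: reduce a locally stored id to a
        # (register index, rank) pair. Only this pair would be sent;
        # the id itself never leaves the machine.
        h = int.from_bytes(hashlib.sha256(client_id.encode()).digest()[:8], "big")
        index = h >> (64 - p)                    # top p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        return index, rank

    def add_token(self, index: int, rank: int) -> None:
        # Hypothetical server side: keep only the maximum rank per register,
        # so repeated requests from the same client change nothing.
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


sketch = HyperLogLog(p=10)
for i in range(10_000):
    sketch.add_token(*HyperLogLog.client_token(f"user-{i}", sketch.p))
print(round(sketch.estimate()))
```

The server ends up holding one small integer per register rather than a list of identifiers, so it can estimate how many distinct clients it has seen (within a few percent for m = 1024) but cannot answer questions about any individual client.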

Stefan said (in a conversation beyond the Slack horizon) that knowing how many Julia users there are would aid in fundraising. I understand the appeal of knowing this marketing number; even so, is the Julia project so strapped for cash that it needs to monetize Julia users?

I enjoy Julia and the community very much; for example, I’m grateful for the diversity and inclusion efforts. The Pkg.jl opt-out telemetry is the first thing in the Julia code or community that I have found distasteful. I hope you’ll forgive me for sharing this on Discourse; I believe others may be interested.

16 Likes

#2

is the Julia project so strapped for cash that they need to monetize Julia users?

The first question JuMP gets asked when applying for funding/awards/etc is how many users we have. Here is a verbatim quote from an email I received today, which asked for “substantiated estimates of the number of installations of a software package.” Should I reply “No idea. Somewhere between 1k and 100k.”?

Moreover, at present, we have no idea how many people use each solver (and on which platform!). Knowing how many people installed which solver would allow us to prioritize support from our finite developer time.

This would also allow us to lobby the commercial solver developers to provide official support (or $$). To quote one company “We’ll want to provide official support at some point, but it looks like the scales haven’t tilted quite yet.” It’d be nice to know whether 100, 1000, 10000, or 100000 people per month use their software; that might change their mind.

Finally, if it is opt-in, the vast majority of users will not opt-in. This leaves us no better off than we were before. Opt-out is a good compromise.

To summarize, at the cost of sending pseudonymous UUIDs (which you can opt-out of), we get easier access to sustained funding for Julia ecosystem development and more efficient usage of developer time. That seems like a good trade-off to me.

21 Likes

#3

In order for a trade-off to (possibly) be good, users need to know that they are making a trade-off in the first place.

How are users going to be informed of the trade-off and the mechanisms to opt-out?

I ask because I follow the Julia issue tracker and read Discourse almost every day, and this is the first time I have heard about telemetry in Pkg.

An additional thought: how are you going to use install numbers to estimate usage numbers?

4 Likes

#4

Note that the UUID you are talking about isn’t like a Google advertising ID. It isn’t linked with any personally identifying information (other than the fact that you are a Julia user with … Julia packages installed). There isn’t a conceivable way for this number to tell anyone anything about you that they couldn’t find out more easily another way.

6 Likes

#5

I understand that, and I trust Julia’s developers. The reason for my comment is my concern for openness: this is something that shouldn’t be done without telling users about it, on principle.

I just found https://github.com/JuliaLang/Pkg.jl/pull/1544#issuecomment-565160856 , where Stefan says this is going to be documented. It’s not there yet, but I trust it will be before 1.5.0 is released.

#6

While I think it should be documented and/or shown as a prompt the first time a user runs it, it’s not overreaching at all. Just think about how this very forum probably logs your IP and the pages you visit. It’s even less aggressive than cookies! This is not against open-source (FOSS) philosophy at all.

Especially considering that Pkg usage can hardly be connected to anything about one’s identity.

2 Likes

#7

How are users going to be informed of the trade-off and the mechanisms to opt-out?

From https://julialang.org/legal/data/#opting_out: “The first time you connect to a new server, Julia will print a brief legal notice with a link to this page.”

An additional thought: how are you going to use install numbers to estimate usage numbers?

Installs ~ users. You could also look at the packages updated within the last 30 days. It’s always going to be an approximate metric. The goal is to have something that is better than nothing (what we have now).
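The “active within the last 30 days” metric could be computed along these lines. This is a minimal Python sketch that assumes a hypothetical request log of (client id, timestamp) pairs; it is not the package server’s actual schema or code, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta


def monthly_active_clients(requests, now, window_days=30):
    """Count distinct client ids seen within the last `window_days` days.

    `requests` is an iterable of (client_id, timestamp) pairs; this log
    format is hypothetical.
    """
    cutoff = now - timedelta(days=window_days)
    # A set collapses repeated requests from the same client into one entry.
    return len({cid for cid, ts in requests if cutoff <= ts <= now})


now = datetime(2020, 7, 1)
log = [
    ("a", now - timedelta(days=1)),
    ("a", now - timedelta(days=2)),   # same client again, counted once
    ("b", now - timedelta(days=10)),
    ("c", now - timedelta(days=45)),  # outside the 30-day window
]
print(monthly_active_clients(log, now))
```

As the post says, this remains an approximation: one person with several machines counts several times, and a shared CI machine counts once.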

5 Likes

#8

I don’t dispute the utility of knowing how many users Julia or JuMP has. It may also be useful to know how often JuMP is used or included! Why not have Julia count how often each method is used and send a daily report to the package server? (Edit: to be clear, I’m not in favor of this.)

What do you think of HyperLogLog tracking, which doesn’t require sending a UUID?

No one is claiming that there is any personally identifying information tied to the UUID at present; I’m sure that all the people doing in-browser tracking claimed the same to start with. Even so, what happens when some other package decides to require an email address for use and ties that to the UUID? Or when there is a data breach?

#9

This forum is definitely opt-in, which is what I’m suggesting.

#10

If you need to find answers or view the docs, then it is not, unless you only use the docs that come with the source code. But even then, I am sure GitHub tracks the IP and the requests for git clone / download too.

My point is, I support the ideal/spirit, but not all telemetry is the same, and this one is probably fine (truly anonymous).

#11

GitHub (Microsoft) is for-profit; Julia is not. Can you point to another open-source, non-profit project that tracks users? For example, I do not believe that Python or R track users.

#12

Let me emphasize my point: personally, I’m not against the proposed Julia package telemetry. However, I expect an open project to let users know that it is happening, and provide instructions to opt-in or out.

Also, as I said above, Stefan already said this will be documented, and it looks like opting out is painless.

3 Likes

#14

It looks like there is functionality for this recently merged to Pkg: https://github.com/JuliaLang/Pkg.jl/pull/1871 .

I didn’t see the notice when trying out the 1.5 release candidate, although Pkg.PlatformEngines.telemetry_notice() prints the notice. The PR mentions that the notice is printed the first time you talk to the Pkg server; I have probably already talked to the Pkg server via the 1.5 beta, and maybe also by opting in on 1.4 (I forget what I’ve done on this computer), so that might be why it didn’t print for me on the 1.5 RC.

2 Likes

#15

Looks like if you remove ~/.julia/servers/pkg.julialang.org/telemetry.toml it would print (I didn’t try; I’m guessing from the code).

2 Likes

#16

With the difference that you don’t have much control over what cookies do, whereas you can

Pkg

I don’t see any benefit in profiling single users in Julia (unless the evil plan is to do targeted advertising of packages in the REPL!), only in getting aggregate usage statistics, which, as I understand it, will be shared with the public. If there were any evidence of an evil plan looming, I’m sure many people, likely including me, would either stop using Julia (not a great outcome for anybody) or hack Pkg to stop doing something evil.

4 Likes

#17

To my knowledge, this is not true: the CRAN servers historically produced and maintained traditional de-anonymizable server logs that included information like IP addresses, but the maintainers were historically unwilling to share those logs with anyone outside the core team. I’ve been on direct e-mail threads with the CRAN maintainers where they’ve declined to share that kind of data, but acknowledged its existence.

Things may have changed since then, but I don’t think you’re representing the broad state of the art accurately. AFAICT Julia differs from Python and R primarily because Julia uses GitHub servers for hosting most artifacts, so GitHub has all of the truly private information for Julia users, whereas Python and R have that data for their communities because they host most artifacts directly.

6 Likes

#18

Firefox is also an example of an open-source project that tracks at least as much data.

#19

Yeah, and they disclose what they do: https://www.mozilla.org/en-US/privacy/faq/

But that’s beside the point, in my opinion; what other projects do or don’t do doesn’t change my (personal) belief that a project like Julia should be up front about its user telemetry. There is even a perfectly legitimate reason to do it! How many projects have that luxury?

I mean, I get that some people may believe undisclosed telemetry is OK, or just plain don’t care, but pointing to what other projects do as justification is unconvincing. Two wrongs don’t make a right.

4 Likes

#20

Thank you for sharing this. So my understanding is that Julia is proposing sharing with Julia package owners the sort of information that the CRAN maintainers are not willing to share with R package owners; is that right? Does the CRAN “core team” include R package maintainers?

It looks like no one but me opposes UUID tracking. There’s no need to change things just for me. I hope you all will forgive me for sharing a few thoughts on it.

2 Likes

